I'm going to try to pick up an idea that I had on here before about using my Mastodon account to replace my (very!) old Advogato account. So here comes the first of many development posts.

So first, for those of you who don't know me, I'm the head developer on the project formerly known as RLetters and evoText, now known as Sciveyor. It's a system for performing textual analysis on journal articles.



Now, we've had real data integration problems in the project over the years. The "central data store" was XML files on my NAS. That's cool and all for long-term archiving (and it works for feeding stuff into Solr), but it *sucks* for long-term project maintenance.

The fix: baptize our MongoDB instance as the "canonical" data store. I've got a JSON schema for our data (data.sciveyor.com/schema/ + codeberg.org/sciveyor/json-sch) and a tool to verify Mongo against it (codeberg.org/sciveyor/schema-t).

So now we have a central source of data that we can check for validity (very quickly, in Go). Today's goal was a super-quick-and-dirty way to mirror the contents of the MongoDB server over to Solr (codeberg.org/sciveyor/mongo-so, more Go). It only checks version numbers and presence/absence, but it'll be good enough to keep the Solr server current.
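
A check based on version numbers and presence/absence could be sketched roughly like this in Go. This is only an illustration under assumptions, not the actual mirroring tool: `planSync` and the ID-to-version maps are hypothetical names, and the sketch skips fetching anything from MongoDB or Solr.

```go
package main

import (
	"fmt"
	"sort"
)

// planSync is a hypothetical helper. It compares document versions in the
// source (standing in for MongoDB) against the mirror (standing in for Solr).
// Both maps go from document ID to version number.
// It returns the IDs to (re)index and the IDs to delete from the mirror.
func planSync(source, mirror map[string]int) (upsert, remove []string) {
	// Anything missing from the mirror, or present with a different
	// version number, needs to be indexed again.
	for id, version := range source {
		mirrored, ok := mirror[id]
		if !ok || mirrored != version {
			upsert = append(upsert, id)
		}
	}
	// Anything in the mirror that no longer exists in the source is stale.
	for id := range mirror {
		if _, ok := source[id]; !ok {
			remove = append(remove, id)
		}
	}
	// Map iteration order is random in Go, so sort for stable output.
	sort.Strings(upsert)
	sort.Strings(remove)
	return upsert, remove
}

func main() {
	// Made-up sample data for illustration only.
	mongo := map[string]int{"a": 1, "b": 2, "c": 1}
	solr := map[string]int{"a": 1, "b": 1, "d": 1}

	upsert, remove := planSync(mongo, solr)
	fmt.Println("index:", upsert)
	fmt.Println("delete:", remove)
}
```

With the sample data, "b" is re-indexed because its version changed, "c" is indexed because it is missing from the mirror, and "d" is deleted because it is gone from the source. Nothing looks at document contents, which is what keeps a check like this quick and dirty.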

And hey: in the intervening years since I last redid our schema, Solr now has decent nested-document support! Woohoo!


In any event, I think it'll be helpful for me to keep thinking through this stuff somewhere, and here is as good as most places! Talking to myself about what I've done and what I'm going to do usually makes me a better developer.

