Thomas Hodgson @twsh

What can I read to learn about best practice for storing data in a digital humanities project? In particular, a database of historical records.


@twsh Hi! I'm a data management librarian. It's not specific to humanities, but there's a good rule for storing *any* research data called the 3-2-1 rule.

Basically, keep *3* copies of any important file (primary + 2 backups). Your backups should be on *2* different storage media (e.g. 1 cloud & 1 external hard drive). Lastly, *1* of your backups should be in a different geographic location from you (typically the cloud copy, but check where it is actually stored).
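For illustration, the 3-2-1 rule can be turned into a simple checklist. The following is a minimal Python sketch, not a tool the librarian mentioned. The `Copy` record, its fields and `meets_3_2_1` are hypothetical names, and "different location" is simplified to comparing a location label with yours.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Copy:
    """One copy of a file (hypothetical record, for illustration only)."""
    medium: str           # storage medium, e.g. "laptop SSD", "external HDD", "cloud"
    location: str         # where the copy physically lives, e.g. "office"
    is_primary: bool = False


def meets_3_2_1(copies: list[Copy], home_location: str) -> bool:
    """Check a set of copies against the 3-2-1 rule as stated in this thread."""
    backups = [c for c in copies if not c.is_primary]
    three_copies = len(copies) >= 3                      # 3 copies in total
    two_media = len({c.medium for c in backups}) >= 2    # backups on 2 different media
    one_offsite = any(c.location != home_location for c in backups)  # 1 backup elsewhere
    return three_copies and two_media and one_offsite


plan = [
    Copy("laptop SSD", "office", is_primary=True),
    Copy("external HDD", "office"),
    Copy("cloud", "cloud provider"),
]
print(meets_3_2_1(plan, home_location="office"))  # True
```

Dropping either backup from `plan` makes the check fail, since there would then be only two copies on a single backup medium.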


could you be a bit more specific? are you asking about backup best practices, or designing a project so that the data is future-proof and accessible?

I know some things about the latter, but most of what I've learned I've picked up from people in the field.

@omniadisce I was thinking more about best practice than backups. Not that backups aren't good.

@twsh @omniadisce I can think of worst practices for data sets that I've seen in the medical literature :P


if what you're asking about is the way to take a set of historical records, edit them, and publish them in a robust and future-proof digital format, I think most people would agree that TEI-XML is currently the standard for scholarly textual markup, which can then be served to users in a variety of ways
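To give a feel for what TEI-XML looks like, here is a minimal sketch that builds a bare-bones TEI document with Python's standard `xml.etree.ElementTree`. It uses the TEI namespace and the minimal header (`titleStmt`, `publicationStmt`, `sourceDesc`), but the `minimal_tei` helper and the sample record text are hypothetical. A real edition would follow the TEI Guidelines and validate against a schema.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialise TEI as the default namespace


def tei(tag: str) -> str:
    """Qualify a tag name with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def minimal_tei(title: str, source: str, paragraphs: list) -> ET.Element:
    """Build a bare-bones TEI document (hypothetical helper)."""
    root = ET.Element(tei("TEI"))
    # The header describes the electronic file and the source it transcribes.
    header = ET.SubElement(root, tei("teiHeader"))
    file_desc = ET.SubElement(header, tei("fileDesc"))
    title_stmt = ET.SubElement(file_desc, tei("titleStmt"))
    ET.SubElement(title_stmt, tei("title")).text = title
    pub_stmt = ET.SubElement(file_desc, tei("publicationStmt"))
    ET.SubElement(pub_stmt, tei("p")).text = "Unpublished draft."
    source_desc = ET.SubElement(file_desc, tei("sourceDesc"))
    ET.SubElement(source_desc, tei("p")).text = source
    # The transcribed text itself goes in <text><body>.
    body = ET.SubElement(ET.SubElement(root, tei("text")), tei("body"))
    for para in paragraphs:
        ET.SubElement(body, tei("p")).text = para
    return root


doc = minimal_tei(
    title="Parish register transcription (sample)",
    source="Hypothetical manuscript register, folio 1r.",
    paragraphs=["Sample entry text."],
)
ET.indent(doc)  # pretty-print; requires Python 3.9+
print(ET.tostring(doc, encoding="unicode"))
```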

detailed examples here:

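XML presupposes you're OK with a hierarchical data model, where every element nests neatly inside another. If you're working with literary texts this can be problematic, e.g. when a sentence runs across two verse lines, so the two structures overlap.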
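To make the overlap problem concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The element names (`page`, `line`, `s`) are illustrative. The last string shows a common TEI-style workaround: marking line breaks with an empty `<lb/>` milestone instead of a containing element.

```python
import xml.etree.ElementTree as ET

# Nested markup: every element closes inside its parent, so this is fine.
nested = "<page><line>Here begins a <s>sentence</s></line></page>"

# Overlapping markup: the sentence <s> opens inside one <line> and closes
# inside the next, so the tags cross. XML does not allow this.
overlapping = (
    "<page><line>Here begins a <s>sentence</line>"
    "<line>that ends here</s></line></page>"
)

# Workaround: keep the sentence as the container and mark the line break
# with an empty milestone element, so nothing overlaps.
milestone = "<page><s>Here begins a sentence<lb/>that ends here</s></page>"


def is_well_formed(xml_text: str) -> bool:
    """Return True if the string parses as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(nested))       # True
print(is_well_formed(overlapping))  # False
print(is_well_formed(milestone))    # True
```

The trade-off is that the line is no longer a first-class element, so queries like "give me line 2" need extra processing.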

@omniadisce Thanks! That's going on my reading list.


do let me know if you want specific advice about tools, programs, or further reading

as it happens, I'm in the early stages of planning a TEI project for some historical materials myself

@omniadisce I might well do that when I have a clearer idea of what it is I'm doing.


all of the replacements for TEI are pretty bleeding edge and/or experimental at this point

but for historians, the issues with the XML data model may be unimportant (and in fact, many, many lit scholars are fine with the tradeoff)