Sunday, February 20, 2011

Archiving that box of stuff for posterity

Everyone has that box of stuff--old family photos, certificates, a family bible with genealogical data in the front, an ancestor's journal, and so on. It may be spread throughout the house and attic rather than gathered in one box, but most people have some of this stuff. If properly preserved, it can be a priceless treasure trove to current and future generations. If mismanaged, it can be destroyed, thrown out, or become so disorganized that it loses much of its value.

For example, a shoe box full of photos without labels on them can, within one generation, go from being precious family history to being a worthless bunch of photos that nobody can identify. Labeling photos on the back is a great first step, but I'm hoping we can all figure out a reasonable way to digitally archive photos, documents and other things for posterity.

There are a few principles that should be kept in mind as we figure out how to do this.
  • Arrangement
    • Physical arrangement. The physical arrangement is the physical grouping and ordering of items within a group, where an "item" can be a single item such as a photo, or a sub-group (such as a box of slides) within a larger group (such as a cardboard box or an attic). Physical arrangement matters because it provides context that can help us make sense of resources. If we know which photos are part of one box of slides, for example, and the other photos in the box are all of one side of the family, then a face we are having trouble identifying can more reliably be placed on that side of the family. Or if one box is chronologically out of order, the entire box of photos can be moved before another one as a unit, instead of having to work out the right position for each photo individually, which might otherwise be impossible.
    • Logical arrangement(s). A "logical arrangement" seeks to organize resources according to some useful scheme, such as chronologically, or in groups like "family" and "travel". Even when resources are arranged chronologically, they may be grouped by "event" such as "trip to Hawaii", or by strict year, month and day. It is even possible to have multiple logical arrangements, though it might be helpful for one of them to be the "primary" one, especially if the files are physically stored according to one of them. (Non-primary logical arrangements would be free to include only a subset of the resources, while the primary one would include them all once and only once.)
    • Digital arrangement. By "digital arrangement", I mean the folder structure on a hard drive. We could choose to have the digital arrangement mirror the physical arrangement. This often requires prepending a zero-padded number (01, 02, ... 10) to each folder or file name so that the files and folders sort properly in typical operating systems. Alternatively, we could choose to have the digital arrangement mirror a "logical arrangement" (i.e., the "primary" one).
  • Digital preservation. In addition to initially digitizing photos, audio, movies, documents and other resources, it is important that the resources be protected against being lost or corrupted.
    • Backup. Hard drives fail. DVDs and CDs degrade. It is important that data be stored in more than one place, and organized well enough that we know when one resource is a duplicate or backup of another one. Ideally, things would be backed up online in more than one place.
    • Format shift. Still have a 5.25" floppy drive? Me neither. Storage media change, so digital data needs to be migrated from one medium to another as that happens. File formats become obsolete, too, so data needs to be migrated from WordPerfect to MS Word .docx, or from JPEG to whatever the next thing is. Most of us rarely think of doing this, so ideally, an online preservation service would do it automatically for all of its resources.
  • Apathy. Your grandfather passes away, and you only have 2 days off work to go through all of his stuff. You don't have time to make sense of all those files on his ancient PC, so you wipe the hard drive and drop it off at a local charity for resale. So much for his lifelong efforts to digitize, tag and preserve precious family photos. A lady I know went to her grandfather's house after he passed away, and before she arrived, her sister had thrown out the journals that he had kept for his whole life. You never know how ignorant people are going to be when it comes to precious resources like these, so they need to be kept somewhere that posterity can still access and use them, regardless of who gets entrusted with the original resources temporarily.
  • Sharing. Resources can and should be shared with others, but often only a piece or subset of the collection is shared. Those who end up with one photo from a collection should have a way of reconnecting with the original collection. Again, this could be addressed by an online archive with long-lived URLs for resources and collections of resources (and collections of collections, and so on). A photo could then point to one or more online locations where information about its collection structure can be found.
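Two of the principles above are concrete enough to sketch in code: zero-padded prefixes for mirroring the physical order in folder names, and spotting when one file is an exact copy of another. The following is only a rough sketch in Python. The function names (padded_name, file_digest, find_duplicates) are made up for illustration, and using SHA-256 to compare contents is my assumption, not something any particular archive service requires.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def padded_name(position: int, total: int, name: str) -> str:
    """Prefix `name` with its 1-based position, zero-padded to the width
    of `total`, so alphabetical sorting matches the physical order."""
    width = len(str(total))
    return f"{position:0{width}d}_{name}"


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in 1 MiB chunks so large scans need not fit in memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Group files under `root` whose contents are byte-for-byte identical."""
    by_digest = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]


if __name__ == "__main__":
    # Hypothetical boxes, listed in the order they sat on the shelf.
    boxes = ["attic slides", "wedding album", "letters 1943"]
    for position, box in enumerate(boxes, start=1):
        print(padded_name(position, len(boxes), box))
```

Note that comparing digests only catches exact copies. Two separate scans of the same photo would have different bytes, so they would still need to be matched by hand or by some other means.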
I was intrigued by the Saturday morning keynote talk given at RootsTech 2011 about the Internet Archive. Their goal is to archive everything forever for free, as far as I could tell. As they are probably well aware, there is a big difference between just backing stuff up and "archiving" it, just as there is a big difference between a photo album and a shoe box. Knowing what you have, how it is arranged, and therefore what it "means" is almost as important as keeping it stored at all. The Internet Archive is one organization that might be able to store people's "box of stuff". I can imagine FamilySearch or other organizations offering "Preservation as a service", too. Ideally, a single user's archive would be stored in more than one of these, in case one organization goes under, has a disaster, or has a shift in priorities that puts its archives in danger.

Privacy is another tricky issue with archives. On the one hand, we want to preserve photos for posterity, which means that we want our posterity (as well as current living relatives) to have access to them. On the other hand, privacy laws in many countries (especially in Europe) make it illegal for one person to reveal information about another living person without their express permission. One option is to allow users to flag resources as public or not, and to allow other users to flag resources as non-public if they don't want the pictures or information out there (i.e., "opt out"). It's also possible to have a timeout on resources (e.g., after 110 years, anyone mentioned in the resource can be assumed to be dead). Or access could be restricted from the countries that have the stricter laws? Not sure. Comment if you have any good ideas on how to approach this part.
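To make the flag-plus-timeout idea a bit more concrete, here is a minimal sketch in Python of how such a visibility rule might look. Everything in it is hypothetical: the Resource fields, the is_publicly_visible function, and especially the choice to let the 110-year timeout override an opt-out flag, which is just one possible reading of the idea and not legal advice.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Timeout from the post: after this many years, anyone pictured or
# mentioned is assumed to be dead.
PRIVACY_TIMEOUT_YEARS = 110


@dataclass
class Resource:
    title: str
    created: Optional[date]     # when the photo or document was made, if known
    owner_public: bool = False  # the uploader's "public" flag
    opted_out: bool = False     # someone else asked for it to be non-public


def is_publicly_visible(resource: Resource, today: date) -> bool:
    """Apply the timeout first, then the public and opt-out flags."""
    if resource.created is not None:
        age = today.year - resource.created.year
        # Subtract a year if this year's anniversary has not happened yet.
        if (today.month, today.day) < (resource.created.month, resource.created.day):
            age -= 1
        if age >= PRIVACY_TIMEOUT_YEARS:
            # Assumption: the timeout wins over any flags.
            return True
    return resource.owner_public and not resource.opted_out


if __name__ == "__main__":
    photo = Resource("family reunion", date(2005, 7, 4), owner_public=True)
    print(is_publicly_visible(photo, date.today()))
```

Undated resources never hit the timeout in this sketch, so they stay private unless the owner marks them public and nobody opts out. Restricting access by country would need a separate check that is not attempted here.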

Assuming that the privacy part can be figured out, though, we still need a way to archive things in a way that has a good chance of preserving resources and their context long-term.
