Tuesday, August 30, 2011

Storing stuff both locally and online

Storing stuff locally has the advantage of instant access. You can browse thousands of photos quickly in a way you can't when they're stored online (though Deep Zoom or SeaDragon techniques can help, as shown in my example). Likewise, you can navigate or search a family tree instantly when it's stored locally, which you can't do when the data is stored online.

On the other hand, when something is stored online, it can be accessed from other computers and, more importantly, can be shared with others. It is also likely backed up better than your hard drive typically is.

But trying to store a collection of resources (such as a photo collection, or family tree data) both locally and online often results in a nightmare of trying to keep them synchronized, as you edit, tag, reorganize and even delete things.

One solution is to take an approach similar to Google Docs, where a collection is always modified through a series of deltas. Whether a collection is being modified locally or online, deltas get sent (or stored for later retrieval) so that the different copies are eventually consistent. A mechanism allows for resolution of occasional conflicts.
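To make the delta idea concrete, here is a minimal sketch. It is only an illustration under my own assumptions: the names `Delta`, `Replica` and `sync` are made up, the timestamp is a simple integer clock, and conflicts are resolved by "latest timestamp wins", which is just one possible rule and not necessarily what Google Docs does.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Delta:
    """One recorded change (hypothetical structure)."""
    delta_id: str
    timestamp: int     # assumed logical clock; larger means later
    resource_id: str   # e.g. a photo's permanent ID
    key: str
    value: str


class Replica:
    """A local or online copy that stores every delta it has seen."""

    def __init__(self):
        self.log = {}  # delta_id -> Delta

    def edit(self, timestamp, resource_id, key, value):
        delta = Delta(uuid.uuid4().hex, timestamp, resource_id, key, value)
        self.log[delta.delta_id] = delta
        return delta

    def receive(self, deltas):
        # Receiving the same delta twice is harmless.
        for delta in deltas:
            self.log.setdefault(delta.delta_id, delta)

    def state(self):
        # Replay deltas in a deterministic order so every replica that
        # holds the same set of deltas computes the same result.
        result = {}
        ordered = sorted(self.log.values(),
                         key=lambda d: (d.timestamp, d.delta_id))
        for d in ordered:
            result.setdefault(d.resource_id, {})[d.key] = d.value
        return result


def sync(a, b):
    """Exchange deltas in both directions."""
    a_deltas = list(a.log.values())
    b_deltas = list(b.log.values())
    a.receive(b_deltas)
    b.receive(a_deltas)


local, online = Replica(), Replica()
local.edit(1, "photo-1", "caption", "Grandma")
online.edit(2, "photo-1", "caption", "Grandma, 1952")
sync(local, online)
print(local.state() == online.state())
```

Because the state is always recomputed from the full set of deltas, it does not matter in which order the two copies hear about each other's changes.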

As an example, let's say I wanted to organize and archive a collection of photos on my hard drive, but wanted to make it available online so that others could help tag faces, and so that they could enjoy seeing them as well.

I could scan the photos into folders. Then I could launch a local "photo archive" app. As I add photos to my archive, it assigns each photo a globally unique permanent ID. It also adds tags to the photo's XMP metadata indicating what its original physical arrangement was (based on folder structure). I could also add information to each folder indicating what physical container it represents.
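As a rough sketch of the cataloging step, the function below assigns a random UUID as the permanent ID and derives the original physical arrangement from the folder path. The function name and the record's field names are my own inventions, and the record is a plain dictionary; actually writing it into a photo's XMP metadata would need an XMP library, which is left out here.

```python
import uuid
from pathlib import PurePosixPath


def catalog_photo(archive_root, photo_path):
    """Build a metadata record for one scanned photo (hypothetical format)."""
    relative = PurePosixPath(photo_path).relative_to(PurePosixPath(archive_root))
    return {
        # Globally unique permanent ID, assigned once when the photo is added.
        "id": str(uuid.uuid4()),
        # Folder names from the archive root down, standing in for the
        # physical containers (box, album, envelope, ...).
        "original_container": list(relative.parent.parts),
        "original_filename": relative.name,
    }


record = catalog_photo("/scans", "/scans/Box 3/Album A/img_0042.jpg")
print(record["original_container"])
```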

I could then have my desktop app push the photos and metadata up to an online repository. From then on, any changes I make using a web interface get queued up on the server, and the next time my desktop app connects, it applies the same changes locally. A change log is part of the database, so that changes can be viewed or rolled back. Deletions flag the photos as deleted without actually purging them, so that this, too, can be undone. An actual purge can also be done, in order to reclaim hard drive space, but a user would be prompted before allowing this on their local hard drive.
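The change log, undoable deletion and confirmed purge could look something like the sketch below. The `Archive` class and its method names are hypothetical, and everything lives in memory; a real app would keep this in its database.

```python
_MISSING = object()  # marks "this key did not exist before the change"


class Archive:
    def __init__(self):
        self.photos = {}      # photo_id -> metadata dict
        self.change_log = []  # (photo_id, key, old_value, new_value)

    def add(self, photo_id, **metadata):
        self.photos[photo_id] = {"deleted": False, **metadata}

    def set(self, photo_id, key, value):
        # Record the old value so the change can be rolled back later.
        old = self.photos[photo_id].get(key, _MISSING)
        self.change_log.append((photo_id, key, old, value))
        self.photos[photo_id][key] = value

    def delete(self, photo_id):
        # Deleting only sets a flag; the photo itself is kept.
        self.set(photo_id, "deleted", True)

    def undo_last(self):
        photo_id, key, old, _new = self.change_log.pop()
        if old is _MISSING:
            del self.photos[photo_id][key]
        else:
            self.photos[photo_id][key] = old

    def purge(self, confirmed):
        # Really remove flagged photos, but only after the user said yes.
        if not confirmed:
            return 0
        doomed = {pid for pid, p in self.photos.items() if p["deleted"]}
        for pid in doomed:
            del self.photos[pid]
        # Log entries for purged photos can no longer be undone.
        self.change_log = [c for c in self.change_log if c[0] not in doomed]
        return len(doomed)


archive = Archive()
archive.add("p1", caption="Beach")
archive.delete("p1")
archive.undo_last()
print(archive.photos["p1"]["deleted"])
```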

I could then invite family members to go view the archive of photos online and help tag faces, or estimate when and where the photos were taken.

I could also take the "default primary logical arrangement", which mirrors the hard drive structure, and rearrange it: reordering photos within a folder, creating new folders and rearranging those, moving photos from one folder to another, and so on. As logical arrangements are made, they are written into each photo's XMP metadata as well, so that the database can be reconstructed from the raw files if needed. Names of folders and photos could physically contain numeric prefixes to get them to sort properly in a typical OS, but a UI could hide that (or at least automatically update it) as resources are moved around.
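The numeric-prefix trick might be sketched like this. The prefix format (zero-padded digits followed by an underscore) and the function names are assumptions of mine, chosen only so that a plain alphabetical sort matches the logical order.

```python
import re

# Assumed on-disk convention: digits, an underscore, then the display name.
_PREFIX = re.compile(r"^\d+_")


def with_prefixes(display_names):
    """Return on-disk names whose alphabetical order matches the given order."""
    width = max(3, len(str(len(display_names))))
    return [f"{i:0{width}d}_{name}"
            for i, name in enumerate(display_names, start=1)]


def strip_prefix(disk_name):
    """Return the name a UI would show, without the leading sort prefix."""
    return _PREFIX.sub("", disk_name, count=1)


on_disk = with_prefixes(["wedding.jpg", "1952_picnic.jpg", "baby.jpg"])
print(on_disk)
print([strip_prefix(name) for name in on_disk])
```

After a photo is moved, the UI would call `with_prefixes` again on the new order and rename the files to match.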

If I modify photos outside of the organizer, the organizer can re-scan the metadata of the photos to figure out what changes have happened. If I copy the whole folder of images (or a subset thereof) to a new computer, I could run the same app (or some future derivative), and it could rebuild the database from the metadata within the images. If I lose interest or kick the bucket, my family members still have access to my photos online, and can grab copies of the ones they're interested in.

And, of course, it would be necessary to be able to restrict public access to certain photos for the sake of privacy.

The same approach could be taken for a family tree. A local family tree could be imported from a GEDCOM file into a local database, and an online database could be created as a copy of the local one. Any change made to either one would be added to the change log and sent to the other the next time the local database can connect to the Internet. That way, lightning-fast display and editing can happen locally, while global (though privacy-controlled) access and backup happen online. Synchronization is almost completely automatic, except in the rare case where two desktop apps edit the same person between synchronizations, at which point the system could do its best to arbitrate and let the user override its defaults if they want to.
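Here is one way the arbitration could be sketched. Each change is assumed to be a tuple of (person ID, field, value, timestamp), a conflict is assumed to mean that both sides changed the same field of the same person to different values, and the default winner is simply the later edit. All of that is my own simplification; the returned conflict list is what a UI would show so the user can override the default.

```python
def latest(changes):
    """Keep only the most recent value per (person_id, field)."""
    newest = {}
    for person_id, field, value, timestamp in changes:
        key = (person_id, field)
        if key not in newest or timestamp >= newest[key][1]:
            newest[key] = (value, timestamp)
    return newest


def merge_changes(local_changes, remote_changes):
    """Merge two sets of changes made since the last synchronization."""
    local, remote = latest(local_changes), latest(remote_changes)
    merged, conflicts = {}, []
    for key in sorted(local.keys() | remote.keys()):
        if key in local and key in remote and local[key][0] != remote[key][0]:
            # Both sides edited the same field: default to the later edit,
            # but report it so the user can choose differently.
            winner = max(local[key], remote[key], key=lambda vt: vt[1])
            conflicts.append({"key": key,
                              "local": local[key][0],
                              "remote": remote[key][0],
                              "default": winner[0]})
            merged[key] = winner[0]
        else:
            merged[key] = (local.get(key) or remote[key])[0]
    return merged, conflicts


merged, conflicts = merge_changes(
    [("I1", "birth_date", "1901", 10)],
    [("I1", "birth_date", "1902", 12), ("I2", "name", "Ann", 11)],
)
print(merged)
print(conflicts)
```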



