Tuesday, December 07, 2010

Tagging people in photos, for posterity

I have long wished for a standard format for identifying people in digital photos. Facebook has brought face-tagging to the masses, and I think its model is a pretty good one: you can tag a region of an image either with free-form text or with a name picked from a list. The beauty of the latter option is that you attach not just a name to a face, but an identity.

Once the computer knows not just the name that goes with that face, but the identity, it can do things like notify that user that they've been tagged in an image, and so on.

The problem, however, is that the tag exists only on Facebook. I have a collection of old photos, and to identify the people in them I have had to resort to text files stored next to the photos on my computer, cramming information into file names, or adding captions to the photo metadata. All of these leave some room for ambiguity, and none of them goes beyond a simple name. And no software is going to know how to identify these people.

What we need is a standard metadata tagging scheme that can identify a region of an image, attach a name to it, and optionally attach URLs or typed identifiers that say who this person is in various other systems. For example, I could tag my great-grandfather as "James Kay Polk Gray" and then attach a "new FamilySearch" person ID, perhaps a URL to an entry on "biographicalwiki.org", and perhaps another URL to my personal online family tree. With multiple links and IDs, it is more likely that at least one of them will still work when someone wants to look the ancestor up.
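No such standard exists yet, so the following is only a hypothetical sketch, in Python, of what a single region tag might hold. The field names, the scheme strings, the fractional-coordinate convention, and the JSON serialization are all assumptions for illustration; they are not taken from XMP, IPTC, or Facebook.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Identifier:
    # Which system the identifier belongs to (scheme names are hypothetical).
    scheme: str
    value: str


@dataclass
class RegionTag:
    # Region stored as fractions of the image's width and height, so the tag
    # still lines up if the image is resized (an assumed convention).
    x: float
    y: float
    width: float
    height: float
    name: str
    identifiers: list[Identifier] = field(default_factory=list)

    def __post_init__(self) -> None:
        for v in (self.x, self.y, self.width, self.height):
            if not 0.0 <= v <= 1.0:
                raise ValueError("region values must be fractions in [0, 1]")
        if self.x + self.width > 1.0 or self.y + self.height > 1.0:
            raise ValueError("region extends past the edge of the image")


def to_json(tags: list[RegionTag]) -> str:
    # asdict() also converts the nested Identifier objects.
    return json.dumps([asdict(t) for t in tags], indent=2)


tag = RegionTag(
    x=0.42, y=0.10, width=0.15, height=0.20,
    name="James Kay Polk Gray",
    identifiers=[
        # Placeholder values only; none of these are real IDs or pages.
        Identifier("new-familysearch-person-id", "PLACEHOLDER-ID"),
        Identifier("url", "https://biographicalwiki.org/placeholder-entry"),
        Identifier("url", "https://example.com/my-family-tree/placeholder"),
    ],
)
print(to_json([tag]))
```

Keeping a list of identifiers, rather than a single one, mirrors the idea above: if one system disappears, the others may still resolve.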

I was hoping that Adobe's XMP format would provide a place for this kind of metadata, but someone from Adobe said that while this was an interesting idea, they hadn't intended XMP to include sub-image metadata. The president of IPTC (the organization that defines the metadata tags used in many digital photos) said that something like this was on their roadmap, but I don't know how that has progressed.

Facebook, iPhoto, Picasa, and Photoshop Elements all let you tag faces in photos. However, tagging takes a while, and unless the tags are portable, I'm not ready to spend the hours it would take. I'm hopeful that the industry can come up with a standard for this sort of thing.

The same standard might even work for tagging words in an image of printed text (e.g., for OCR), or words in a handwritten image (such as a genealogical document).

Wednesday, March 08, 2006

Source-centric genealogy overview

Welcome to the source-centric genealogy blog.

Source-centric genealogy is an approach to doing genealogy in which information is viewed as flowing from sources to evidence and finally to conclusions. This flow is recorded so that each conclusion can be traced back to the evidence it is based on, and each piece of evidence knows which part of which source it came from. Another important aspect of source-centric genealogy is that the links also work in the other direction: you can take a source and see what evidence has been extracted from it, and take a piece of evidence and see what conclusions have been drawn from it. This makes it possible to avoid endlessly duplicating work that someone else has already done.
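The two-way tracing described above can be sketched as a small provenance graph with indexes in both directions. This is a hypothetical Python illustration; the class name, method names, and sample IDs are assumptions, not part of any existing system.

```python
from collections import defaultdict


class ProvenanceGraph:
    """Hypothetical sketch: sources -> evidence -> conclusions, navigable both ways."""

    def __init__(self) -> None:
        self.evidence_origin = {}                    # evidence id -> (source id, location in source)
        self.evidence_by_source = defaultdict(list)  # source id -> [evidence ids]
        self.evidence_for = defaultdict(set)         # conclusion id -> {evidence ids}
        self.conclusions_from = defaultdict(set)     # evidence id -> {conclusion ids}

    def add_evidence(self, evidence_id, source_id, location):
        self.evidence_origin[evidence_id] = (source_id, location)
        self.evidence_by_source[source_id].append(evidence_id)

    def add_conclusion(self, conclusion_id, evidence_ids):
        evidence_ids = list(evidence_ids)
        # Validate first so a bad ID leaves the graph unchanged.
        missing = [ev for ev in evidence_ids if ev not in self.evidence_origin]
        if missing:
            raise KeyError(f"unknown evidence: {missing}")
        for ev in evidence_ids:
            self.evidence_for[conclusion_id].add(ev)
            self.conclusions_from[ev].add(conclusion_id)

    def trace_back(self, conclusion_id):
        """Evidence behind a conclusion, with the part of the source each came from."""
        return sorted(
            (ev, *self.evidence_origin[ev])
            for ev in self.evidence_for.get(conclusion_id, set())
        )

    def trace_forward(self, source_id):
        """Evidence extracted from a source, and the conclusions drawn from each piece."""
        return {
            ev: sorted(self.conclusions_from.get(ev, set()))
            for ev in self.evidence_by_source.get(source_id, [])
        }


g = ProvenanceGraph()
g.add_evidence("ev1", "census-roll-A", "page 4, line 12")
g.add_evidence("ev2", "parish-register-B", "entry 57")
g.add_conclusion("james-birth-year", ["ev1", "ev2"])
print(g.trace_back("james-birth-year"))
print(g.trace_forward("parish-register-B"))
```

The forward index is what lets a researcher see that a source has already been extracted before starting the same work again.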

The essential elements of a source-centric genealogical system include:
1. A source authority, which tracks all known sources of genealogical data.
2. An artifact archive, which holds images of records and other digital artifacts for convenience in accessing original records.
3. A structured data archive, which holds structured genealogical data that has been extracted from individual sources. Its purpose is to accurately represent what a source says.
4. A family tree, which holds conclusions about which real people lived and how they were related. Each person in the family tree has links to the various entries in the structured data archive that are believed to refer to the same real person.
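The four components can be modeled as separate stores linked by IDs. In this hypothetical Python sketch, in-memory dictionaries stand in for the four components, and all class names, fields, and sample data are illustrative assumptions. In the sketch, a family-tree person holds only links to records in the structured data archive; each record names its source and, optionally, an image in the artifact archive.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Source:  # entry in the source authority
    source_id: str
    title: str


@dataclass
class Artifact:  # entry in the artifact archive, e.g. a scanned page
    artifact_id: str
    source_id: str
    uri: str


@dataclass
class Record:  # structured data archive entry: what one source says
    record_id: str
    source_id: str
    name_as_written: str
    facts: dict[str, str]
    artifact_id: Optional[str] = None


@dataclass
class TreePerson:  # family tree conclusion: one real person
    person_id: str
    record_ids: list[str] = field(default_factory=list)


class SourceCentricStore:
    """Hypothetical in-memory stand-in for the four components."""

    def __init__(self) -> None:
        self.sources: dict[str, Source] = {}
        self.artifacts: dict[str, Artifact] = {}
        self.records: dict[str, Record] = {}
        self.tree: dict[str, TreePerson] = {}

    def link(self, person_id: str, record_id: str) -> None:
        """Conclude that a record refers to this real person."""
        if record_id not in self.records:
            raise KeyError(record_id)
        person = self.tree.setdefault(person_id, TreePerson(person_id))
        if record_id not in person.record_ids:
            person.record_ids.append(record_id)

    def sources_for(self, person_id: str) -> list[str]:
        """Titles of the sources behind one person in the tree."""
        recs = [self.records[r] for r in self.tree[person_id].record_ids]
        return sorted({self.sources[r.source_id].title for r in recs})

    def images_for(self, person_id: str) -> list[str]:
        """Artifact URIs that let a reader check the original records."""
        recs = [self.records[r] for r in self.tree[person_id].record_ids]
        return sorted(self.artifacts[r.artifact_id].uri for r in recs if r.artifact_id)


store = SourceCentricStore()
store.sources["s1"] = Source("s1", "Census roll A (hypothetical)")
store.sources["s2"] = Source("s2", "Parish register B (hypothetical)")
store.artifacts["a1"] = Artifact("a1", "s1", "https://example.com/images/a1.jpg")
store.records["r1"] = Record("r1", "s1", "Jas. Gray", {"role": "head of household"}, "a1")
store.records["r2"] = Record("r2", "s2", "James K. P. Gray", {"event": "baptism"})
store.link("p1", "r1")
store.link("p1", "r2")
print(store.sources_for("p1"))
print(store.images_for("p1"))
```

Because records from different sources stay separate and are only linked, an incorrect link in this sketch can be removed without losing what either source says.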

It is also important that verification work be tracked so that it, too, can be done "once" and "for all" instead of having to be repeated by everyone.
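One hypothetical way to make verification reusable is a shared log keyed by what was checked, so anyone can see whether a check has already been done before repeating it. The `Verification` fields and the `VerificationLog` class below are illustrative assumptions in Python, not part of any existing system.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Verification:
    item_id: str      # the record, link, or conclusion that was checked
    check: str        # what was checked (free text in this sketch)
    verified_by: str
    on: date
    passed: bool


class VerificationLog:
    """Hypothetical shared log: look up a check before redoing it."""

    def __init__(self) -> None:
        self._entries: dict[tuple[str, str], list[Verification]] = {}

    def record(self, v: Verification) -> None:
        self._entries.setdefault((v.item_id, v.check), []).append(v)

    def prior_checks(self, item_id: str, check: str) -> list[Verification]:
        return list(self._entries.get((item_id, check), []))


log = VerificationLog()
log.record(Verification("r1", "transcription matches image", "reviewer-1", date(2006, 3, 8), True))
print(len(log.prior_checks("r1", "transcription matches image")))  # 1
print(len(log.prior_checks("r1", "same person as r2")))            # 0
```

Keeping every entry, rather than only the latest, means disagreeing verifications stay visible instead of silently overwriting each other.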

Randy Wilson presented several papers on this topic at the 2002, 2003, and 2006 Family History Technology Workshops held at BYU. Below are links to each paper: