A bit ago I ran across an article at the Popular Mechanics site that discussed what it called the Digital Ice Age. In a nutshell, the thrust of the article is that the potential for information stored in a digital archive to become inaccessible in the future is rather great if actions are not taken soon.

As the article discusses, the problem is that stored information was created in a particular environment. In order to resurrect the information, we need access to a similar environment. There are two main techniques suggested for preserving data: emulation and migration.

Emulation is nearly impossible, as emulating every conceivable program that creates data would be overwhelming to manage. For example, Lockheed Martin, which is working to develop a solution for the National Archives, has identified over 4500 file types that need to be addressed. Migration involves translating data from an older format to a newer one for access by newer methods. The article quoted Jeff Rothenberg (Rand Corp.) as having the following to say about migration:
It seems to me that migration throws away the original. It doesn’t even try to save the original. What you end up with is somebody’s idea about what was important about the original.

The point here is that, in traditional history, we have the original, preserved documents. For ancient ones there is a great deal of interpretation that needs to take place, given the barriers of language, completeness and context. Though we could store the original documents in addition to the migrated versions, this would be redundant and not really gain anything.

That is a significant statement. Anyone who has ever worked with a migration tool, even one that handles a version upgrade for a single product, knows that there can be data loss. And that is a case of one company deciding how to manage a data format that it created itself and (in theory) completely understands. Mass migration, on the other hand, would be prone to many errors. First, there is the assumption that the team working on the migration system completely understands each of the 4500 formats and can provide a structurally well-defined transformation.

The next issue with migration is in applying the transformation. The transformation is defined based on structure, perhaps with some influence by content. Testing the transformation will always be done on a subset, and confidence is gained via statistics. That is, there is always the chance for errors. Errors will occur because no one can define a complete set of test cases, and because inspection of the actual transformation can only be done on a sample. No one will be able to inspect every file that is migrated.
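To put a rough number on "confidence is gained via statistics," here is a minimal sketch in Python. It is my own illustration, not anything from the article, and it assumes the inspected files are a truly random sample and that every one of them came through clean. The function names and the one-million-file archive are hypothetical. Even in that best case, the sample only bounds the error rate; it never proves it is zero.

```python
import math


def max_error_rate(sample_size: int, confidence: float = 0.95) -> float:
    """Upper bound on the true error rate when a random sample of
    `sample_size` migrated files was inspected and none were bad.

    Solves (1 - p) ** n = 1 - confidence for p. For 95% confidence this
    is close to the well-known "rule of three", p ~= 3 / n.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return 1.0 - (1.0 - confidence) ** (1.0 / sample_size)


def possibly_bad_files(total_files: int, sample_size: int,
                       confidence: float = 0.95) -> int:
    """Number of files in the whole archive that could still be damaged
    without contradicting a clean sample (rounded up)."""
    return math.ceil(total_files * max_error_rate(sample_size, confidence))


if __name__ == "__main__":
    archive_size = 1_000_000  # hypothetical archive, for illustration only
    for inspected in (300, 3_000, 30_000):
        rate = max_error_rate(inspected)
        bad = possibly_bad_files(archive_size, inspected)
        print(f"{inspected:>6} clean samples: error rate could be up to "
              f"{rate:.4%}, or about {bad} files")
```

Inspecting ten times as many files only shrinks the bound by about a factor of ten, which is why checking a sample can never stand in for checking everything.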

Rothenberg is also credited with suggesting emulation as a better technique. However, this technique still results in a loss. As anyone who has tried running emulators knows, there is always something lost in the translation. Building on the article’s gaming analogy, there is an anthology for the PlayStation that provides 85 classic Atari games. The games run in an emulator (developed by Digital Eclipse) that in turn runs on the PlayStation. While the games execute, and perhaps faithfully, the experience is not the same. I can remember playing a number of those games on the original Atari 2600. My enthusiasm was considerably less when playing the emulation. Video games have evolved, and so have my expectations. The original context has been lost. And for the record, I don’t recall Yars’ Revenge being that hard.

Though the article is addressing mass information archival, it is also a significant personal issue. There is much data we collect over time that could be easily dismissed (tax records, notes in text files, random emails from people asking about dinner plans, etc.). But we all have digital data that we would probably like to retain for our lives and pass to our children: emails from friends and relatives that provide insight into someone’s inner thoughts and thought processes, pictures, videos and who knows what else.

In addition to the concern about changes in format, there are personal storage issues. As an example, I moved from a 1.3 megapixel camera to a 6 megapixel camera around 9 months ago. The images from the new camera take up on the order of six times the space. Considering my storage card capacity increased as well (60 or so pictures with the original set-up as opposed to 160 on the new), I take far more pictures before having to dump them to my hard drive. With the old camera, I was even able to fit an entire year of images on a single CD. That is no longer the case.

Leaving my images on my hard drive gives rise to a single point of failure. Of course, without using proper media, the lifetime of the storage is also limited. (Note: KODAK did its own study, and anecdotal evidence and experience would give a raised eyebrow to the latter. The point is, it’s not a forever kind of thing.)
Working with online galleries (or pure storage for that matter) is one way to solve the problem – assuming we’re confident the provider is doing their due diligence. But I haven’t come to grips with putting my entire private side of life in the public domain (“public” in the sense that I don’t know who really has access to it). I do have a blog or two to share pictures with friends and family but it’s just a small tip of the iceberg.

And what about my children? I remember sitting with my parents or grandparents looking through photos of relatives, learning who was who and what they did. The digital age has changed that. For one, I have more pictures since I am not limited by the roll size or having to carry the rolls with me. And unless I print them, there’s nothing to flip through. That’s a depressing thought. I think there is something to be said for the shared, intimate experience between two people as they quietly flip through a physical book, away from the hum of cooling fans and glow of the display.

Of course, I could make an album. But I don’t. How do you decide which of the thousand or more pictures make the cut? Where do I find the time to parse through them all? Do I create more storage to organize the albums?

