Data Archiving
Data continues to grow at an astounding rate, as referenced in one of my earlier posts. This growth is a result of increasing data creation and a tendency to keep everything for fear of losing something that may someday be needed. Confidently and cost-effectively managing data throughout its complete lifecycle, so that even data that is rarely accessed can be retrieved when required, is a very big challenge.
In the early days of data storage, archiving typically consisted of backing up data, either to another, similar device or to tape. As storage grew, Hierarchical Storage Management (HSM) systems were developed to make file backup and access as painless as possible. Technology selection for this process was based on cost and the required speed of retrieval.
As high-availability, fast-access storage costs continue to decline (at a rate remarkably similar to the rate at which data storage demand has been growing), large amounts of data continue to remain on spinning disks in data centers. Even though storage costs are relatively low, this practice leads to a significant data center footprint and ongoing support costs, including the costs of upgrading data centers and migrating data to new systems over time. This practice of keeping everything has resulted in more and more data being abandoned in place, typically on spinning disk and rarely, if ever, accessed or changed. This type of data is sometimes referred to as “cold” data.
IBM published a blog on their Big Data and Analytics Hub introducing a multi-temperature data management solution. It refers, in part, to keeping frequently accessed data (hot data) on fast storage, less frequently accessed data (warm data) on slightly slower storage, and rarely accessed data (cold data) on the slowest storage an organization has (http://tinyurl.com/z6j26ag).
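To make the hot, warm, and cold idea concrete, here is a minimal sketch in Python. It is not IBM's method. It assumes that time since last access is an acceptable stand-in for access frequency, and the 30-day and 365-day thresholds are hypothetical values chosen only for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds; a real policy would be set by each organization.
HOT_WINDOW = timedelta(days=30)
WARM_WINDOW = timedelta(days=365)


def classify_temperature(last_accessed: datetime, now: datetime) -> str:
    """Return 'hot', 'warm', or 'cold' based on the time since last access."""
    age = now - last_accessed
    if age <= HOT_WINDOW:
        return "hot"
    if age <= WARM_WINDOW:
        return "warm"
    return "cold"


if __name__ == "__main__":
    now = datetime(2016, 6, 1)
    # Illustrative file names and access dates, not real data.
    samples = {
        "quarterly_report.xlsx": datetime(2016, 5, 20),
        "project_plan_2015.docx": datetime(2015, 11, 1),
        "scanned_invoices_2009.zip": datetime(2010, 3, 15),
    }
    for name, accessed in samples.items():
        print(name, classify_temperature(accessed, now))
```

Under these assumed thresholds, anything untouched for more than a year lands in the cold tier, which is the data this post is mostly concerned with.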
In 2012, the National Science Foundation estimated that over 60% of data was considered cold. It is unlikely that the practice of keeping data forever will end, so we need to determine the best way to keep really large amounts of data forever.
Images have been a mechanism for recording history for thousands of years, using tools such as Egyptian hieroglyphic and demotic characters. With the advent and commercialization of photography in the early 19th century, more and more history was routinely recorded. If you happen to have scrapbooks or photo albums from your parents or grandparents, you more than likely appreciate the value of this simple data recording system.
Hieroglyphic and demotic characters can still be seen today on the Rosetta Stone, which was discovered in 1799 at Rosetta, Egypt, and dates back to the second century BCE. This discovery opened the way to studying early Egyptian records. Even today, in the world of digital photography, traditional film-based photography remains the image capture mechanism of choice for the National Park Service to ensure the longevity and authenticity of the record captured by these images. In a recent NPR interview describing its search for the next Ansel Adams, the Park Service stated: “The negative, when properly stored, as ours are at the Library of Congress, has the longest lifespan that we can imagine. They estimate 500 years. And then, finally, it's more difficult to fool around with a large-format photograph and make it look like it's something that it isn't.” Reference: National Park Service Launches Search For Next Ansel Adams (http://tinyurl.com/h89zufk)
Fortunately, there are image-based storage technologies being developed. One example is DOTS (Digital Optical Technology System), a storage medium that is non-magnetic, chemically inert, immune to electromagnetic fields including electromagnetic pulses, and able to be stored in normal office environments or in extremes ranging from 15º to 150º F without compromising the image. DOTS is a true visual “eye-readable” method of storing digital files. It is essentially a picture that can be optically decoded. With sufficient magnification, one can actually see the digital information. Because the information is visible, as long as cameras and imaging devices are available, the information will always be recoverable. More on DOTS (http://tinyurl.com/jutlqoo)
So maybe this early concept of capturing records with images is something we should seriously consider. Images on a stable medium, when properly captured and preserved, exhibit a long life without the frequent and costly technology refreshes we experience with traditional digital storage. The total cost of preserving image-based records is likely to be much lower than storing the equivalent digital content on today’s digital storage technologies.
Taking a lesson or two from history on how civilization has successfully archived information over thousands of years may help us innovate solutions that will work at least this well.
Photo credits:
- Library of Congress – Wagons and camera of Sam A. Cooley, U.S. photographer, Department of the South;
- My family photos – My Great Great Grandfather, Great Grandfather and two Great Uncles. Given the format of the image, this picture was likely taken with a Kodak Brownie in the late 1890s.
Mike, those guys look just like you!
Yes Mike, well written. Thanks for the DOTS mention.