Addressing our digital memory crisis begins with dialogue. The following discussion paper is prepared by the Council on Library and Information Resources.
If William Shakespeare had written Hamlet
on a word processor, or...
If Thomas Jefferson had saved his drafts
of the Declaration of Independence
with a computer text editor, or...
If Alexander Graham Bell had documented
his experiments with the telephone
on floppy disks, or...
If Leonardo da Vinci had used a computer graphics system
to create the Mona Lisa...
Would Their Great Achievements Still Be Available To Us Today?
Unless they copied their work to a more durable medium, the answer is no. Although digital technology has greatly increased possibilities for access, it is not ideal for preservation purposes. Digital storage, though expedient and efficient, is among the least stable media of all time. At most, digitally preserved documents stored on an exceptional quality CD-ROM at room temperature might survive 50 years. Twenty years is the maximum that has been achieved in tests for magnetic tape. And if they last that long, there is no guarantee there will be hardware and software to retrieve them.
Historically, as recording media have become more efficient and capable of accommodating greater volumes of data, they have become less stable. Relatively simple messages carved onto stone thousands of years ago had a very long life. Papyrus, less stable than rock, was nonetheless durable and provided an appropriate medium for an expanding documentation of knowledge.
Since the mid-nineteenth century, huge amounts of data have been recorded on widely accessible, inexpensive, acid-based paper, a medium that is slowly destroying itself, together with the information printed on it. By the early twentieth century, audio was captured on acoustical disks and moving images on film. These technologies, joined in mid-century by recording on magnetic tape, became known as analog systems because they are direct analogies of what is recorded. Analog media function by making a single continuous record of the subject matter, and analog formats require large storage space for relatively small amounts of information.
When digital technology emerged, it offered a remarkably efficient new means of storing and accessing information. By breaking analog information into tiny coded elements — bits and bytes — it became possible to store and access information of colossal proportion using media that took up very little space.
Many are so pleased with the storage and access capabilities of digital technology that they overlook the need for preservation. According to National Media Laboratory (NML) research, the life expectancy of digital documents is rarely, if ever, comparable to that of paper. And in almost all cases, the hardware or software required to retrieve documents becomes obsolete in just a few years.
Before addressing the specific problems of digital preservation, it is important to communicate the scale of this crisis. Every imaginable form of information chronicling the knowledge and history of our time has been preserved digitally. The challenge of protecting this immense amount of data promises to get worse as the rate of digital storage expands. Consider that, by the year 2000, an estimated 75% of all federal transactions will be handled electronically. Or that when President Clinton leaves office, his administration will hand over eight million electronic files to the National Archives. Those administration files are but a minuscule portion of total government-generated information. Add more and more government digital records to those generated by the private sector — for example, health records, insurance information, financial data, even toxic-waste dump-site information — and the magnitude of the problem becomes evident. The potential danger to the world's storehouse of digital information is immense, unless there is active intervention to change the course of information technology. It would be impossible to attach a dollar figure to the staggering financial consequences of lost or inaccessible data.
Quantifying the magnitude of the digital preservation crisis provides only one dimension of the problem. Consider the broader consequences. When a culture loses its memory, it loses its identity. As digital records of our culture, science, history, and government disintegrate or become unretrievable, we leave an incomplete, defective legacy to future generations.
In the case of democratic government, the implications could be serious. Democracy is based on accountability, which makes responsible record-keeping mandatory. Libraries and archives have served as society's guarantee of an intellectual audit trail. Relying exclusively on digital storage will fragment or break this trail.
There are no definitive figures on how much digital information is lost or unretrievable. While government and industry alike worry about the scale of the coming crisis, few are willing to admit that the problem exists.
Visit a brightly lit, secure, and environmentally controlled facility where great numbers of magnetic files are stored, and you will see no obvious sign of a threat. It is only when the records are called for that they reveal the process of physical decay to which they are subject — or reveal nothing at all because they cannot be opened and read on the latest generation of hardware. Slowly and steadily, however, examples of the problem are surfacing.
- Military files, including POW and MIA data from the Vietnam War, were nearly lost forever because of errors and omissions contained in the original digital records.
- Ten to 20 percent of vital data recorded on magnetic tapes from the Viking Mars mission have significant errors, because, as Jet Propulsion Laboratory technicians now realize, the magnetic tape on which they are stored is "a disaster for an archival storage medium."
- Federal law requires the Census Bureau to retain records on "permanent" storage media. Data from the 1960 census were recorded on magnetic tape (perceived as "permanent" at the time). Sixteen years later, when the National Archives asked the Census Bureau to provide parts of the 1960 data that had "long-term historical value," the Bureau took three years to furnish the records because it no longer had machines capable of reading the data.
- Earth observation data gathered by satellite in the 1970's were recently identified as critical for establishing a time line of changes in South America's fragile Amazon basin. The National Research Council reports the information is lost on the now-obsolete tapes on which the data were written.
- In the late 1960's, New York State and Cornell University conducted an inventory of land use and natural resources. Twenty years later, the New York State Archives obtained copies of the tapes but was unable to read them because the programs and customized software to run the system no longer existed. Researchers wanting to do comparative land-use studies had to re-digitize and re-key all of the original data.
- "Mr. Watson, come here, I want to see you" and "What hath God wrought?" (famous sentences in the history of the telephone and telegraph) have been preserved for us. But we do not know the content of the first e-mail message or the first Web page, because there is no record of them. Unfortunately, digital communications originated in laboratories that had greater regard for discovery than for preservation; thus, much of the history of the Internet is simply lost.
Each of the three elements of digital technology (media, hardware, and programming) presents serious threats to future access to digitally stored information:
Media: Archivists and librarians, responsible for maintaining a continuous record of human activity and thought for present and future generations, view the life expectancy of storage media in terms of hundreds of years. High-quality non-acidic buffered paper, for example, is expected to last up to 500 years. Archival-quality silver microfilm can last up to 200 years. Yet, some magnetic tape — the most common digital storage medium — can become unreliable for archival storage after only five years. Likewise, many optical disk media of average quality, including CD-ROM's, are not reliable after five years. Optical and magnetic media are currently the two most common repositories of digital information.
- Optical disks, unless stored properly, are susceptible to a potential breakdown between their reflective backing where data are stored and their protective, transparent surface. Also, with improper storage, the transparent surface of optical disks could become cloudy, hampering the ability of machinery to read the disks. Surface scratches are another hindrance to the retrieval of data from optical disks.
There is "a wide variability in the stability of CD-ROM disks produced by various manufacturers," according to a study by the National Media Laboratory. NML put CD-ROM's through an aging process in various environments, determining that "some disks were unplayable after less than 100 hours...Other manufacturers' products lasted over 3,000 hours." NML determined that average quality CD-ROM's remain archivally sound up to only 10 years under proper storage conditions.
Other types of optical media, such as WORM (write once, read many times), are more reliable than CD-ROM's and guaranteed for up to 100 years by some manufacturers. However, "considering the explosive growth of CD-ROM and CD-R technologies, it is doubtful that WORM technology will be viable in 10 more years. WORM disks will undoubtedly outlive WORM technology," says NML. In other words, the machines needed to play these disks may not survive.
- Magnetic tapes can become brittle, causing the magnetic coating to separate from its backing. Magnetic tapes are also susceptible to interference from magnetic forces in the environment, which may cause errors and omissions in recorded data.
Studies by the National Media Laboratory have determined that average-quality magnetic tape, kept at a constant room temperature, becomes unreliable as a storage medium in five years or less.
The National Institute of Standards and Technology (NIST) estimates "the longevity of modern magnetic tape to be about 20 years under ideal storage conditions." The longevity of other electronic media, like floppy disks, or media improperly stored, may be considerably less.
The House Committee on Government Records, quoting NIST, cautions that "data deterioration [on magnetic tape or disks] can easily be caused by physical damage such as mishandling, contamination, and poor storage." Recommended procedures are recopying tapes every 10 years, "exercising" stored tape by rewinding it annually, and separately storing backup copies. Given the volume of digital records, the time and expense in following these procedures would be prohibitive for many.
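To give a rough sense of why these procedures can become prohibitive, the arithmetic can be sketched as follows. The collection size of 100,000 tapes is a purely hypothetical figure chosen for illustration, and the function name is likewise invented here; only the 10-year recopying interval and the annual rewinding come from the recommendations above.

```python
def annual_tape_workload(tape_count, recopy_interval_years=10, rewinds_per_year=1):
    """Estimate yearly handling operations for a tape collection.

    Assumes recopying is spread evenly across the interval, so that
    one-tenth of the collection is recopied each year under a 10-year cycle.
    """
    recopies = tape_count / recopy_interval_years
    rewinds = tape_count * rewinds_per_year
    return {"recopies_per_year": recopies, "rewinds_per_year": rewinds}


# Hypothetical collection of 100,000 tapes (an assumption, not a figure from any study).
workload = annual_tape_workload(100_000)
print(workload)
```

Under these assumptions, a collection of that size would call for 10,000 recopying operations and 100,000 rewinds every year, before any backup copies are counted.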
Hardware: Rapid advances in semiconductors and microprocessors promise that more computer hardware will go to the graveyard of obsolete equipment. (Moore's Law, the prediction that microprocessor speed will double every 18 months, has gone unchallenged for nearly 30 years.) Unfortunately, most replacement hardware is incompatible with media recorded on earlier equipment. If a digital file is recorded on an optical or magnetic disk and there are no disk drives to read that particular size and format, the information on that disk is lost.
Programming: Software advances are occurring as fast as hardware advances, bringing rapid obsolescence to countless programs and languages. Unless documentation for out-of-date software is maintained into the future, stored data become as unreadable and as useless as Egyptian hieroglyphics before the deciphering of the Rosetta Stone.
One of the most important measures we can take now to stem this developing crisis is to give careful consideration to the long-term preservation needs of digital information when it is first created. Not every piece of information warrants long-term preservation. Librarians and archivists — custodians of the world's memory — are urging those who generate data to set priorities and choose the portions that deserve permanent preservation. But choosing is only the first step. What is then needed — and what does not currently exist — is the technology that will capture the information for long-term use and keep it readily accessible.
In the past, librarians and archivists carefully selected the paper-based documents that seemed likely to have long-term value. That relatively small portion of materials received appropriate preservation treatment, usually after careful assessment of the condition of the paper. Selection of digital materials for long-term preservation is much more difficult, because obsolete hardware and software can render such materials unusable, even though the material itself is in good condition.
Partnerships between the stewards of collections and the developers of technology are critically important. The private and public sectors must join in developing and adopting the common standards that are essential to universal information storage and retrieval. In addition, the legal infrastructure that will ensure universal access to public domain information must be carefully built.
We must invest now in research and development to assure the permanent availability of digital records and the preservation of knowledge into the future.
analog: in media, a format which captures and presents information in a continuous signal or stream. Unlike digital formats, which encode information into discrete bits, analog formats are continuous. Traditional analog formats include paper, photographs, film, video, phonographs, and magnetic tape, all of which are readable without additional interpretation by computer software and hardware.
binary code: the basic level of digital electronic records consisting of bits (individual binary digits recorded as ones and zeros) making up bytes (a set of eight binary digits).
CD (Compact Disk): an optical disk on which digital text, audio, video, and graphics data is stored. Most CD's are read-only (CD-ROM: read only memory), although recordable CD's (CD-R's) are available that require record-capable hardware.
crash: a catastrophic circumstance in which software, hardware, or media that store data cease to function, making information inaccessible.
digital: any information (text, graphics, audio, and video) that is translated into binary code.
digital media: physical objects on which digital information is stored (e.g., magnetic tape, magnetic and optical disks, etc.), or collections of digital objects.
magnetic disks (including "floppy" and hard disks): a common digital information storage medium similar to magnetic tape.
magnetic tape: a common medium for analog and digital information storage. In analog use, tape is used to store audio and video. In digital applications, tape is used to store text, data bases, graphics, audio, and video. Magnetic tape consists of a plastic ribbon backing coated on one side with an adhesive material containing particles of iron or other material that can be magnetized to record information.
migration: the periodic transfer of digital materials from one hardware/software configuration to another, or from one generation of computer technology to a subsequent generation.
open standards: specifications for computer system components that are proposed, defined, and maintained through public processes, and that enable hardware and software produced by different manufacturers to operate together to provide ready access to digitally stored information.
optical disks: any of several disk formats in which digital data is etched onto a reflective surface and read using a concentrated light beam. Optical formats include CD-ROM (read only memory), CD-R (recordable CD), DVD (digital versatile disk), and WORM (write once, read many times).
refreshing: a procedure used to maximize the life expectancy of magnetic tapes and disks. In magnetic tape, refreshing involves unspooling and rewinding tapes to relieve stresses. In addition, data on the tapes are transcribed and rewritten to refresh the magnetic signal and prevent data loss. In magnetic disks, the term refers only to the rerecording process.
SGML (Standard Generalized Markup Language): a standard coding system for creating documents that can be translated by different software into formats, links, graphics, etc. A commonly used application of SGML is Hypertext Markup Language (HTML).
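As an illustration of the glossary entries for binary code and digital above, the short sketch below shows how two letters of text might be represented as bytes of eight binary digits each. It assumes the ASCII character encoding, which the glossary does not specify, and the function names are invented for this example.

```python
def text_to_bits(text):
    """Return each byte of the ASCII-encoded text as a string of eight binary digits."""
    return [format(byte, "08b") for byte in text.encode("ascii")]


def bits_to_text(bit_strings):
    """Reverse the process: turn strings of eight binary digits back into text."""
    return bytes(int(bits, 2) for bits in bit_strings).decode("ascii")


encoded = text_to_bits("Hi")
print(encoded)                # ['01001000', '01101001']
print(bits_to_text(encoded))  # Hi
```

The ones and zeros carry no meaning on their own. A reader must know both the encoding (here, ASCII) and the format in which the bytes were stored, which is why obsolete software can render intact media unreadable.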
This brochure funded by:
Alfred P. Sloan Foundation
National Endowment For The Humanities
for Council on Library and Information Resources
1755 Massachusetts Avenue, N.W., Suite 500
Washington, D.C. 20036
Web site: www.clir.org