
Into the Future: On the Preservation of Knowledge in the Electronic Age
Addressing our digital memory crisis begins with dialogue.
The following discussion paper was prepared by the Council on
Library and Information Resources.
If William Shakespeare had written Hamlet
on a word processor, or...
If Thomas Jefferson had saved his drafts
of the Declaration of Independence
with a computer text editor, or...
If Alexander Graham Bell had documented
his experiments with the telephone
on floppy disks, or...
If Leonardo da Vinci had used a computer graphics system
to create the Mona Lisa...
Would Their Great Achievements
Still Be Available To Us Today?
Unless they copied their work to a more durable medium, the
answer is no. Although digital technology has greatly increased
possibilities for access, it is not ideal for preservation
purposes. Digital storage media, though expedient and efficient,
are among the least stable media of all time. At most, digitally
preserved documents stored on an exceptional-quality CD-ROM
at room temperature might survive 50 years. Twenty years is
the maximum that has been achieved in tests of magnetic tape.
And even if they last that long, there is no guarantee that
hardware and software will still exist to retrieve them.
Historically, as recording media have become more efficient
and capable of accommodating greater volumes of data, they
have become less stable. Relatively simple messages carved
onto stone thousands of years ago had a very long life. Papyrus,
less stable than rock, was nonetheless durable and provided
an appropriate medium for an expanding documentation of knowledge.
Since the mid-nineteenth century, huge amounts of data have
been recorded on widely accessible, inexpensive, acid-based
paper, a medium that is slowly destroying itself together
with the information printed on it. By the early 20th century,
audio was captured on acoustical disks and moving images on
film. These technologies, joined in mid-century by recording
on magnetic tape, became known as analog systems because
each recording is a direct analogue of what it captures. Analog media
function by making a single continuous record of the subject
matter, and analog formats require large storage space for
relatively small amounts of information.
When digital technology emerged, it offered a remarkably efficient
new means of storing and accessing information. By breaking
analog information into tiny coded elements (bits and
bytes), it became possible to store and access information
in colossal volumes using media that take up very little
space.
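For readers curious what "breaking analog information into tiny coded elements" looks like in practice, here is a minimal Python sketch of one common approach: sampling a continuous signal at fixed intervals and rounding each sample to one of 256 levels, so that every sample fits in a single byte. The function names, the 440 Hz test tone, and the rate of 8,000 samples per second are illustrative assumptions, not anything specified in this paper.

```python
import math

def digitize(signal, duration_s, sample_rate_hz, bits=8):
    """Sample a continuous signal at fixed time intervals and quantize each
    sample to one of 2**bits integer levels (0..255 when bits=8)."""
    levels = 2 ** bits
    n_samples = round(duration_s * sample_rate_hz)
    codes = []
    for i in range(n_samples):
        t = i / sample_rate_hz
        value = max(-1.0, min(1.0, signal(t)))  # clamp to the range [-1, 1]
        # Map [-1.0, 1.0] linearly onto the integers 0..levels-1.
        codes.append(round((value + 1.0) / 2.0 * (levels - 1)))
    return codes

def tone(t):
    """A stand-in 'analog' signal: a 440 Hz sine wave (hypothetical example)."""
    return math.sin(2 * math.pi * 440 * t)

codes = digitize(tone, duration_s=0.01, sample_rate_hz=8000)
data = bytes(codes)             # each 8-bit code fits in exactly one byte
print(len(codes), len(data))    # 80 80
print(codes[:4])
```

Real audio, image, and video formats use far more elaborate encodings, but the principle of turning a continuous record into discrete numbers is the same.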
Many are so pleased with the storage and access capabilities
of digital technology that they overlook the need for preservation.
According to National Media Laboratory (NML) research, the
life expectancy of digital documents is rarely, if ever, comparable
to that of paper. And in almost all cases, the hardware or
software required to retrieve documents becomes obsolete in
just a few years.
Before addressing the specific problems of digital preservation,
it is important to communicate the scale of this crisis. Every
imaginable form of information chronicling the knowledge and
history of our time is now being stored digitally. The challenge
of protecting this immense amount of data promises to get worse
as the volume of digital storage expands. Consider that, by the
year 2000, an estimated 75% of all federal transactions will
be handled electronically. Or that when President Clinton leaves
office, his administration will hand over eight million electronic
files to the National Archives. Those administration files
are but a minuscule portion of total government-generated information.
Add the ever-growing volume of government digital records to those generated
by the private sector (for example, health records, insurance
information, financial data, even toxic-waste dump-site information), and
the magnitude of the problem becomes evident. Unless there is
active intervention to change the course of information technology,
the potential danger to the world's storehouse of digital information
is immense. It would be impossible to
attach a dollar figure to the staggering financial consequences
of lost or inaccessible data.
Quantifying the magnitude of the digital preservation crisis
provides only one dimension of the problem. Consider the broader
consequences. When a culture loses its memory, it loses its
identity. As digital records of our culture, science, history,
and government disintegrate or become irretrievable, we leave
an incomplete, defective legacy to future generations.
In the case of democratic government, the implications could
be serious. Democracy is based on accountability, which makes
responsible record-keeping mandatory. Libraries and archives
have served as society's guarantee of an intellectual audit
trail. If we rely exclusively on digital storage, this trail
will be fragmented or broken.
There are no definitive figures on how much digital information
is lost or irretrievable. While government and industry alike
worry about the scale of the coming crisis, few are willing
to admit that the problem exists.
Visit a brightly lit, secure, and environmentally controlled
facility where great numbers of magnetic files are stored,
and you will see no obvious sign of a threat. It is only when
the records are called for that they reveal the physical decay
to which they are subject, or reveal nothing at all because
they cannot be opened and read on the latest
generation of hardware. Slowly and steadily, however, examples
of the problem are surfacing.
- Military files, including POW and MIA data from the Vietnam
War, were nearly lost forever because of errors and omissions
contained in the original digital records.
- Ten to 20 percent of vital data recorded on magnetic tapes
from the Viking Mars mission have significant errors, because,
as Jet Propulsion Laboratory technicians now realize, the
magnetic tape on which they are stored is "a disaster for
an archival storage medium."
- Federal law requires the Census Bureau to retain records
on "permanent" storage media. Data from the 1960 census were
recorded on magnetic tape (perceived as "permanent" at the
time). Sixteen years later, when the National Archives asked
the Census Bureau to provide parts of the 1960 data that
had "long-term historical value," the Bureau took three years
to furnish the records because it no longer had machines
capable of reading the data.
- Earth observation data gathered by satellite in the 1970s
were recently identified as critical for establishing a timeline
of changes in South America's fragile Amazon basin.
The National Research Council reports that the information is
lost on the now-obsolete tapes on which the data were written.
- In the late 1960s, New York State and Cornell University
conducted an inventory of land use and natural resources.
Twenty years later, the New York State Archives obtained
copies of the tapes but was unable to read them because the
programs and customized software to run the system no longer
existed. Researchers wanting to do comparative land-use studies
had to re-digitize and re-key all of the original data.
- "Mr. Watson, come here, I want to see you" and "What hath
God wrought?" (famous sentences in the history of the
telephone and telegraph) have been preserved for us.
But we do not know the content of the first e-mail message
or the first Web page, because there is no record of them.
Unfortunately, digital communications originated in laboratories
that had greater regard for discovery than for preservation;
thus, much of the history of the Internet is simply lost.
Each of the three elements of digital technology (media,
hardware, and programming) poses a serious threat to
future access to digitally stored information:
Media: Archivists and librarians, responsible for maintaining
a continuous record of human activity and thought for present
and future generations, view the life expectancy of storage
media in terms of hundreds of years. High-quality non-acidic
buffered paper, for example, is expected to last up to 500
years. Archival-quality silver microfilm can last up to 200
years. Yet some magnetic tape, the most common digital
storage medium, can become unreliable for archival storage
after only five years. Likewise, many optical disk media of
average quality, including CD-ROMs, are not reliable after
five years. Optical and magnetic media are currently the two
most common repositories of digital information.
- Optical disks, unless stored properly, are susceptible
to a breakdown of the bond between their reflective backing,
where the data are stored, and their protective, transparent surface.
Also, with improper storage, the transparent surface of optical
disks could become cloudy, hampering the ability of machinery
to read the disks. Surface scratches are another hindrance
to the retrieval of data from optical disks.
There is "a wide variability in the stability of CD-ROM
disks produced by various manufacturers," according to a
study by the National Media Laboratory. NML put CD-ROMs
through an aging process in various environments, determining
that "some disks were unplayable after less than 100 hours...Other
manufacturers' products lasted over 3,000 hours." NML determined
that average-quality CD-ROMs remain archivally sound for
only up to 10 years under proper storage conditions.
Other types of optical media, such as WORM (write once,
read many times), are more reliable than CD-ROMs and are guaranteed
for up to 100 years by some manufacturers. However, "considering
the explosive growth of CD-ROM and CD-R technologies, it
is doubtful that WORM technology will be viable in 10 more
years. WORM disks will undoubtedly outlive WORM technology," says
NML. In other words, the machines needed to play these disks
may not survive.
- Magnetic tapes can become brittle, causing the magnetic
coating to separate from its backing. Also, magnetic tapes
are susceptible to interference from magnetic forces in the
environment that may cause errors and omissions in recorded
data.
Studies by the National Media Laboratory have determined
that average-quality magnetic tape, kept at a constant room
temperature, becomes unreliable as a storage medium in five
years or less.
The National Institute of Standards and Technology (NIST)
estimates "the longevity of modern magnetic tape to be about
20 years under ideal storage conditions." The longevity of
other electronic media, such as floppy disks, or of media stored
improperly, may be considerably less.
The House Committee on Government Records, quoting NIST,
cautions that "data deterioration [on magnetic tape or disks]
can easily be caused by physical damage such as mishandling,
contamination, and poor storage." Recommended procedures
include recopying tapes every 10 years, "exercising" stored tape
by rewinding it annually, and storing backup copies separately.
Given the volume of digital records, the time and expense
of following these procedures would be prohibitive for many.
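To make the scale argument concrete, the following Python sketch counts the handling operations that such a schedule implies. Only the 10-year recopying and annual rewinding intervals come from the recommendations quoted above; the collection size (10,000 tapes), the 50-year horizon, and the assumption that each backup copy is maintained on the same schedule as its original are hypothetical figures chosen for illustration.

```python
def maintenance_workload(num_tapes, years, recopy_every_years=10,
                         rewind_every_years=1, copies_per_record=2):
    """Count the handling operations a periodic tape-care schedule implies.

    copies_per_record=2 is an assumption: one working tape plus one
    separately stored backup, both maintained on the same schedule.
    """
    tapes = num_tapes * copies_per_record
    rewinds = tapes * (years // rewind_every_years)
    recopies = tapes * (years // recopy_every_years)
    return {"rewinds": rewinds, "recopies": recopies}

# Hypothetical collection: 10,000 tapes kept for 50 years.
print(maintenance_workload(10_000, 50))
# {'rewinds': 1000000, 'recopies': 100000}
```

Even this modest hypothetical collection implies a million rewinds over its lifetime, each of which costs staff time and machine time.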
Hardware: Rapid advances in semiconductors and microprocessors
(Moore's Law, the rule of thumb that microprocessor performance
doubles roughly every 18 months, has held for nearly 30 years) ensure
that ever more computer hardware will go to the graveyard of obsolete
equipment. Unfortunately, most replacement hardware is incompatible
with media recorded on earlier equipment. If a digital file
is recorded on an optical or magnetic disk and there are
no disk drives to read that particular size and format, the
information on that disk is lost.
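Taken at face value, the figures given for Moore's Law imply about twenty doublings in 30 years, or roughly a million-fold increase. The back-of-the-envelope Python sketch below simply works through that arithmetic; the function name is hypothetical.

```python
def growth_factor(years, doubling_period_months=18):
    """Return (number of doublings, overall growth factor) for a quantity
    that doubles every `doubling_period_months` months over `years` years."""
    doublings = years * 12 / doubling_period_months
    return doublings, 2 ** doublings

doublings, factor = growth_factor(30)
print(doublings)          # 20.0
print(f"{factor:,.0f}")   # 1,048,576
```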
Programming: Software advances are occurring as fast
as hardware advances, bringing rapid obsolescence to countless
programs and languages. Unless documentation for out-of-date
software is maintained into the future, stored data become
as unreadable and as useless as Egyptian hieroglyphics before
the deciphering of the Rosetta Stone.
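The point about documentation applies even at the level of individual characters. In the hypothetical Python sketch below, a short text is stored using an older IBM character encoding (EBCDIC code page 037, an assumption chosen for illustration). Read back under the wrong assumption, the bytes decode without any error message yet yield gibberish; read with the documented encoding, they are recovered and can be migrated to a current encoding such as UTF-8.

```python
# Hypothetical example: text written by an older system that used EBCDIC
# (IBM code page 037) instead of ASCII or UTF-8.
original_text = "Hamlet"
raw = original_text.encode("cp037")   # the bytes as they sit on the old medium

# Without documentation, a reader might assume a different encoding.
# Latin-1 maps every byte to some character, so this "succeeds" silently.
misread = raw.decode("latin-1")
print(misread == original_text)   # False

# With the encoding documented, the same bytes are fully recoverable...
recovered = raw.decode("cp037")
print(recovered)                  # Hamlet

# ...and can be migrated to a current, widely supported encoding.
migrated = recovered.encode("utf-8")
print(migrated)                   # b'Hamlet'
```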
One of the most important measures we can take now to stem
this developing crisis is to give careful consideration to
the long-term preservation needs of digital information when
it is first created. Not every piece of information warrants
long-term preservation. Librarians and archivists, the custodians
of the world's memory, are urging those who generate
data to set priorities and choose the portions that deserve
permanent preservation. But choosing is only the first step.
What is needed next, and what does not yet exist, is
technology that will capture the information for long-term
use and keep it readily accessible.
In the past, librarians and archivists carefully selected
the paper-based documents that seemed likely to have long-term
value. That relatively small portion of materials received
appropriate preservation treatment, usually after careful
assessment of the condition of the paper. Selection of digital
materials for long-term preservation is much more difficult,
because obsolete hardware and software can render such materials
unusable, even though the material itself is in good condition.
Partnerships between the stewards of collections and the
developers of technology are critically important. The private
and public sectors must join in developing and adopting the
common standards that are essential to universal information
storage and retrieval. In addition, the legal infrastructure
that will ensure universal access to public domain information
must be carefully built.
We must invest now in research and development to assure
the permanent availability of digital records and the preservation
of knowledge into the future.
analog: in media, a format which captures and presents
information in a continuous signal or stream. Unlike digital
formats, which encode information into discrete bits, analog
formats are continuous. Traditional analog formats include
paper, photographs, film, video, phonograph records, and magnetic
tape, all of which are readable without additional interpretation
by computer software and hardware.
binary code: the basic level of digital electronic
records consisting of bits (individual binary digits recorded
as ones and zeros) making up bytes (a set of eight binary
digits).
CD (Compact Disk): an optical disk on which digital
text, audio, video, and graphics data is stored. Most CDs
are read-only (CD-ROM: read-only memory), although recordable
CDs (CD-Rs), which require record-capable hardware, are also available.
crash: a catastrophic circumstance in which software,
hardware, or media that store data cease to function, making
information inaccessible.
digital: any information (text, graphics, audio,
and video) that is translated into binary code.
digital media: physical objects on which digital
information is stored (e.g., magnetic tape, magnetic and
optical disks, etc.), or collections of digital objects.
magnetic disks (including "floppy" and hard disks):
a common digital information storage medium similar to magnetic
tape.
magnetic tape: a common medium for analog and digital
information storage. In analog use, tape is used to store
audio and video. In digital applications, tape is used to
store text, data bases, graphics, audio, and video. Magnetic
tape consists of a plastic ribbon backing coated on one side
with an adhesive material containing particles of iron or
other material that can be magnetized to record information.
migration: the periodic transfer of digital materials
from one hardware/software configuration to another, or from
one generation of computer technology to a subsequent generation.
open standards: specifications for computer system
components that are proposed, defined, and maintained through
public processes, and that enable hardware and software produced
by different manufacturers to operate together to provide
ready access to digitally stored information.
optical disks: any of several disk formats in which
digital data is etched onto a reflective surface and read
using a concentrated light beam. Optical formats include
CD-ROM (read only memory), CD-R (recordable CD), DVD (digital
versatile disk), and WORM (write once, read many times).
refreshing: a procedure used to maximize the life
expectancy of magnetic tapes and disks. In magnetic tape,
refreshing involves unspooling and rewinding tapes to relieve
stresses. In addition, data on the tapes are transcribed
and rewritten to refresh the magnetic signal and prevent
data loss. In magnetic disks, the term refers only to the
rerecording process.
SGML (Standard Generalized Markup Language): a standard
coding system for creating documents that can be translated
by different software into formats, links, graphics, etc.
A widely used application of SGML is Hypertext Markup
Language (HTML).
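The refreshing procedure defined above can be sketched in Python as "rewrite, then read back and compare." In this minimal sketch, two ordinary files stand in for the old and new media; the file names and the 3 MB of random test data are hypothetical, and real tape refreshing relies on drive-specific tools that are not shown here.

```python
import os
import shutil
import tempfile

def refresh(src_path, dst_path, chunk_size=1 << 20):
    """Rewrite the data in src_path to dst_path, then read both back
    chunk by chunk and report whether the copy is bit-for-bit identical."""
    shutil.copyfile(src_path, dst_path)
    with open(src_path, "rb") as a, open(dst_path, "rb") as b:
        while True:
            block_a = a.read(chunk_size)
            block_b = b.read(chunk_size)
            if block_a != block_b:
                return False          # the rewritten copy differs
            if not block_a:
                return True           # both files ended at the same point

with tempfile.TemporaryDirectory() as d:
    old = os.path.join(d, "old_medium.dat")   # hypothetical stand-in for the aging medium
    new = os.path.join(d, "new_medium.dat")   # hypothetical stand-in for the fresh medium
    with open(old, "wb") as f:
        f.write(os.urandom(3_000_000))
    ok = refresh(old, new)
    print(ok)  # True
```

Reading the copy back, rather than trusting the write, is what turns a plain copy into a check that the data actually survived the transfer.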
This brochure funded by:
Alfred P. Sloan Foundation
National Endowment For The Humanities
Xerox Corporation
for Council on Library and Information Resources
1755 Massachusetts Avenue, N.W., Suite 500
Washington, D.C. 20036
Telephone: 202.939.4750
Telefacsimile: 202.939.4765
Web site: www.clir.org