Preface
In 1996, the Commission on Preservation and Access and the Research
Libraries Group issued the final report of the Task Force on the
Archiving of Digital Information. Chaired by John Garrett and Donald
Waters, the task force spent over a year analyzing the problem, considering
options, consulting with others around the world, and formulating
a series of recommendations. The conclusion reached by the impressive
group of 21 experts was alarming: there is, at present, no way
to guarantee the preservation of digital information. And it is not
simply a technical problem. A serious commitment to preserving digital
information requires a legal environment that enables preservation.
It also means that specific organizations (libraries, government
agencies, corporations) must take responsibility for preservation
by enacting new policies and creating the economic means to secure
survival of this generation's knowledge into the future.
The Council on Library and Information Resources, which absorbed
the Commission on Preservation and Access in July 1997, continues
to search for answers to the troubling question of how digital information
will be preserved. Raising public awareness is an important goal,
and we have pursued it vigorously. Since the publication of the task
force report in 1996, we have spoken to library and scholarly groups
here and abroad, published a number of papers, and produced an hour-long
documentary film on the subject for broadcast on public television.
The film especially has made an impression, and several observers
have wondered why we have spent so much time in describing the problems
and so little in finding solutions.
In fact, we have also been seeking solutions, and the present paper
by Jeff Rothenberg is the first in a series resulting from our efforts.
Each paper in the series will propose an approach to the preservation
of digital information. Each approach addresses the important parts
of the problem. We believe that it is best to assemble as many ideas
as possible, to place them before a knowledgeable audience, and to
stimulate debate about their strengths and weaknesses as solutions
to particular preservation problems.
Jeff Rothenberg is a senior research scientist at the RAND Corporation.
His paper is an important contribution to our efforts.
Executive Summary
There is as yet no viable long-term strategy to ensure that digital
information will be readable in the future. Digital documents are
vulnerable to loss via the decay and obsolescence of the media on
which they are stored, and they become inaccessible and unreadable
when the software needed to interpret them, or the hardware on which
that software runs, becomes obsolete and is lost. Preserving digital
documents may require substantial new investments, since the scope
of this problem extends beyond the traditional library domain, affecting
such things as government records, environmental and scientific baseline
data, documentation of toxic waste disposal, medical records, corporate
data, and electronic-commerce transactions.
This report explores the technical depth of the problem of long-term
digital preservation, analyzes the inadequacies of a number of ideas
that have been proposed as solutions, and elaborates the emulation
strategy. The central idea of the emulation strategy is to emulate
obsolete systems on future, unknown systems, so that a digital document's
original software can be run in the future despite being obsolete.
Though it requires further research and proof of feasibility, this
approach appears to have many advantages over the other approaches
suggested and is offered as a promising candidate for a solution
to the problem of preserving digital material far into the future.
Since this approach was first outlined, it has received considerable
attention and, in the author's view, is the only approach yet suggested
to offer a true solution to the problem of digital preservation.
The long-term digital preservation problem calls for a long-lived
solution that does not require continual heroic effort or repeated
invention of new approaches every time formats, software or hardware
paradigms, document types, or recordkeeping practices change. The
approach must be extensible, since we cannot predict future changes,
and it must not require labor-intensive translation or examination
of individual documents. It must handle current and future documents
of unknown type in a uniform way, while being capable of evolving
as necessary. Furthermore, it should allow flexible choices and tradeoffs
among priorities such as access, fidelity, and ease of document management.
Most approaches that have been suggested as solutions (printing
digital documents on paper, relying on standards to keep them readable,
reading them by running obsolete software and hardware preserved
in museums, or translating them so that they "migrate" into forms
accessible by future generations of software) are labor-intensive
and ultimately incapable of preserving digital documents in their
original forms.
The best way to satisfy the criteria for a solution is to run the
original software under emulation on future computers. This is the
only reliable way to recreate a digital document's original functionality,
look, and feel. Though it may not be feasible to preserve every conceivable
attribute of a digital document in this way, it should be possible
to recreate the document's behavior as accurately as desired, and
to test this accuracy in advance.
The implementation of this emulation approach involves: (1) developing
generalizable techniques for specifying emulators that will run on
unknown future computers and that capture all of those attributes
required to recreate the behavior of current and future digital documents;
(2) developing techniques for saving, in human-readable form, the
metadata needed to find, access, and recreate digital documents,
so that emulation techniques can be used for preservation; and (3)
developing techniques for encapsulating documents, their attendant
metadata, software, and emulator specifications in ways that ensure
their cohesion and prevent their corruption. The only assumption
that this approach makes about future computers is that they will
be able to perform any computable function and (optionally) that
they will be faster and/or cheaper to use than current computers.