Moving Forward
In the short term, several actions within reach of both
data creators and data repositories will advance the
preservation agenda. For the creators, these actions
include the following:
- Work with libraries when beginning a project
- Use standard and, when possible, nonproprietary formats
- Declare the intended use and audience
- Declare intended longevity
For the repositories, such actions include the following:
- Work with data creators during all phases of creation
- Declare policies and capabilities for archiving differing formats
- Take materials into custody for preservation experiments
Beyond these actions, digital scholars should think
deeply about developing an informatics for their discipline, as has
happened in some data-intensive sciences, so that they are able to
create digital objects that share vocabularies and descriptive markup,
facilitate shared access to information resources, and allow ready
repurposing for teaching and scholarship. Teachers should ensure
that their students master the skills needed to use the new technologies.
Instruction in digital information literacy and research skills should
be as vital a part of a student's training as is teaching how to
work in primary sources or cite authorities appropriately. Research
divisions of the learned societies can provide leadership in this
area.
Libraries can initiate partnerships with scholars
on campus and with learned societies and their publishers to share
knowledge and agree on common approaches to data creation and preservation.
They can develop transparent digital preservation policies and make
them accessible on their Web sites. They can develop depository programs
that promise not necessarily to preserve flawlessly in perpetuity
but rather to partner with data depositors in experiments that take
in formats favored by disciplines and knowledge communities, perform
risk assessments on those file formats, explore approaches that reduce
format vulnerabilities, and share the results of that work with other
data communities.
Looking Ahead
The current lack of provision for the responsible
creation, curation, and retention of research data is highlighted
in the National Science Foundation's report on the science and engineering
information infrastructure, which addresses the promise of computing
capabilities to transform even further and more radically the conduct
of basic and applied research (NSF 2003). This report has implications
not only for scientific and engineering data; a similar argument
could be mounted for the creation, curation, and preservation of
nonscientific research data. There is no agency in the humanities
with a mission, funding, or standing comparable to that of the National
Academy of Sciences. The opportunities for articulating the problem
of preserving nonscientific research data are therefore fewer, and,
even when persuasive arguments are made, there are far fewer resources
to commit to finding and funding solutions.
There are many barriers to digital preservation at
this early stage in the development of digital information technologies,
but they can be summed up in one phrase: lack of infrastructure.
In the academy, and especially within humanities faculties, many
scholars, teachers, and students will continue to look to libraries
and archives to lead preservation efforts and to make information
of high research value available now and into the future. The well-known
preprint archive for high-energy physics, arXiv.org, moved from its
home at a laboratory in Los Alamos to Cornell University because
the lab did not see maintaining a historical record for access in
the future as part of its mission. Even as the perception of the
library's value for providing access to information is declining
among some on campuses, the value that faculty place on the preservation
function of libraries remains high (JSTOR 2002).
The research community must begin to grapple seriously
with the nature of resource stewardship in the digital age. What
worked in the analog realm might not work as well in the future.
One perspective in the heated debate on electronic academic publishing
holds that the technology allows radical changes in the creation
and distribution of scholarship. Others sense that while technology
creates opportunities for doing business better (for example, lowering
publishing and distribution costs), it also has many disadvantages
(the expense of creating in standard formats and the expense of preservation are
two big ones). Some libraries are trying to become points of dissemination
for scholarly literature in a way that differs radically from their
role in the distribution system of print resources.
Libraries, particularly their special collections
and archives units, have been the traditional custodians of primary
sources, and it is natural to expect that they should continue to
play that role. However, while libraries and archives have the curatorial
expertise needed to fulfill their roles in the digital arena, they
generally lack the technical infrastructure to support the key functions
of digital preservation. There is some debate about whether it is
advisable, or even possible, for every institution in higher education,
or even the largest institutions, to develop the full range of services
needed for digital preservation. (For commonly agreed-upon minimum
standards for long-term repositories, see Appendix 2.) The digital
librarians and archivists who are most deeply engaged in building
repositories and preservation services agree that repositories are
difficult and expensive to build and maintain. They argue cogently
that such repositories will be few and will serve many users, including
other libraries. In a distributed network, there do not need to be
many.
Others argue that every major university can and should
have its own digital repository, although the reasons adduced for
having one usually relate more to intellectual property matters surrounding
publication than to long-term preservation. A white paper commissioned
by the Scholarly Publishing and Academic Resources Coalition (SPARC)
expands on one type of repository, designed to be "a component in
a restructured scholarly publishing model . . . [and] . . . tangible
embodiment of institutional quality" (Crow 2002). The paper advocates
for institutional repositories to transform scholarly publishing
by allowing libraries to compete with commercial publishers online,
and to increase the prestige of the university and build brand identity
by showcasing the intellectual property of its faculty. The paper
suggests that the disaggregation of functions in the networked environment
allows libraries to develop consortia to build and maintain repositories
for any number of purposes, including preservation. The SPARC model
of repository is, however, intended to be complemented by repositories
that do stake a claim for preservation. A reliable chain of referencing
in scholarly publishing and the promise of scholarship's persistence
into the future are indispensable for the progress of the sciences and
the humanities.
One challenge that remains is what happens to those
scholarly resources created outside the purview of a large, well-funded
research institution with a preservation mandate, such as those seated
at the Dibner Institute and George Mason University. These resources
share many of the characteristics of other noncommercial assets (or
commercially produced assets that have exhausted their profitability)
that can quickly become orphans in the world. In this way, they share
the fate of most special collections.
Regardless of how this debate turns out, it is clear
from the viewpoint of systems design that a robust network of repositories
and services for long-term preservation of digital library objects
favors a disaggregation of functions and does not require that each
preserving institution have its own bit repository. The distributed
architecture of preservation that LC proposes in its NDIIPP plan
is one that will encourage even the smallest preservation and curatorial
institutions to participate because it will allow them to bring their
particular expertise to bear on some aspect of stewardship but not
require that they replicate all aspects of preservation from bit
repository to collections and end-user services. Such a system will
address one need already apparent in the digital realm: the need
to have in place an infrastructure that will allow both an aggressive
rescue function to save endangered information assets and the ability
to serve individual institutions, no matter the size, that are conscientious
custodians of their digital collections.
The Responsibility for Stewardship
How will we pay for such an infrastructure, and how
do we move beyond the incentives born of enlightened self-interest
that we see in institutions managing their own information assets?
In the long run, digital technology will force all
engaged in the research enterprise, from university president
to graduate student, from library director to reference librarian, to
rethink stewardship. Like all big challenges, the debate about information
stewardship in this transformed landscape should begin with a simple
proposition: Everyone who has a stake in access to digital information
has a stake in the preservation of digital data. In higher education,
that means the debate would be joined by all, with discussions taking
place across and among campuses.
It is a debate in which university and college administrators
and governors must play a visible role. In many ways, the issue of
preservation (the long-term care of information assets whether
or not they have commercial potential or are crucial for lucrative
or well-funded areas of research) is the dark side of the debate
raging on campuses about scholarly communication, or, to be more
precise, about publishing. But underlying the integrity and value
of published scientific and scholarly literature are the deep and
broad expanses of unpublished data and primary sources on which scientific
and humanistic inquiry are based. To continue investing heavily in
creating digital information assets without shoring up their long-term
accessibility is like building castles on sand.
Today, we can expect that institutions will pay more
attention to securing their own information assets into the future,
even if that means using outside preservation services. We can press
learned societies and the scholarly disciplines they represent to
declare and act on their responsibilities to the information sources
crucial to their own work. We can ask that all members of the research
community not only look after their own near-term interests but also
take the long view of the resources on which their professions depend.
In the end, this debate affects not only research institutions and
their constituents but also the public at large. It is the public
that supports a vast research enterprise through federal tax structures
that subsidize foundations and private as well as public educational
institutions. Those tax structures and the stream of funding that
goes into research through federal agencies have been created because
our country's Founders believed that the creation and dissemination
of information and knowledge will lead to progress in the arts and
sciences. It is not just digital information that is at risk if the
academy does not act. It is also the compact between the public and
the research-and-development infrastructure that the public supports.