Council on Library and Information Resources

Good Archives Make Good Scholars: Reflections on Recent Steps Toward the Archiving of Digital Information


Donald Waters


"Good fences make good neighbors." This famous aphorism from Robert Frost's poem "Mending Wall" suggests the title and subject of this paper.1 Let me begin by explaining the relevance of the poem to the topic of the archiving of digital information.

A Preservation Parable

"Mending Wall" is a parable in the form of a poem. Wonderfully crafted, it can be read on many levels. It is about boundaries and territoriality, the conflict between primitive impulse and modern reflection, and the nature of ritual and work.2 But at another level, "Mending Wall" is simply about the preservation of a shared resource—a common wall that each year two neighbors must join together to rebuild. Why does it need repair? As the opening line famously puts it, "Something there is that doesn't love a wall."

The narrator of the poem identifies two sources of damage: natural causes, such as the heaving of stones that results from the freezing and thawing of the earth, and deliberate human acts, such as the attempts of hunters and their dogs to flush out their prey from hiding in the wall. Whatever the cause, it is the mending that matters to the narrator, who says:

The gaps I mean,
No one has seen them made or heard them made,
But at spring mending-time we find them there.
I let my neighbor know beyond the hill;
And on a day we meet to walk the line
And set the wall between us once again.

The narrator then vividly describes the mending process. The fieldstones are heavy and variously shaped; they often do not fit well together. He says, "We wear our fingers rough with handling them." All of this is a hard, but straightforward, technical process. Then the neighbors come to a grove of trees, and the narrator asks why they need to mend the wall there. The taciturn New England reply of the neighbor is simple: "Good fences make good neighbors."

From this point, the poem takes a darker turn as the conflict between the narrator and his neighbor becomes apparent. The narrator probes deeper into the reasons why neighbors agree to preserve their common resources. "Before I built a wall I'd ask to know/What I was walling in or walling out,/And to whom I was like to give offense." But the neighbor's motives remain inscrutable.

I see him there,
Bringing a stone grasped firmly by the top
In each hand, like an old-stone savage armed.
He moves in darkness as it seems to me
Not of woods only and the shade of trees.

The neighbor simply will not admit that letting the wall deteriorate is a possibility and says again to conclude the poem, "Good fences make good neighbors."

And so the reader is left with a puzzle. The wall has different meanings to each of the neighbors and, although the narrator calls his neighbor each year to the task, he himself finds many reasons to question the merits of preserving the wall. So what moves these two people to come together each year to mend this common resource? Could it be that what makes good neighbors is not simply a boundary? Could it be that what makes good neighbors is the very act of keeping the common resource good—of making and taking the time together to preserve and mend it?

The Archiving of Digital Information

The library, publisher, and scholarly communities are now engaged in efforts to resolve the problems associated with preserving another kind of common resource: digital information. Preserving such information is a critical priority, especially for libraries and other institutions that have borne responsibility for maintaining the cultural record. Six years have now passed since the Task Force on Archiving of Digital Information issued its report (Waters and Garrett 1996). During the course of its work from 1994 to 1996, the Task Force recognized well that "something there is that doesn't love digital information." In the face of the limits of digital technology, the Task Force struggled, as does Frost's narrator, with the question of motivation and action: Why should we preserve digital information, and who should do it?3

The Task Force's response was that we need a serious investment in archiving because we are in danger of losing our cultural memory. The first line of defense rests with creators, providers, and owners, who must take responsibility for creating archivable content. A deep infrastructure, consisting of trusted organizations capable of storing, migrating, and providing access to digital collections, is then needed for the long term. A process of certification to establish a climate of trust is also needed, as is a fail-safe mechanism by which certified archives would have a right and duty to exercise aggressive rescue of endangered or "orphaned" materials.

Since the Task Force report was issued, there has been much experimentation, definition of requirements, and development, much of it reported and summarized in previous papers in this volume.4 Margaret Hedstrom has reported the recent emergence of a greatly sharpened sense of the research needed to support digital archives. The development of the Reference Model for Open Archival Information Systems (OAIS) has been a galvanizing force (CCSDS 2001). As Titia van der Werf, Colin Webb, and others have described, a number of digital archives have been created, are being created, or are expanding following the OAIS model in the United States, the United Kingdom, the European Union, and Australia.5 Most of these efforts, however, have been government-funded, a point to which I return below.

In other developments, the emulation-versus-migration debate has largely played itself out. Neither approach provides a sufficient, general answer to the problem of digital preservation, and it has proven largely fruitless to debate the merits of these approaches in the abstract.6 Instead, there is growing recognition that different kinds of information captured in different ways for long-term preservation will need various kinds of support.

Thanks to a variety of reports, such as those organized by the Research Libraries Group (RLG) and the Online Computer Library Center (OCLC) on preservation metadata for digital objects and attributes of trusted digital repositories, there is a deepening understanding of the requirements and expectations for best practices when building trustworthy archives.7 Some of these analyses of requirements, it must be noted, are also being conducted in the abstract, without a realistic sense of costs and what will work, and so may be setting unrealistic expectations. Nevertheless, much is being learned from all these initiatives.

Our vision is much less clear about the infrastructure needed to enable archives to cooperate and interoperate. Our understanding of the legal and business frameworks needed to sustain the long-term preservation of digital information is likewise still very crude.8 For those interested in these questions, a recent initiative of the Mellon Foundation that was designed to explore the archiving of electronic journals may shed some light. This paper describes some of the results of that project, lays out some of the issues the participants have encountered, and suggests some solutions.

Mellon Electronic Journal Archiving Program

Over the last decade, much hope has been placed in the potential of electronic publishing as a means of resolving the rising costs of scholarly publishing.9 However, with the recent dot-com collapse has come an increasingly sober approach to electronic publishing. One aspect of the reassessment that is under way is a growing awareness that archiving has not yet been factored into the overall costs of the system, and if electronic publishing is to be taken seriously, it must be.

Given the general digital archiving problem, and the Foundation's particular concern with scholarly publishing, Foundation staff began several years ago consulting with librarians, publishers, and scholars about how best to stimulate investments in solutions. An investment in the archiving of electronic journals seemed to be especially promising and was welcomed by both publishers and libraries. The Foundation solicited proposals for one-year planning projects, and, in December 2000, the trustees selected seven for funding.10 Building on what has been learned during these planning efforts, the Foundation is now preparing to fund two major implementation projects.

What was the reason for focusing on e-journals? Scholars demand the multiple advantages of this emerging medium, including reference linking, easy searching across issues and titles, and the ability to include data sets, simulation, multimedia, and interactive components in the published articles. In addition to flexibility and functionality, e-journals have promised lower costs, but this goal has remained elusive. Major journals are rarely published only in e-format, and the costs of archiving are unknown. Without trusted electronic archives, it is unlikely that e-journals can substitute for print and serve as the copy of record, and so we have a duplicative and even more costly system—a system we all hope is transitional.11

Of the seven Foundation-funded planning projects, the Stanford University project proposed to develop a technology for harvesting presentation files (the Web-based materials that publishers use to present journal content to readers) and storing them in a highly distributed system called LOCKSS (Lots of Copies Keep Stuff Safe). Five projects engaged in planning for the capture of publishers' source files, including high-quality images and text that is encoded in the Standard Generalized Markup Language (SGML) or the Extensible Markup Language (XML).12 Three of these explored a publisher-based approach: Harvard worked with Wiley, Blackwell, and the University of Chicago Press; the University of Pennsylvania worked with the Oxford and Cambridge University presses; and Yale partnered with Elsevier. The two other projects took a discipline-based approach: Cornell focused on journals in agriculture, and the New York Public Library focused on e-journals in the performing arts. In the seventh project, the Massachusetts Institute of Technology explored the issues involved in archiving what it saw as a new class of periodical publication made possible by the digital medium, publications that it referred to as "dynamic e-journals." These publications included CogNET and Columbia International Affairs Online (CIAO).13

When inviting proposals for these projects, the Foundation asked applicants to focus on a rather complicated set of objectives. They were asked to:

  • identify publishers with which to work and to begin to develop specific agreements regarding archival rights and responsibilities
  • specify the technical architecture for the archive, perhaps using a prototype system
  • formulate an acquisitions and growth plan
  • articulate access policies
  • develop methodologies to be used to validate and certify the repository as a trusted archive
  • design an organizational model, including staffing requirements and the long-term funding options, that could be tested and evaluated during a setup phase

These were ambitious goals, and the outcomes that the Foundation trustees expected were equally ambitious. They hoped that leading research institutions, in partnership with specific publishers, would develop and share detailed understandings of the requirements for setting up and implementing trustworthy archives for electronic journals; that enabling technology would be developed to facilitate the archiving process; and that plans would be developed as competitive proposals designed to secure funding for the implementation and operation of electronic journal archives.

The planning period has come to an end, and much has been accomplished. In this paper, I cannot analyze how each of the projects succeeded or failed in meeting the ambitious goals and expectations set for them.14 Instead, I would summarize the findings by noting, first, that archiving now seems technically feasible using different approaches: the capture of Web-based presentation files using LOCKSS and the capture of source files. Second, participating publishers have come to view archiving their journals as a competitive advantage. Third, there is an increasingly shared understanding that an e-journal archive should aim to make it possible to regard e-journals as publications of record and to persuade publishers and libraries to consider abandoning print. There were other key results, some of them unexpected. I now turn to a discussion of the most important of these, which relate to the economics and organization of digital preservation.

The Political Economy of Public Goods

In trying to devise next steps, the project teams ran smack into some of the classic problems of the political economy of public goods, questions that Robert Frost explored in a much more elegant and artful way. What are the incentives for individuals and institutions to help provide a good whose benefits others cannot readily be excluded from enjoying? What are the organizational options? What are sustainable funding plans?

The Task Force on Archiving of Digital Information argued that the value of digital information rests in what it contributes to our cultural memory. Because cultural memory is a public good, it follows that insuring against the possible loss of such memory by the archiving of digital information would also be a public good. The joint economic interest of publishers, authors, and the scholarly community in electronic journals as intellectual property is reason to suggest that archiving them may not be a public good in the strictest sense of the term. Still, the archiving of digital information has special properties as a kind of modified public good that demands special attention.15

To understand these properties, let us examine the proposition that archiving is insurance against the loss of information. Is archiving really like insurance, in the sense of life or fire insurance? Would a business model for archiving based on an insurance model induce people to take on responsibility for archiving? If you have fire insurance and your house burns down, you are protected. If you have life insurance and you die, your heirs benefit. There is an economy in these kinds of insurance that induces you to buy. If you fail to buy, you are simply out of luck; you are excluded from the benefits. Unfortunately, the insurance model for archiving is imperfect, because insurance against the loss of information does not enforce the exclusion principle.16

A special property of archiving is that if one invests in preserving a body of information and that information is eventually lost to others who did not take out the insurance policy, the others are not excluded from the benefits, because the information still survives. Because free riding is so easy, there is little economic incentive to take on the problem of digital preservation, and this partly explains why there has been so little archive building other than that funded by governments. Potential investors conclude that "it would be better for me if someone else paid to solve the archiving problem." In fact, one of the defining features of a public good—and think here of other public goods such as parks or a national defense system—is that it is difficult and costly to exclude beneficiaries.
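The free-riding argument above can be made concrete with a small worked example. The sketch below is only an illustration: the benefit and cost figures are invented for the purpose, and the function name is hypothetical. It assumes that once anyone pays to preserve the content, every institution enjoys the full benefit, whether or not it contributed.

```python
# Illustrative numbers only; neither figure comes from the paper.
BENEFIT = 10.0  # assumed value to each institution if the content survives
COST = 6.0      # assumed cost to an institution that funds the archiving


def payoff(i_fund: bool, someone_else_funds: bool) -> float:
    """Net payoff to one institution when no one can be excluded."""
    # The content survives if anyone at all pays for its preservation.
    preserved = i_fund or someone_else_funds
    benefit = BENEFIT if preserved else 0.0
    cost = COST if i_fund else 0.0
    return benefit - cost


if __name__ == "__main__":
    for others in (True, False):
        for mine in (True, False):
            print(f"others fund: {others!s:5}  I fund: {mine!s:5}  "
                  f"my payoff: {payoff(mine, others):5.1f}")
```

Under these assumed figures, an institution always does better by letting someone else pay whenever someone else is willing to, which is the reasoning behind "it would be better for me if someone else paid to solve the archiving problem."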

The Tragedy of the Commons

Given the huge free-riding problem associated with the maintenance of public goods, what are the alternatives? Reflecting in part on this problem, Garrett Hardin in an influential article entitled "The Tragedy of the Commons," despaired of solutions. "Ruin," he wrote, "is the destination toward which all men rush, each pursuing his own interest in a society that believes in the freedom of the commons. Freedom in a commons brings ruin to all" (1968, 1244). Hardin echoed Thomas Hobbes, who lamented the state of nature, a commons in which people pursue their own self-interest and lead lives that are "solitary, poore, nasty, brutish, and short" ([1651] 1934, 65). Remember the state-of-nature allusion in the Frost parable about preserving a common resource? To the narrator, the neighbor seems "like an old-stone savage armed."

Focused on preserving digital information in 1996, the Task Force on Archiving of Digital Information echoed both Hobbes and Hardin in writing that "rapid changes in the means of recording information, in formats for storage, in operating systems, and in application technologies threaten to make the life of information in the digital age 'nasty, brutish, and short'" (Waters and Garrett 1996, 2). One of Hardin's solutions to the tragedy of the commons was, like Hobbes's, to rely on the leviathan, the coercive power of the government. Certainly, protection of the common good in the archiving of digital information could be achieved by massive government support, perhaps in combination with philanthropy.

Given these considerations of public goods economics, it is no accident that so many of the existing archiving projects are government funded, and it may be that some forms of archiving can be achieved only through a business model that is wholly dependent on government or philanthropic support. Several national governments, including our own through the agency of the Library of Congress, are exploring the power of copyright deposit and other mechanisms for developing digital archives. The National Archives and Records Administration is financing major archiving research projects with the San Diego Supercomputer Center and other organizations. Brewster Kahle's Internet Archive, which has been collecting and storing periodic snapshots of the publicly accessible Web, is an extraordinary example of philanthropic investment in digital archiving by someone who made his fortune in the development of supercomputers.17

Hardin's other solution to the tragedy of the commons was to encourage its privatization, trusting in the power of the market to optimize behavior and preserve the public good. It is not unreasonable to view congressional extensions of copyright and other measures to protect the rights of owners as efforts to privatize intellectual property and entrust its preservation to the self-interest of owners.18 Advocates of author self-archiving articulate a similar trust of self-interest in the service of the public good.19 Moreover, in the digital realm, as with other forms of information, the passions and interests of what Edward Tenner has called "freelance selectors and preservers" will almost surely result in valuable collections of record (2002, 66). Just as government and philanthropy undoubtedly have a role in digital archiving, so too will private self-interest. In fact, the Task Force report suggested that the first (but not last) line of defense in digital archiving rests with creators, providers, and owners.

Organizational Options

Government control and private interest, however, are unlikely to be sufficient, or even appropriate in many cases, for preserving the public good in digital archiving. Moreover, substantial experimental and field research in the political economy of public goods has shown Hardin's pessimism about the prospects of maintaining public goods to be unwarranted. Case after case compiled since Hardin published in 1968 demonstrates that groups of people with a common interest in a shared resource will draw on trust, reciprocity, and reputation to devise and agree upon rules for preserving the resource and the means of financing that preservation.20 The projects that Mellon funded provide seven more case studies with similar prospects for e-journal archiving.

The Mellon Foundation will undoubtedly continue to pursue its long-standing philanthropic interest in the preservation of the cultural record as a condition of excellence in higher education. At the same time, it is looking, as it does in nearly all cases of support, for ways to promote a self-sustaining, businesslike activity. It seeks to foster the development of communities of mutual interest around archiving, help legitimize archiving solutions reached within these communities, and otherwise stimulate and facilitate community-based archiving solutions. The premise of the Mellon e-journal projects was that concern about the lack of solutions can be addressed only by hard-nosed discussions among stakeholders about what kinds of division of labor and rights allocations are practical, economical, and trustworthy.

What about publisher-based archives? The question here is not whether preservation is in the mission of publishers. As long as their databases are commercially viable, publishers have a strong interest in preserving the content—either themselves or through a third party. Scholarly publishers also have an incentive to contribute in the interests of their authors, who want their works to endure, be cited, and serve as building blocks for knowledge. However, the concern about the viability of publisher-based archives is whether the material is in a preservable format and can endure outside the cocoon of the publisher's proprietary system. One necessary ingredient in a proof of archivability is the transfer of data out of their native home into an external archive, and as long as publishers refuse to make such transfers, this proof cannot be made.

The research libraries of major universities are also interested, some say by definition, in ensuring that published materials are maintained over the long term. With regard to the digital archiving of electronic journals, the libraries in the Mellon projects have generated several significant technical and organizational breakthroughs. They demonstrated that digital archiving solutions that meet the needs of the scholarly community require at least three factors: extreme sensitivity to public goods economics, dramatic efforts to take advantage of the economies of scale inherent in the technology either through centralization or a radical distribution of service, and very low coordination costs in consistently and transparently managing publisher and user relations. Meeting these requirements within existing library structures has proved elusive, but in mapping out what these requirements are, some of the most imaginative minds working in libraries today have blazed trails in the Mellon-sponsored projects and demonstrated what solutions are likely to succeed in the next phase of the e-journal initiative.

What Would Be the Economic Model?

One of the surprising findings that the Mellon Foundation has made in monitoring these projects is that new organizations will likely be necessary to act in the broad interest of the scholarly community and to mediate the interests of libraries and publishers. But if some new archival organization (or organizations) were created to perform the preservation function, what rights and privileges would it need to be able to sustain the e-journal content? Can ways be found to apply the exclusion principle in such a manner that it creates an economy for digital archiving, that is, a scarcity that publishers and libraries are willing to pay to overcome and that would support the larger public good? Put another way, what kinds of exclusive benefits can be defined to induce parties to act in the public good and invest in digital archiving?

Access is the key. Over and over again, we have found that one special privilege that would likely induce investment in digital archiving would be for the archive to bundle specific and limited forms of access with its larger and primary responsibility for preservation. User access in some form is needed in any case for an archive to certify that its content is viable. But extended and complicated forms of access not only add to the costs of archiving, they also make publishers very nervous that the archives will in effect compete for their core business. As a result, the Foundation is now looking to support models of archival access that serve the public good but that do not threaten the publishers' business.

Secondary, noncompeting uses might include aggregating a broad range of journals in the archive—a number of publications larger than any single publisher could amass—for data mining and reflecting the search results to individual publishers' sites. Another kind of limited, secondary use might be based on direct access to the content with "moving walls" of the kind pioneered in JSTOR.21 Much work needs to be done to sort out what the right access model might be, but it is clear that so-called "dark" archives, in which a publisher can claim the benefit of preservation but yields no rights of access, do not serve the public good. They serve only the publisher, and the Foundation is not willing to support such archives.
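The "moving wall" form of limited access mentioned above lends itself to a short illustration. The sketch below is an assumption about how such a rule might be expressed, not a description of JSTOR's actual terms: the function name and the length of the wall are hypothetical. An issue becomes available from the archive only once a fixed number of years has passed since publication, so the archive never serves the current content on which a publisher's business depends.

```python
from datetime import date


def archive_may_serve(published: date, today: date, wall_years: int) -> bool:
    """Return True once an issue is older than the moving wall.

    `wall_years` is a hypothetical parameter; any real wall would be
    set by agreement between the archive and each publisher.
    """
    try:
        # The date on which the issue passes behind the wall.
        opens_on = published.replace(year=published.year + wall_years)
    except ValueError:
        # Published on 29 February and the target year is not a leap year.
        opens_on = date(published.year + wall_years, 3, 1)
    return today >= opens_on


if __name__ == "__main__":
    today = date(2002, 6, 1)
    for issue in (date(1996, 3, 15), date(2001, 9, 1)):
        print(issue, archive_may_serve(issue, today, wall_years=5))
```

Because the wall moves forward with the calendar, no one has to renegotiate access issue by issue: each year another year of back content becomes available from the archive.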

Archiving requires agreements. The basic value proposition for digital archiving that has thus emerged from these projects is this: Publishers would bear the costs of transferring their content in an archivable form to a trusted archive and allow a limited but significant form of access or secondary use as part of the archiving process. Universities and colleges, through their libraries, would pay for the costs of preservation in exchange for a specific but limited form of access; those who do not contribute do not get the access. Given this form of participation by publishers and universities, e-journal archives would maintain the content over time. This bargain would have to be cemented organizationally and legally in the form of appropriate licenses that define in detail what content is archived, the responsibilities of the parties, and the conditions of use.

Priming the Pump

To prime the pump for such self-sustaining, community-based solutions for the archiving of scholarly electronic journals, the Foundation is now focused on developing support for the two approaches explored in the planning process just concluded, namely, preserving presentation files using LOCKSS and preserving source files.

Preserving presentation files with LOCKSS. In the LOCKSS system, a low-cost Web crawler is used for systematically capturing presentation files. Publishers allow the files to be copied and stored in Web caches that are widely distributed but highly protected. The caches communicate with each other through a secure protocol, checking each other to see whether files are damaged or lost and repairing any damage that occurs. Caching institutions have the right to display requested files to those who are licensed to access them if the publisher's site is unavailable and to provide the local licensed community the ability to search the aggregated files collected in the institutional cache.
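The idea of caches checking and repairing one another can be illustrated with a deliberately simplified sketch. It is not the LOCKSS protocol itself, whose details are not given here. As assumptions made only for illustration, each cache is modeled as an in-memory dictionary, copies are compared by SHA-256 digest, and a copy is repaired only when a strict majority of all caches agree on the content. All names in the code are hypothetical.

```python
import hashlib
from collections import Counter


def digest(data: bytes) -> str:
    """Fingerprint a stored file so copies can be compared cheaply."""
    return hashlib.sha256(data).hexdigest()


def audit_and_repair(caches: dict[str, dict[str, bytes]], url: str) -> list[str]:
    """Compare every cache's copy of `url` and fix copies that lose the vote.

    Returns the names of the caches that were repaired.
    """
    # Each cache that holds the file casts one vote for its digest.
    votes = Counter(digest(store[url]) for store in caches.values() if url in store)
    if not votes:
        return []
    winner, count = votes.most_common(1)[0]
    if count * 2 <= len(caches):
        # No strict majority of all caches agrees: do nothing rather than guess.
        return []
    good_copy = next(
        store[url]
        for store in caches.values()
        if url in store and digest(store[url]) == winner
    )
    repaired = []
    for name, store in caches.items():
        # A missing or disagreeing copy is replaced with the majority copy.
        if url not in store or digest(store[url]) != winner:
            store[url] = good_copy
            repaired.append(name)
    return repaired


if __name__ == "__main__":
    url = "journal/vol1/article1.html"
    caches = {
        "library_a": {url: b"article text"},
        "library_b": {url: b"article text"},
        "library_c": {url: b"article text"},
        "library_d": {url: b"artic1e t3xt"},  # damaged copy
        "library_e": {},                      # lost copy
    }
    print(audit_and_repair(caches, url))
```

The design choice worth noting is the refusal to repair without a clear majority: when the caches disagree too widely, leaving the files untouched is safer than spreading what may be a damaged copy.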

During the next phase of development, the key issues for the LOCKSS system are to separate the underlying technology from its application as an e-journal archiving tool; explore ways of ensuring the completeness and quality of e-journal content on acquisition and of managing the content as bibliographic entities rather than simply as Web-addressed files; expand the coverage of journals; maintain the LOCKSS software; and identify strategies for migrating the e-journal content. To help undertake and finance these tasks, Stanford has identified a variety of partners and is planning the development of a LOCKSS consortium.

Preserving source files. The source file capture approach requires that publishers be able to present, or "push," files in a normalized form to the e-journal archive. The question of cost in this approach turns, at least initially, on how many output formats a publisher must produce and how many an archive must support from different publishers. During the course of its project, Harvard commissioned a consultant's report to determine the feasibility of developing a standard archival interchange document type definition (DTD) that would dramatically reduce this complexity (Inera 2001). The report suggests that it is possible to produce such a DTD without reducing content to the lowest common denominator, sacrificing substantial functionality and appearance, or avoiding attention to extended character sets, mathematical symbols, tables, and other features of online documents. The planning projects also made significant progress in specifying both the tools needed to transfer, or "ingest," e-journal content into the archive and a workflow for managing content quality control. License agreements were also outlined that began to converge on the concepts of "moving walls" and other limited rights of user access.

What are the next steps for developing the source file capture approach? The cost and scale of archiving source files suggest the need for a coordinated and collaborative approach for shaping the agreements with publishers, developing the underlying archival repository, and creating operational procedures for transferring content from publisher to archive. One approach that the Foundation is considering would be to channel the expertise and energy developed in these projects through a not-for-profit entity that is either part of JSTOR or related to it. Such an entity would be expected to assume archiving responsibility for a substantial subset of the electronic journal literature for the academic community and would require investment by the university community to obtain the benefits of secondary access rights that the archive would provide and that would not compete with the core business of the publishers. This is not to say that the business model and terms of participation that currently exist at JSTOR are a perfect fit for electronic archiving, but rather that a lean, entrepreneurial, mission-driven organization such as JSTOR, which is positioned at the nexus of publishers, libraries, and scholars, is well situated to take the development of the archive to the next step.22 As the new organization begins to take shape, the Foundation expects to involve the participants from the planning projects, to incorporate the specific breakthroughs each participant has made, and to think about the specific models of access and cost recovery that would be necessary to preserve and sustain electronic journal content for the common good of the scholarly community.

These two approaches are very different. Although experience might later tell us that one approach is better suited than the other for certain kinds of materials, it would not be useful now to think of them as competing approaches. We have to get used to the idea that overlapping and redundant archiving solutions under the control of different organizations with different interests and motives in collecting offer the best hope for preserving digital materials. We currently have no operating archives for electronic journals. It would be unwise at the outset to expect that only one approach would be sufficient.

Moreover, these different approaches suggest a natural layering of functions and interfaces from the repository layer to access services. Given such points of interaction, specialization and division of labor are possible that could result in real economies. If there are economies of scale in the LOCKSS system, for example, some functions could be more centralized in what was conceived as a highly decentralized system. Conversely, source file capture could make greater use of distributed storage. Possibilities exist for even further development. Files aggregated in the archives across publishers could serve secondary abstract and indexing publishers as a single source, not only saving them from going to each and every publisher for the texts to index but also enabling them to use computational linguistic and other modern techniques to improve their products. Source files might also be "born archival" at the publisher and deposited in the archive, from which they might then serve as the masters for the derivative published files that the publisher creates for its different markets. These latter possibilities are not likely to emerge immediately, mainly because they would require intense negotiation among the interested parties; however, they are suggestive of how a thoughtful, entrepreneurial, community-based approach to archiving might add incremental improvements that would actually lead to more dramatic transformations of the system of scholarly communications.

Broader Context and Conclusions

The approaches to e-journal archiving that the Foundation and its partners are now considering would have to be formulated in the context of a much broader array of solutions for the archiving of digital information. An especially important part of this larger context is the development of local institutional archives for the variety of scholarly digital materials that members of each college or university community create but have little means of maintaining over time. The basis for what appears in scholarly journals will undoubtedly be found in data sets and other supporting materials at the authors' home institutions. In addition, a range of archival solutions needs to be developed for the much broader array of digital content in the form of newspapers, popular periodicals, music, video, scientific data sets, and other digital content that the cultural and scholarly community deems important for long-term preservation.

Another element in the larger context, and a critical impediment for digital archiving that arises again and again, is the legal regime governing intellectual property. There is now considerable confusion among policy makers in the United States about how the protections that have been afforded to owners of intellectual property in the digital age should serve to advance the higher goal established in the U.S. Constitution of promoting "the progress of science and useful arts."23 For print materials, special exemptions have been built into the copyright law for preservation activities.24 It may be too early to formulate specific exemptions that would apply to digital information. However, instead of waiting indefinitely for the policy confusion to be resolved, one step forward may be to begin to articulate "safe harbor" principles about intellectual property rights that could form the basis of digital archiving agreements among interested parties. In building JSTOR and ArtSTOR, the Foundation has found that content owners are much more comfortable with agreements that limit uses of intellectual property to not-for-profit educational purposes than they are with agreements that leave open the possibility of creating competing commercial profit-making access to the property. Lawrence Lessig has also recently argued for the utility of the distinction between not-for-profit educational uses and other kinds of uses of intellectual property (2001: 249-261). Because educational use is certainly consistent with the Constitutional mandate for intellectual property law in the United States to promote "the progress of science and useful arts," perhaps it is time to build a safe-harbor framework for digital archiving on just such a distinction.

It is on this point that we come back to Robert Frost's preservation parable. I suggested earlier that what makes good neighbors may not be simply a boundary. Rather what makes good neighbors is the very act of keeping good the common resource between them—the act of making and taking the time together to preserve and mend the resource. So too it is with digital archiving.

In the context of an array of factors relating to many kinds of digital materials, the lessons of the Mellon planning projects are clear. Relevant stakeholders—scholars, publishers, and research libraries—can frame the archiving problem very concretely as a problem of technical, organizational, and economic development. Two options are being actively explored as a result. The first, LOCKSS, appears to be a relatively inexpensive solution, but caution is needed because the system may not be capturing files in the best long-term format. The second option, source file capture, is likely to be more expensive but promises to support the most durable archive. Framed in this way, using a variety of approaches, digital archiving, for electronic journals at least, seems achievable as what one might call a modified public good.

There are many dimensions to the good to be achieved, but two merit special mention. On the one hand, there is the joining together by scholars and the agents of education (universities, libraries, scholarly societies, and publishers) in serving the common interest of future scholarship by keeping good, or preserving, the digital resources now being created. On the other hand, there is the research and learning thereby made possible, which are the indelible marks of a good scholar. In other words, good archives make good scholars. If we accept the proposition that a free society depends on an educated citizenry, it is not a great leap of logic to conclude further that good archives make good citizens.


FOOTNOTES

1 For the poem, see Lathem (1979: 34-35).

2 For critical commentaries on the poem, see Nelson (2001) and Faggen (2001).

3 The word "archiving" has multiple senses ranging from the narrow sense used by professional archivists to designate the process of preserving formal records to the broad sense used by computer technologists to refer to a temporary backup collection of computer files. For certain purposes and audiences, one might choose to restrict use of the word to one or the other of these senses. In this paper, I follow the Task Force on Archiving of Digital Information (Waters and Garrett 1996) and use digital archiving and digital preservation interchangeably to refer to the long-term maintenance of digital objects judged to be of enduring value.

4 The Preserving Access to Digital Information (PADI) Web site, which is maintained by the National Library of Australia, is one of the most comprehensive and up-to-date sources of information about the archiving of digital information. Available at: http://www.nla.gov.au/padi/.

5 For a recent overview, see also Hodge and Carroll (1999).

6 See, for example, the largely polemical debate on the relative merits of emulation and migration in Rothenberg (1999) and Bearman (1999). For a more balanced view, see Granger (2000).

7 See OCLC (2002) and RLG (2002). For a different approach to requirements definition, see Cooper, Crespo, and Garcia-Molina (2000).

8 For approaches to these topics, see, for example, Granger (2002), and Cooper and Garcia-Molina (2001).

9 See, for example, Ekman and Quandt (1999).

10 Copies of the successful proposals are available at http://www.diglib.org/preserve/ejp.htm. See also Flecker (2001). For another perspective on the archiving of electronic journals, see Arms (1999).

11 For data on these issues, see, for example, Born and Van Orsdel (2001), and Van Orsdel and Born (2002).

12 In what follows, I distinguish two approaches to archiving: one that focuses on the capture of presentation files; the other that focuses on source file capture. Dale Flecker of Harvard University points out, in a personal communication dated May 30, 2002, that for many publishers, SGML or XML files are not really source files, but are among a variety of derivative files that are generated during the publication process. Referring to a "source file approach" to electronic journal archiving thus may be inaccurate from at least one perspective. I have nevertheless retained the label because the intent of this group of planning projects was to identify and capture files that would serve both publishers and archives as an authoritative source from which e-journal content could be reliably disseminated to a reader as the technology for representation and display changed over time.

13 See http://www.cognet.org/ and http://www.ciaonet.org/.

14 Each of the institutions that participated in the Mellon Electronic Journal Archiving Initiative is preparing a final report of its planning project. All reports should be available by September 2002 at http://www.diglib.org/preserve/ejp.htm.

15 For a strict definition of a public good, see Baden (1998: 52): "A public good is one which, if available for anyone, is available for everyone. . . . This suggests that the good is not easily packaged for sale, and people cannot be excluded from its consumption. In other words, property rights cannot be readily established for public goods. A public good is also one whose incremental use does not reduce, subtract, or consume it."

16 There is a substantial literature on the economics of various types of insurance, which is broadly defined as a mechanism that "mitigates against the influence of uncertainty" (McCall 1987: 868). For analyses of the problems in creating markets for insurance, see, for example, Arrow (1963), Pauly (1968), Ehrlich and Becker (1972), and Hirshleifer and Riley (1979).

There may be great utility in viewing digital preservation in terms of the business of insurance with its apparatus of risk management and underwriting. Some preliminary and promising applications of the economics of insurance to the problems of digital archiving include Lawrence (1999) and Kenney (2002).

17 See http://www.archive.org/.

18 Whether such extensions are good public policy is the subject of vigorous debate. See, for example, Lessig (2001) and Vaidhyanathan (2001).

19 See, for example, Harnad (2001).

20 See, for example, Ostrom (1990), Bromley (1993), Anderson and Simmons (1993), Baden and Noonan (1998), and Ostrom (1999).

21 See http://www.jstor.org/about/movingwall.html.

22 JSTOR has developed significant expertise in the archiving of electronic journals that could be greatly leveraged. See Guthrie (2001). For a further account of JSTOR's archiving activities, see the presentation by Eileen Fenton at http://www.jstor.org/about/e.archive.ppt.

23 U.S. Constitution, Article 1, Section 8, Clause 8.

24 U.S. Code, Title 17, Section 108.


REFERENCES

All URLs were valid as of July 10, 2002.

Anderson, Terry L., and Randy Simmons, eds. 1993. The Political Economy of Customs and Culture: Informal Solutions to the Commons Problem. Lanham, Md.: Rowman and Littlefield Publishers, Inc.

Arms, William. 1999. Preservation of Scientific Serials: Three Current Examples. Journal of Electronic Publishing 5 (December 1999). Available at: http://www.press.umich.edu/jep/05-02/arms.html.

Arrow, Kenneth J. 1963. Uncertainty and the Welfare Economics of Medical Care. American Economic Review 53 (December): 941-973.

Baden, John A. 1998. A New Primer for the Management of Common-Pool Resources and Public Goods. In Baden and Noonan (1998): 51-62.

Baden, John A., and Douglas S. Noonan, eds. 1998. Managing the Commons, 2nd ed. Bloomington: Indiana University Press.

Bearman, David. 1999. Reality and Chimeras in the Preservation of Electronic Records. D-Lib Magazine 5(4) (April). Available at: http://www.dlib.org/dlib/april99/bearman/04bearman.html.

Born, Kathleen, and Lee Van Orsdel. 2001. Searching for Serials Utopia: Periodical Price Survey 2001. Library Journal (April 15): 53-58.

Bromley, Daniel, ed. 1993. Making the Commons Work: Theory, Practice, and Policy. San Francisco: ICS Press.

CCSDS. 2001. Reference Model for an Open Archival Information System (OAIS), Draft Recommendation for Space Data System Standards, CCSDS 650.0-R-2. Red Book. Issue 2. Washington, D.C.: Consultative Committee for Space Data Systems. July. Available at: http://www.ccsds.org/RP9905/RP9905.html.

Cooper, Brian, Arturo Crespo, and Hector Garcia-Molina. 2000. Implementing a Reliable Digital Object Archive. Proceedings of the Fourth European Conference on Research and Advanced Technology for Digital Libraries (ECDL). Lisbon, Portugal, September 18-20. Available at: http://dbpubs.stanford.edu:8090/pub/2000-28.

Cooper, Brian, and Hector Garcia-Molina. 2001. Creating Trading Networks of Digital Archives. Delivered at the First ACM/IEEE Joint Conference on Digital Libraries. June 24-28. Roanoke, Virginia. Available at: http://dbpubs.stanford.edu:8090/pub/2001-23.

Ehrlich, Isaac, and Gary S. Becker. 1972. Market Insurance, Self-Insurance, and Self-Protection. Journal of Political Economy 80 (July-August): 623-648.

Ekman, Richard, and Richard Quandt, eds. 1999. Technology and Scholarly Communication. Berkeley: University of California Press.

Faggen, Robert, ed. 2001. The Cambridge Companion to Robert Frost. New York: Cambridge University Press.

Flecker, Dale. 2001. Preserving Scholarly E-Journals. D-Lib Magazine 7(9) (September). Available at: http://www.dlib.org/dlib/september01/flecker/09flecker.html.

Granger, Stewart. 2000. Emulation as a Digital Preservation Strategy. D-Lib Magazine 6(10) (October). Available at: http://www.dlib.org/dlib/october00/granger/10granger.html.

Granger, Stewart. 2002. Digital Preservation and Deep Infrastructure. D-Lib Magazine 8(2) (February). Available at: http://www.dlib.org/dlib/february02/granger/02granger.html.

Guthrie, Kevin. 2001. Archiving in the Digital Age: There's a Will but is There a Way? EDUCAUSE Review (December): 56-65. Available at: http://www.educause.edu/ir/library/pdf/erm0164.pdf.

Hardin, Garrett. 1968. The Tragedy of the Commons. Science 162 (December 13): 1243-1248.

Harnad, Steven. 2001. The Self-archiving Initiative: Freeing the Refereed Research Literature Online. Nature 410 (April 26): 1024-1025. Available at: http://www.ecs.soton.ac.uk/~harnad/Tp/nature4.htm.

Hirshleifer, J., and John G. Riley. 1979. The Analytics of Uncertainty and Information—An Expository Survey. Journal of Economic Literature 17 (December): 1375-1421.

Hobbes, Thomas. [1651] 1934. Leviathan. Reprint. London: J. M. Dent & Sons, Ltd.

Hodge, Gail, and Bonnie C. Carroll. 1999. Digital Electronic Archiving: The State of the Art and the State of the Practice. A Report Sponsored by the International Council for Scientific and Technical Information, Information Policy Committee, and CENDI. Available at: http://www.dtic.mil/cendi/proj_dig_elec_arch.html.

Inera, Inc. 2001. E-Journal Archive DTD Feasibility Study. Prepared for the Harvard University Library, Office of Information Systems, E-Journal Archiving Project. Available at: http://www.diglib.org/preserve/hadtdfs.pdf.

Kenney, Anne, et al. 2002. Preservation Risk Management for Web Resources. D-Lib Magazine 8(1) (January). Available at: http://www.dlib.org/dlib/january02/kenney/01kenney.html.

Lathem, Edward Connery, ed. 1979. The Poetry of Robert Frost: The Collected Poems, Complete and Unabridged. New York: Henry Holt and Company.

Lawrence, Gregory, et al. 1999. Risk Management of Digital Information: A File Format Investigation. Washington, D.C.: Council on Library and Information Resources. Available at: http://www.clir.org/pubs/reports/pub93/contents.html.

Lessig, Lawrence. 2001. The Future of Ideas: The Fate of the Commons in a Connected World. New York: Random House.

McCall, J. J. 1987. Insurance. In The New Palgrave: A Dictionary of Economics. Volume 2. Edited by John Eatwell, Murray Milgate, and Peter Newman. London: Macmillan. Pp. 868-870.

Nelson, Cary, ed. 2001. On Mending Wall. Modern American Poetry: An Online Journal and Multimedia Companion to Anthology of Modern American Poetry. Urbana-Champaign: University of Illinois, Department of English. Available at: http://www.english.uiuc.edu/maps/poets/a_f/frost/wall.htm.

OCLC. 2002. A Metadata Framework to Support the Preservation of Digital Objects: A Report by the OCLC/RLG Working Group on Preservation Metadata. (June). Dublin, Ohio: Online Computer Library Center, Inc. Available at: http://www.oclc.org/research/pmwg/pm_framework.pdf.

Ostrom, Elinor. 1990. Governing the Commons: The Evolution of Institutions for Collective Action. Cambridge: Cambridge University Press.

Ostrom, Elinor, et al. 1999. Revisiting the Commons: Local Lessons, Global Challenges. Science 284 (April 9): 278-282.

Pauly, Mark V. 1968. The Economics of Moral Hazard: Comment. American Economic Review 58 (June): 531-537.

RLG. 2002. Trusted Digital Repositories: Attributes and Responsibilities. An RLG-OCLC Report. Mountain View, Calif.: Research Libraries Group. Available at: http://www.rlg.org/longterm/repositories.pdf.

Rothenberg, Jeff. 1999. Avoiding Technological Quicksand: Finding a Viable Technical Foundation for Digital Preservation. Washington, D.C.: Council on Library and Information Resources. Available at: http://www.clir.org/pubs/reports/rothenberg/contents.html.

Tenner, Edward. 2002. Taking Bytes from Oblivion. U.S. News & World Report 132 (April 1): 66-67.

Vaidhyanathan, Siva. 2001. Copyrights and Copywrongs: The Rise of Intellectual Property and How it Threatens Creativity. New York: New York University Press.

Van Orsdel, Lee, and Kathleen Born. 2002. Doing the Digital Flip: Periodical Price Survey 2002. Library Journal (April 15): 51-56.

Waters, D., and J. Garrett, eds. 1996. Preserving Digital Information: Report of the Task Force on Archiving of Digital Information. Washington, D.C. and Mountain View, Calif.: The Commission on Preservation and Access and the Research Libraries Group. Also available at: http://www.rlg.org/ArchTF/.

