Libraries have been digitizing collections for a decade or more. Their collective experience has produced a depth of technical expertise and a set of tested practices. That information is widely shared among digital library staffs and has been well reported at meetings and in publications. This ongoing experiment with representing research collections online has resulted in the codification of technical practices and the emergence of clear trends in selection policies. This paper reviews existing selection practices in libraries, identifies selection policies and best practices where they exist, and discusses the long-term implications of the opportunities and constraints that shape digital-conversion programs. This is not a systematic review of what all research libraries are doing, but an analysis of significant achievements that will make it possible to identify good practices and benchmarks for success. Every library, regardless of size or mission, will need to determine for itself how and when digitization will move from being an experiment to becoming a collection-development strategy that is well integrated into its daily practice.

For purposes of analysis, this study looks primarily at a subset of “first-generation” digital libraries, that is, those that have been engaged in significant digitization projects for a while. However, the study also looks at a few libraries that are just beginning to develop digitization programs to see what approaches they have taken, in light of others’ experience. Research was conducted by studying the Web sites of all Digital Library Federation (DLF) members as well as the sites of other libraries and research institutions engaged in putting collections online. More important for analytical purposes were the site visits made to selected libraries: the University of Michigan, Cornell University, the University of Virginia (UVA), The New York Public Library (NYPL), New York University (NYU), and the New-York Historical Society. In addition to the fact that some of the selected institutions are first-generation digital libraries and others are not, there are great differences in governance and funding among the libraries surveyed. Some are in public universities, some are in private institutions, and some are independent of an academic institution. These differences are reflected in the various approaches they take to selecting what to digitize, how to do so, and for whom.

Each library was given a set of questions about selection criteria that constituted the framework for investigation, and each institution organized its responses individually. (The Library of Congress [LC], also included in this study, answered the questions in writing, and no site visit was made.) The questions begin with the selection process and proceed through the creation of metadata, decisions about access policies, and user support systems.

1.1. Defining a Sustainable Strategy

While the great majority of research libraries have undertaken digitization projects of one type or another, only a few are developing full-scale digitization programs rather than focusing on discrete uses of digitization for specific purposes. How do the libraries that have undertaken full-scale efforts conceptualize the role of digitized collections in providing collections and services to their core constituencies? What are they doing, or what have they determined must be done, to move from project-based conversion to programs that, whether large or small, have a well-defined role in the long-term goals of the library?

This report works from the assumption that to be sustainable, a digitization program should have certain intrinsic features. It should

  • be integrated into the fabric of library services;
  • be focused primarily on achieving mission-related objectives;
  • be funded from predictable streams of allocation, be they external or internal; and
  • include a plan for the long-term maintenance of its assets.

A sustainable digitization program, in other words, would be fully integrated into a library’s traditional collection-development strategies. A digitization program need not be large and production oriented to be sustainable. The role of conversion can be significant and well thought out, even when the conversion program serves limited purposes and has limited resources.

Any assessment of what libraries have achieved so far must take into account two key factors common to sustainable collection development, be it of analog, digitized, or born-digital materials. These factors are

  • a strategic view of the role of collections in the service of research and teaching or other core institutional missions, and
  • life-cycle planning for the collections, beginning with their identification and including acquisition, cataloging, preservation, and reference service.

A strategic view can be revealed in many cases by looking not only at how closely the results serve the mission but also at the decision-making process itself, that is, at who decides what to convert to serve which ends. When are the decisions made primarily by subject specialists based on existing collection strengths, and when is the selection process shaped by curricular development and other faculty needs? If the latter, then by what process are the faculty involved and how are teaching and research tools developed to meet their needs?

Ensuring long-term access to digital collections depends on careful life-cycle management. How does the library budget not only for the creation of the digital scans but also for the metadata, storage capacity, preservation tools (e.g., refreshing, migration), and user support, the sorts of things that are routinely budgeted for book acquisitions? How much of the program is supported by grant funding and how much by base funding? If the program is currently grant supported, what plans exist to make it self-sustaining? A sustainable digitization strategy may well include the creation of digital surrogates that serve short-term needs and do not demand long-term support. The crucial thing is to anticipate what support, if any, will be needed.

Selecting materials for digitization is more complex than selecting born-digital materials for purchase or licensing, because it involves expending resources for items that are already in the library’s collection rather than acquiring new ones. In theory, a library would choose to digitize existing collection items only if it could identify the value that is added by digitization and determine that the benefits outweigh the costs. But in practice, the research library community has, over the past decade, gone boldly forth with digitization projects not knowing how to measure their costs or benefits. Digitization technology and its costs are constantly changing; as a result, budgeting models that make comparisons between libraries can be meaningless or downright misleading. Unlike selecting officials who decide on the purchase or licensing of electronic resources, those responsible for digital conversion do not have a set of fixed prices for services and collections on offer. The only way for many libraries to get at the issue of cost is to undertake projects for their own sake, in the expectation that documentation of expenditures will yield some meaningful data. Libraries that have been able to secure funding for projects, document their activities and expenditures, and share that information with their colleagues have emerged as the leaders of the community, if only because of their willingness to share their knowledge. Their experiences are more relevant for this report than are those of others who have embarked on fewer projects or who have failed to document and share their knowledge.

The other unknown factor in this first decade has been the benefit: the potential of this technology to enhance teaching, research, lifelong learning, or any number of possible goals that digitization is intended to achieve. How could we know in advance how users other than ourselves would adapt this technology? How could we conceptualize use of digitally reborn collections except by extrapolating what we know from the analog realm? Regrettably, most academic institutions, despite their clearly stated goals of improving or at least enhancing research and teaching, have done less than they might have to gather meaningful data about the uses of digitized collections. While this report does address issues of costs and benefits, it should be remembered that as a community, we still have insufficient data on which to draw firm conclusions and base recommended practices.

This report aims to synthesize experiences in order to identify trends, accomplishments, and problems common to libraries and to many cultural institutions when they represent their collections online. A brief review of rationales for digitization is followed by a discussion of the ways in which digitization affects an institution. What consequences, intended or not, result from selection decisions? What factors, such as funding, may set constraints on decision making? One of the chief factors influencing selection decisions is copyright. This topic will not be explored here in detail because of its complexity, but considerations related to rights management are often foremost in mind when librarians assess collections for digitization. At the program planning stages, copyright is often viewed from the point of view of risk management (see footnote 1). Eliminating materials that are or might be under copyright reduces the risk of infringement to zero. But how does that affect the concepts of completeness and fair use that traditionally guide library access policies?


FOOTNOTE

1 Literature on copyright abounds; among the most useful for program planning is the piece by Melissa Levine in Sitts 2000.