CLIR Issues Number 4
What Are Digital Libraries?
--by Donald J. Waters
Cornell Project Will Assess Risks of Migration Strategy
--by James M. Morris
Task Forces to Meet in Plenary Session
--by James M. Morris
What Are Digital Libraries?
--by Donald J. Waters
THE MEANING OF the term "digital library" is less transparent than one might expect. The words conjure up images of cutting-edge computer and information science research. They are invoked to describe what some assert to be radically new kinds of practices for the management and use of information. And they are used to replace earlier references to "electronic" and "virtual" libraries.
The partner institutions in the Digital Library Federation (DLF) realized in the course of developing their program that they needed a common understanding of what digital libraries are if they were to achieve the goal of effectively "federating" them. So they crafted the following definition, with the understanding that it might well undergo revision as they worked together:
Digital libraries are organizations that provide the resources, including the specialized staff, to select, structure, offer intellectual access to, interpret, distribute, preserve the integrity of, and ensure the persistence over time of collections of digital works so that they are readily and economically available for use by a defined community or set of communities.
This is a full definition by any measure, and a good working definition because it is broad enough to comprehend other uses of the term. Those other definitions focus on one or more of the features included in the DLF definition, while ignoring or de-emphasizing the rest. For example, the term "digital library" may refer simply to the notion of "collection," without reference to its organization, intellectual accessibility, or service attributes. This is the particular sense that seems to be in play when we hear the World Wide Web described as a digital library. But the words might refer as well to the organization underlying the collection, or, even more specifically, to the computer-based system in which the collection resides. The latter sense is most clearly in use in the National Science Foundation's Digital Library Initiative. Yet again, institutions may be characterized as digital libraries to distinguish them from digital archives when the intent is to call attention to the differences in the nature of their collections.
The DLF's definition of "digital library" does more than simply enumerate features. It serves in addition as the basis for the DLF's perspective on the scope of digital libraries and on the functional requirements for their development. Brief consideration of certain features of the definition will help to explain its significance to the DLF.
"Organizations that provide the resources...." Digital libraries are organizations that employ and display a variety of resources, especially the intellectual resources embodied in specialized staff, but they need not be organized on the model of conventional libraries (or even within the context of conventional libraries). Though the resources that digital libraries require serve functions similar to those within conventional libraries, they are, in many ways, different in kind. For example, for storage and retrieval, digital libraries are dependent almost exclusively on computer and electronic-network systems, and systems-engineering skills, rather than the skills of traditional catalogers and reference librarians, rank high among the essential staff resources.
Far from emulating the organization of conventional libraries, the organization and structure of digital libraries, and the division of labor within them, are open to considerable experimentation. For example, as publishers and professional societies disseminate works electronically, they are testing how far their investments should incorporate the full range of library functions—and when digital libraries license content from publishers and professional societies that manage their own repositories, they are, in effect, "outsourcing" the library storage function and experimenting with distributed repositories. Further, new organizations appear regularly in the form of small, entrepreneurial, cottage-like industries that scholars, laboratories, and others have developed to create, manage, and disseminate bodies of digital information critical to a discipline or set of disciplines. The physics preprint archive at the Los Alamos National Laboratory is one such development that compels reflection on how digital libraries might best be organized.
"Preserve the integrity of and ensure the persistence...." Each of the functions enumerated in the working definition of "digital library" (select, structure, offer intellectual access, interpret, distribute, preserve integrity, and ensure persistence) is subject to the special constraints and requirements of operating in a rapidly evolving electronic and network environment. The continual change in the environment means that the latter two functions—preserve integrity and ensure persistence—are especially difficult to achieve. But the DLF regards these functions as central to the concept of "digital library" and follows the Task Force on Archiving of Digital Information in identifying them as linked but distinct. The Task Force argued that the integrity of digital objects is measured in terms of content, fixity, reference, provenance, and context. But it argued as well that the preservation of object integrity, though necessary, is not a sufficient condition of persistence. Persistence depends on other factors as well: organizational will, financial means, and the negotiation of legal rights.
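To make one of these five dimensions concrete, the following is a minimal sketch of a fixity check, assuming that fixity is verified by comparing a cryptographic digest recorded when an object is deposited with a digest recomputed later. The choice of SHA-256 and the function names `digest` and `fixity_intact` are illustrative assumptions, not something the Task Force report or the DLF prescribes, and the sketch says nothing about the other four dimensions.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a digital object's bytes."""
    return hashlib.sha256(data).hexdigest()


def fixity_intact(data: bytes, recorded_digest: str) -> bool:
    """Return True when the object still matches the digest recorded at deposit."""
    return digest(data) == recorded_digest


# Hypothetical object: the digest is computed once and stored with the object.
original = b"collection item, as deposited"
recorded = digest(original)

# Later audit: any change to the bytes, however small, fails the check.
unchanged_ok = fixity_intact(original, recorded)
altered_ok = fixity_intact(original + b" (edited)", recorded)
```

A passing check shows only that the bytes are unchanged. As the Task Force's distinction implies, it does not show that the object can still be read, located, or legally used.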
"Collections of digital works...." Distinctions among libraries commonly focus on the subject matter that defines the collections (e.g., medical, art, science, music, and such), or on the communities interested in the collected materials (e.g., research, college, public). The DLF is convinced that, as digital libraries mature, the principle defining their collection policies will not be the "digital-ness" of the material. Rather, the defining principles will be, as in other libraries, the subject matter of the materials and the patron community interested in them. The key strategic question for digital libraries anticipating such a development will be how to integrate collections of materials in digital form with materials in other forms. Much of the DLF program seeks to address this critical question.
"Readily and economically available...." Like other organizations, digital libraries need to develop criteria for measuring their performance in an evolving and highly competitive environment. At a minimum, these criteria must reflect the functional attributes of a digital library as described above. One essential measure of the quality of service evaluates performance in terms of cost. Although the costs of digital library service are not yet well understood, the DLF appreciates that successful digital libraries have a sure grasp of critical cost factors and work quickly to reduce the influence of those factors. A second essential measure of service quality takes account of how willingly and how responsively a digital library makes information available to its patron communities.
"Use by a defined community or set of communities..." Libraries in general, and digital libraries in particular, are service organizations. The needs and interests of the communities they serve will ultimately determine the trajectory of development for digital libraries, including the investment they make in content and technology. Most of the libraries in the DLF are dedicated to supporting higher education and research, and they justify their investment in digital developments (and in the collaborative work of the Digital Library Federation) as a powerful means of realizing the larger institutional goals of the academic communities they serve.
New DLF Participants
The Digital Library Federation is pleased to welcome the following libraries as new partners:
California Digital Library
Cornell Project Will Assess Risks of Migration Strategy
--by James M. Morris
CLIR IS SUPPORTING a project in the Cornell University Library that will assess the risks associated with pursuing a digital migration preservation strategy for a number of digital object types. Under the direction of Gregory W. Lawrence, the Government Information Librarian at Cornell, the project will survey the Library's extensive digital holdings, identify the risks that attend migration of the most common file formats, develop a risk assessment tool, and conduct a pilot test of the tool on a major file format. The risk assessment tool is expected to be of use to other libraries as well in the management of their digital collections.
The steady growth of digital information as a component of major research collections has significant implications for college and research libraries. Many university libraries have been creating or collecting digital information in a range of standard and proprietary formats. Each of these formats continues to evolve, becoming more complex as revised software versions add new features or functionality. It is not uncommon for software enhancements to "orphan," or leave unreadable, files generated by earlier versions. This threat to digital information of a certain age has surpassed the immediate danger of unstable media or obsolete hardware. The most pressing problem confronting managers of digital collections is now software and data-format obsolescence.
The tacit assumption is that digital libraries will preserve the electronic information they create or that is entrusted to their care. To preserve this information requires the management of collections in a consistent and decisive manner. But it is difficult to decide what should be preserved, and in what order of priority, and with what techniques. There is little guidance available on these matters. Major organizations (such as the National Archives and Records Administration) have yet to develop standards for document formats other than ASCII, and specialized reports, prepared by national committees, have focused either on broad recommendations or on organizational and legal issues. Based on its internal experience managing electronic collections, the Cornell Library believes that some form of "risk management" must replace "heroic rescue" as a means of preserving digital information.
Currently, there are two radically different solutions for preserving digital information: migration and emulation. Migration strategies periodically transfer digital materials from one hardware/software configuration to another, or from one generation of computer technology to a subsequent generation. Emulation strategies store—along with the digital files themselves—copies of the initial software and descriptions of how to "emulate" the initial hardware to run the software.
Neither solution is without some risk. Migration may not work for specialized, proprietary formats. It may save the content of a file but lose or diminish the internal relationships or contexts of the information. And the migration of common formats may lose, accidentally or intentionally, certain fundamental features of the data, such as embedded macro commands. It has been suggested that the software industry will adopt common formats for digital publishing, thereby making migration more attractive and easier to regulate. But the Information Industry of America, a business organization with 550 corporate members, argued in testimony before Congress against such a standard for Federal information, citing the need for a range of publishing options to meet the many information needs of Federal agencies. So the policy and procedural framework for migration appears to be far from complete.
The second strategy, emulation, assumes future access to multiple data objects: the data file to be preserved and reused, the application software that generated the data file, the operating system in which the application functioned, and the hardware environment emulated in software using detailed information about the attributes of that hardware. If one or more of the components were missing, this complex environment would most likely fail. Unlike migration, which can be broadly tested, emulation remains something of an abstract proposal. Cited examples, such as emulators of Atari video games, are virtually unknown among mainstream data users; their stability and reliability are untested, and it is not certain how broadly they may be applied. (A report on emulation, commissioned by CLIR from Jeff Rothenberg of the RAND Corporation, is due shortly.)
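The dependency described above can be sketched as a simple completeness check. This is an illustration only, assuming each of the four components is tracked as an optional reference. The class name `EmulationPackage`, its field names, and the sample file names are all hypothetical, and the code does not perform any emulation.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class EmulationPackage:
    """The four components an emulation strategy must keep together."""
    data_file: Optional[str] = None
    application_software: Optional[str] = None
    operating_system: Optional[str] = None
    hardware_description: Optional[str] = None


def missing_components(pkg: EmulationPackage) -> List[str]:
    """List the names of components that have not been preserved."""
    return [f.name for f in fields(pkg) if getattr(pkg, f.name) is None]


def is_usable(pkg: EmulationPackage) -> bool:
    """A package is usable only when no component is missing."""
    return not missing_components(pkg)


# Hypothetical holdings: the second package has lost its operating system.
complete = EmulationPackage("survey.dat", "stats_app.img", "os.img", "hardware_spec.txt")
incomplete = EmulationPackage("survey.dat", "stats_app.img", None, "hardware_spec.txt")
```

Here `is_usable(incomplete)` is false even though the data file itself survives, which is the failure mode the paragraph describes.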
It is difficult to determine the circumstances under which one of these two quite different strategies, migration and emulation, would prove superior to the other. The Cornell project, which should be completed by September 1999, will take steps toward resolving the uncertainty.
The project has an entirely appropriate home at Cornell, where the Library has created or acquired nearly a terabyte of data and other digital resources representing some 3,000,000 image files, more than 300 statistical data series, and numerous bibliographic databases. Cornell is committed to the long-term maintenance of this critical mass of organized digital information that was produced in a variety of formats, but the Library will determine the appropriate preservation options only after weighing the degree of risk to each of the different formats.
Task Forces to Meet in Plenary Session
--by James M. Morris
EACH OF THE five task forces convened by CLIR and the American Council of Learned Societies (ACLS) to discuss the changes technology will bring to research and scholarship met once this past winter. After the last session, a set of listservs was established to allow the conversations to continue by means of the technology. The task forces, which are organized around types of materials (area studies materials, audio materials, manuscripts, monographs and journals, and visual materials), have as their charge to consider the following overarching questions: What changes in the process of scholarship and instruction will result from the use of digital technology, and how can we assure that libraries and archives continue to serve the research needs of scholars and students in the face of technological transformation? This summer, the task forces will meet once again, not individually but in plenary session, because during the first sessions members of every group expressed regret that their colleagues with other areas of expertise were not present. Convening the full complement of participants will allow the discussion to extend easily across formats and to be properly inclusive.
The final report on the work of the task forces should be shaped overall to consider how technology may affect the use, growth, and disposition of collections of various kinds of materials that scholars need for their research and teaching. To that end, the agenda of the plenary session includes for discussion four principal items that have emerged from the deliberations thus far.
The single need most often cited by participants in each of the task forces was for finding aids. These were of various sorts, from inventories of special collections to full bibliographic descriptions. The approach to building them might vary from medium to medium, but the impetus behind the effort remains the same: if scholars are to use materials, they must first know of their existence and their location. A good number of task force members thought the greatest contribution the technology might make would be to help alert scholars—and others—to what there is to be explored. They did not insist that the technology transform the materials and that everything be digitized. They asked only that it perform the less grand but essential function of directing them to materials. If this is indeed a fundamental need, what should be the approach to constructing the kinds of resources being called for? Should there be national databases? And who should manage the construction and maintenance of these databases?
A second matter of importance might best be characterized as the comprehensive management of collections—a topic that addresses how they are built, preserved, kept (on-site and off-site), and made accessible. In what formats are collections to be developed in the future? The area studies group, for one, was worried that libraries would stop acquiring materials useful to its members and favor digital materials instead. The issues are also financial and educational. What will institutions be able to afford, alone or in collaboration, and how can scholars be persuaded of the practicality, and perhaps the inevitability, of what libraries must do to manage collections responsibly? Do faculty members understand sufficiently what is going on in their libraries, and why? Should CLIR issue guidelines for the management and preservation of collections?
The third matter that seemed fundamental is the need for a proper infrastructure if the technology is to be useful. This infrastructure will be technical in large part. But there needs to be an educational component to the infrastructure as well, to effect a sympathetic transition from the old environment to the new. The task forces have considered the work of humanists primarily, and the plain fact is that many scholars shy away from the technology because they have never been instructed in how to use it—instructed, that is, in understanding not the inner workings of the machinery but the dimensions of what the machine makes possible and the sequence in which the right buttons on a keyboard are to be hit to achieve a desired result.
One of the task force participants, a senior professor at his institution, made a point that is worth some attention: "Without excuses I must say that mastering the computer, just for my own needs, is increasingly daunting. There are many reasons—my intelligence, my life, my age, and the increasing complexity and length and numbers of programs. But there is a recommendation that occurs to me. Someone should design a master course for academics in the humanities—what you need to know—and it should be updated every six months or year. It should include Windows 95 and NT, MS Word, MS Access, and probably some Internet application. It might include something on image-capture, etc." This straightforward plea is evidence that grander dreams are being dreamed than may reasonably emanate from the minds of the very individuals on whose behalf they are being dreamed.
The final topic woven in and out of all the task force discussions was copyright and the congeries of issues around intellectual property rights for materials both digital and not. There was a general unfamiliarity with copyright restrictions, even of a traditional cast, on the scholars' part. What, then, will they make of the complex policies that may evolve to govern the use of digital materials? CLIR cannot lobby in conjunction with the copyright legislation the Congress may pass, but what educational initiatives might CLIR undertake for scholars?
The final report we publish later this year will draw upon recommendations made during the plenary meeting and describe the projects and courses of action that flow from its deliberations.
Three New Members Join CLIR Board
AT ITS MAY meeting, the Board of CLIR elected three new members: Virginia Betancourt Valverde, Robert D. Bovenschulte, and Charles E. Phelps.
Virginia Betancourt is the National Librarian of Venezuela, a position she assumed in 1977. Born in Costa Rica, Ms. Betancourt was founder and president of Banco del Libro (1960), Fundalecture (1985), and UNUMA (1980)—all in Caracas. In 1989, she became Executive Secretary of ABINIA, the Association of Iberoamerican National Libraries. She has helped to make the National Paper Conservation Center, located in the National Library of Venezuela, the focal point of the IFLA/PAC Program for Latin America and the Caribbean and, through ABINIA, an important contributor to the well-being of other Ibero-American national libraries. Ms. Betancourt, who holds a master's degree in sociology from the University of Chicago, has written on the significance of ABINIA for libraries in Latin America, on libraries as agents of change, and on the role of the national library in the development of a modern public library system in Venezuela, and she has been honored by UNESCO with its International Book Award.
Robert Bovenschulte's career spans scholarly, professional, trade, college, and school segments of the publishing industry. He is currently Director of the Publication Division of the American Chemical Society (ACS), which publishes journals, magazines, books, and electronic products. Before he joined the ACS in 1997, he was Vice President for Publishing at the Massachusetts Medical Society, which owns The New England Journal of Medicine. He is chairman of the Board of Directors for the Copyright Clearance Center and serves on the Executive Council of the Association of American Publishers' Professional and Scholarly Publishing Division and the Executive Board of the International Association of Scientific, Technical, and Medical Publishers.
Charles Phelps is Provost of the University of Rochester. He holds a doctorate in business economics—the economics of health care, in particular—and an MBA in hospital administration, both from the University of Chicago. He worked at the RAND Corporation from 1971 to 1984, as Staff Economist, Senior Staff Economist, and, for five years, Director of RAND's Program on Regulatory Policies and Institutions. While at RAND, he studied issues involving health policy, natural resources and environmental policy, and energy policy. In 1984 he became director of the Public Policy Analysis Program at the University of Rochester, leaving that position in 1989 to become Chair of the Department of Community and Preventive Medicine in the School of Medicine and Dentistry. He was named Provost of the University in 1994. Mr. Phelps has been associate editor of three journals (Journal of Health Economics, Journal of Policy Analysis and Management, and Journal of Risk and Uncertainty), and was elected to the Institute of Medicine of the National Academy of Sciences in 1991.
CLIR Names 1998 A.R. Zipf Fellow
THE 1998 A.R. Zipf Fellowship in Information Management has been awarded to Maureen L. Mackenzie of Long Island University. Ms. Mackenzie is the second recipient of the Zipf Fellowship, which was established to recognize a graduate student who shows exceptional promise for leadership and technical achievement in information management.
A.R. Zipf, who resides in California, was a pioneer in information management systems and a guiding force in many of the dramatic technological changes that occurred in the banking industry over the course of a forty-year career with the Bank of America. The fellowship reflects Mr. Zipf's longstanding interest in assisting students and young professionals in pursuit of advanced degrees in library science and other fields that involve the management of information.
Ms. Mackenzie is a Ph.D. candidate in the Palmer School of Library and Information Science at the C. W. Post Campus of Long Island University. After working in the insurance industry for more than a decade, she left her position as Regional Marketing Manager at Allstate Insurance Company last year to devote more time to her studies for the Ph.D. Her particular research interests include the information-seeking behavior of middle- and top-level managers and the effects of information on the conduct of business.
Dr. Martin M. Cummings, former director of the National Library of Medicine, is chairman of the Zipf Selection Committee. He spoke for the committee in saying that Ms. Mackenzie was selected from a strong field of applicants and that her personal and professional qualifications mark her as an outstanding candidate and an eminently deserving winner.
For further consideration of the issues raised
Ann Peterson Bishop and Susan Leigh Star, "Social Informatics and Digital Library Use and Infrastructure," in Martha Williams, ed., Annual Review of Information Science and Technology, vol. 31. Medford, N.J.: American Society for Information Science, 1996.
C. L. Borgman, M. L. Bates, et al. Social Aspects of Digital Libraries. Final Report to the National Science Foundation, Computer, Information Science, and Engineering Directorate. Available at http://www.gslis.ucla.edu/DL.
Clifford Lynch and Hector Garcia-Molina, Interoperability, Scaling, and the Digital Libraries Research Agenda: A Report on the May 18-19, 1995 Information Infrastructure Technology and Applications Digital Libraries Workshop. National Science and Technology Council. Available at http://walrus.stanford.edu/diglib/pub/reports/iita-dlw/main.html.
Preserving Digital Information. Report of the Task Force on Archiving of Digital Information. Washington, D.C.: Commission on Preservation and Access and Mountain View, CA: The Research Libraries Group, 1996. Available at http://www.rlg.org/ArchTF.
Donald Waters, The Digital Library Federation: Program Agenda. Washington, D.C.: Council on Library and Information Resources, 1998.
Council on Library and Information Resources
1755 Massachusetts Avenue, N.W. Suite 500
Washington, DC 20036
Fax: (202)-939-4765 · E-mail: firstname.lastname@example.org
The four current programs of CLIR are the Commission on Preservation and Access, Digital Libraries, the Economics of Information, and Leadership.