Authenticity in a Digital Environment
Copyright 2000 by the Council on Library and Information Resources. No part of this publication may be reproduced or transcribed in any form without permission of the publisher. Requests for reproduction should be submitted to the Director of Communications at the Council on Library and Information Resources.
Authentication of Digital Objects: Lessons from a Historian's Research, by Charles T. Cullen
Archival Authenticity in a Digital Age, by Peter B. Hirtle
Preserving Authentic Digital Information, by Jeff Rothenberg
Authenticity in Perspective, by Abby Smith
Charles T. Cullen is president and librarian of the Newberry Library in Chicago. A legal historian, Mr. Cullen earned his Ph.D. degree from the University of Virginia. He has taught at Princeton University, the College of William and Mary, and Averett College. While at the College of William and Mary, he worked on the Papers of John Marshall, and became editor of that project in 1976. At Princeton, he was editor of the Papers of Thomas Jefferson.
A charter member of the Association for Documentary Editing, Mr. Cullen served as its president in 1982-1983. He currently represents that organization on the National Historical Publications and Records Commission. He has served on the New Jersey Historical Commission, and was vice-chairman of the board of the Founding Fathers Papers, Inc., from 1986 until 1992. He is currently vice-president of the board of the Modern Poetry Association, and chairs the board of the Heartland Literary Society.
Mr. Cullen has published or contributed to 30 books and articles. He has lectured widely on subjects relating to the Age of Jefferson, the scholarly use of computers, and the role of humanities research libraries. His scholarly computing expertise was recognized by the Association for Documentary Editing, which in 1987 awarded him its Distinguished Service Award "for outstanding contributions to the field of documentary editing through the use of computers."
Peter B. Hirtle is co-director of the Cornell Institute for Digital Collections (CIDC). CIDC is responsible for developing digital resources, supporting their use campus-wide, and conducting applied research that advances the production and utility of such resources. Mr. Hirtle is associate editor of D-Lib Magazine, a monthly magazine about innovation and research in digital libraries.
Prior to his arrival at Cornell, Mr. Hirtle worked at the National Archives and Records Administration (NARA), first for the technology research staff (where he helped complete its most recent digital imaging report), and then as coordinator of electronic public access for the agency. He has also served as curator of modern manuscripts at the National Library of Medicine. Mr. Hirtle has a master's degree in History and an M.L.S. with a concentration in archival science. He has served on several of the units sponsored by the Society of American Archivists, most recently as a member of the executive committee of its governing council, and on the Commission on Preservation and Access/Research Libraries Group's Task Force on Digital Archiving. He currently serves on the Research Libraries Group/Digital Library Federation Task Force on Long-Term Retention, and is a member of the National Initiative for a Networked Cultural Heritage's Working Group on Best Practices in Networking Cultural Heritage.
David Levy is an independent consultant in the areas of documents, digital libraries, and publishing. Between 1984 and 1999, he was employed as a researcher by the Xerox Palo Alto Research Center. The focus of his work over the past decade has been the nature of documents and the tools and practices through which they are created and used. His research interests include digital libraries; the reuse of documents; document design, structure, and standards; work practice studies; the combined use of paper and digital media; and the relation between technology and the character of modern life.
Mr. Levy holds a Ph.D. degree in computer science from Stanford University and a Diploma in calligraphy and bookbinding from the Roehampton Institute, London. He is currently a member of a National Research Council commission charged with advising the Library of Congress on its technology strategy. He is completing a book, Scrolling Forward: Making Sense of Documents in a Digital Age, which will be published by Arcade.
Clifford A. Lynch has been executive director of the Coalition for Networked Information (CNI) since 1997. CNI, jointly sponsored by the Association of Research Libraries and EDUCAUSE, includes about 200 member organizations concerned with the use of information technology and networked information to enhance scholarship and intellectual productivity. Before joining CNI, Mr. Lynch spent 18 years at the University of California Office of the President. For the last 10 of those years, he was director of library automation. In that post, he managed the MELVYL information system and the intercampus internet for the University. Mr. Lynch, who holds a Ph.D. degree in computer science from the University of California, Berkeley, is an adjunct professor at Berkeley's School of Information Management and Systems.
He is a past president of the American Society for Information Science and a fellow of the American Association for the Advancement of Science. He currently serves on the Internet 2 Applications Council and the National Research Council Committee on Intellectual Property in the Emerging Information Infrastructure.
Jeff Rothenberg is a senior computer scientist at The RAND Corporation in Santa Monica, California. He has a background in artificial intelligence and modeling theory. His research has included developing new modeling methodologies, studying the effects of information technology on humanities research, and investigating information-technology policy issues. He has been researching the problem of digital longevity since 1992, when he coauthored a prize-winning paper in The American Archivist. He has since explored the dimensions of the problem with archivists, librarians, and others in the United States and Europe. He published a widely cited article on the subject in Scientific American in 1995. He also appeared in the documentary film "Into the Future," which was produced by the Council on Library and Information Resources in 1998. Mr. Rothenberg recently completed a project for the Dutch National Archives and Ministry of the Interior that recommended a strategy for the long-term preservation of digital records in The Netherlands. He is currently working with the Dutch Royal Library on related issues.
What is an authentic digital object? On January 24, 2000, the Council on Library and Information Resources (CLIR) convened a group of experts from different domains of the information resources community to address this question. To prepare for a fruitful discussion, we asked five individuals to write position papers that identify the attributes that define authentic digital data over time. These papers, together with a brief reflection on the major outcomes of the workshop, are presented here.
Our goal for this project was modest: to begin a discussion among different communities that have a stake in the authenticity of digital information. Less modestly, we also hoped to create a common understanding of key concepts surrounding authenticity and of the terms various communities use to articulate them.
"Authenticity" in recorded information connotes precise, yet disparate, things in different contexts and communities. It can mean being original but also being faithful to an original; it can mean uncorrupted but also of clear and known provenance, "corrupt" or not. The word has specific meaning to an archivist and equally specific but different meaning to a rare book librarian, just as there are different criteria for assessing authenticity for published and unpublished materials. In each context, however, the concept of authenticity has profound implications for the task of cataloging and describing an item. It has equally profound ramifications for preservation by setting the parameters of what is preserved and, consequently, by what technique or series of techniques.
Behind any definition of authenticity lie assumptions about the meaning and significance of content, fixity, consistency of reference, provenance, and context. The complexities of these concepts and their consequences for digital objects were explored in Preserving Digital Information: Report of the Task Force on Archiving of Digital Information, published by the Commission on Preservation and Access in 1996. There is no universally agreed-upon mandate about what must be preserved and for what purpose. For example, an archivist will emphasize the specifications of a record that bears evidence; a librarian will focus on the content, knowing that it could serve multiple purposes over time. That being the case, there may be many ways to describe an item being preserved and what aspects of that item must be documented to ensure its authenticity and its ability to serve its intended use over time. For certain purposes, some argue, migration may suit the preservation needs of a digital object. For those objects most valued as executable programs, others argue, emulation is preferable. Beyond the technical options undergirding metadata and preservation decisions, numerous nontechnical questions beg to be asked. The issue of authenticity must be resolved before humanists and scientists can feel confident in creating and relying upon digital information.
Creating a common understanding about the multiple meanings and significance of authenticity is critical in the digital environment, in which information resources exist in many formats yet are interactive. From peer-reviewed journal articles to unpublished e-mail correspondence, these resources are integrated; they can interact and be modified in a networked environment. We wanted to know whether the distinctions that have proved to be helpful heuristic devices in the analog world, such as edition or version, document or record, could help us define a discrete piece of digital information. Can we define the distinct attributes of an information resource that would set the parameters for preservation and mandate specific metadata elements, among other important criteria?
We charged the five writers—an archivist, a digital library expert, a documentary editor and special collections librarian, an expert on document theory, and a computer scientist—to address one essential question: What is an authentic digital object and what are the core attributes that, if missing, would render the object something other than what it purports to be? We asked each to address this question from the perspective he found most congenial. We emphasized our interest in the essential elements that define a digital object and guarantee its integrity, but left the writers free to grapple with that question as they saw fit.
In considering this central issue, we asked that they think about the following:
- If all information—textual, numeric, audio, and visual—exists as a bit stream, what does that imply for the concept of format and its role as an attribute essential to the object?
- Does the concept of an original have meaning in the digital environment?
- What role does provenance play in establishing the authenticity of a digital object?
- What implications for authenticity, if any, are there in the fact that digital objects are contingent on software, hardware, network, and other dependencies?
These are some of the issues that we anticipated would arise in the course of the workshop.
In thinking of which communities to include in the workshop discussion, CLIR sought expertise from the major stakeholders in these issues: librarians, archivists, publishers, document historians, technologists, humanists, and social scientists. Because so many concepts of authenticity derive directly from experience with analog information, we called upon experts in the traditional technologies, such as printing and film, to elucidate key concepts and techniques for defining and securing authenticity of information bound to a physical medium.
The authors were given time to revise their papers in light of the discussion and any comments they received from the participants. Some chose to revise their papers, and others did not. The task of writing a position paper on this complex subject (a paper that we limited in size but not scope) was quite difficult. Each writer took a different approach to the subject, and the papers differ greatly from one another. This seeming disparity proved a boon to the discussions. In the course of the workshop, each writer had a chance to "unpack" the various nuances of thought that the papers held in short form only, and participants were confronted with the diverse ways that such common words as copy, original, reliable, or object are used. Much of the substance of the discussion is included in the concluding essay.
As one participant remarked, authenticity is a subject we have avoided talking about, primarily because the issues it raises appear so intractable. We are deeply grateful to Messrs. Cullen, Hirtle, Levy, Lynch, and Rothenberg for agreeing to form the advance party as we ventured into terra incognita. They were willing not only to think deeply about a vexing issue but also to commit their thoughts to writing and to careful scrutiny by others. Their papers, together with the oral summaries they delivered at the meeting, marked out several different trails to follow, each of which opened onto ever-larger vistas—some breathtaking, some daunting. We are also grateful to the participants, many of whom came from very distant places. Their thoughtful preparation and frank discussion confirmed our sense that authenticity is important for many communities, and that they are ready to engage the issue.
Director of Programs