Council on Library and Information Resources


Risk Management of Digital Information:

A File Format Investigation

by
Gregory W. Lawrence
William R. Kehoe
Oya Y. Rieger
William H. Walters
Anne R. Kenney

June 2000

Copyright 2000 by the Council on Library and Information Resources. No part of this publication may be reproduced or transcribed in any form without permission of the publisher. Requests for reproduction should be submitted to the Director of Communications at the Council on Library and Information Resources.

About the Authors

Preface

Risk Management of Digital Information

Introduction

Literature Search

Digital Preservation and Migration
Risk Assessment
File Format

Risk Assessment as a Migration Analysis Method

Risk Assessment of General Collections

Risk Assessment of File Formats

Assessing Risk in Conversion Software
Assessing Recurring Risk Inherent in a Large Heterogeneous File Collection
Assessing Risk Associated with the File Conversion Process
Identification of Metadata-Related Risk

Case Studies

Findings and Recommendations

Migration Risk Can Be Quantified
Conversion Software
Access to Format Data
Public Access Archives of Format Information

References

Appendix A: Risk-Assessment Workbook

Appendix B: Documentation for Format Migration Test File, Lotus 1-2-3, Release 2.2

Appendix C: Documentation: Examiner and RiskEditor

Appendix D: Case Study for Image File Format

Appendix E: Case Study for Lotus 1-2-3 .wk1 Format

Supplemental Documentation

Appendixes F and G did not appear in the print version of Risk Management of Digital Information.

Appendix F: Migration Software Analysis, Software Assessment Sheet

Appendix G: Specifications for the Cornell Digital Library Format

About the Authors

Gregory W. Lawrence is government information librarian at Cornell University's Albert R. Mann Library. He has participated in numerous research and development projects concerning the implementation of the electronic library, and he speaks and writes on these topics. He has received the American Library Association (ALA) Best of L.R.T.S. Award, the ALA Blackwell North America Scholarship Award, and the United States Department of Agriculture (USDA) Secretary's Honor Award. Mr. Lawrence is a past chair of the Preservation Committee, Depository Library Council.

William R. Kehoe is programmer/analyst specialist in the Information Technology Services Department at Cornell University's Albert R. Mann Library. He wrote his first assembly language "Hello, World" program in 1978, and has used C, Perl, Visual Basic, and Java to develop business applications and digital library delivery systems. He is currently the system architect for a service that will deliver interactive maps over the Web from numeric data supplied by the National Agricultural Statistics Service. For his participation in the USDA Economics and Statistics System, he received the USDA Secretary's Honor Award.

Oya Y. Rieger has been a librarian at Cornell University for eight years, where she has held positions as numeric files librarian, USDA Economics and Statistics System Project coordinator, and gateway manager for Cornell University Library's Web-based information system. She has participated in several development projects related to electronic libraries and user support services and has written and spoken frequently on these topics. In her current position as coordinator of the Digital Imaging and Preservation Research Unit, she manages a range of digital imaging and preservation research, demonstration, and training projects. She is the coeditor of RLG DigiNews. Ms. Rieger and Anne Kenney have written a new monograph, Moving Theory into Practice: Digital Imaging for Libraries and Archives, which was recently published by the Research Libraries Group (RLG).

William H. Walters was the social science bibliographer in the Albert R. Mann Library at Cornell University. Soon after the completion of this report, he accepted the position of collection development librarian at St. Lawrence University. A Ph.D. candidate in sociology (demography) at Brown University, Mr. Walters has conducted research in librarianship, demography, cartography, and economic sociology.

Anne R. Kenney is the associate director of the Department of Preservation and Conservation and codirector of the Cornell Institute for Digital Collections. Since 1989, she has been involved in a continuing series of research and production projects centering on the use of digital imaging for preservation reformatting and enhanced access. She has written and spoken widely on the topic of digital imaging and has been involved in several intensive digital training programs, both at Cornell University and on behalf of RLG. She is the coauthor of the award-winning publication Digital Imaging for Libraries and Archives (1996) and is coeditor, with Oya Y. Rieger, of RLG DigiNews. She and Ms. Rieger have written a new monograph, Moving Theory into Practice: Digital Imaging for Libraries and Archives, which was recently published by RLG. Ms. Kenney is a fellow and past president of the Society of American Archivists.

Preface

Given the right hardware and software, digital information is easy to create, copy, and disseminate; however, it is very hard to preserve. At present, it is impossible to guarantee the longevity and legibility of digital information for even one human generation.

The Council on Library and Information Resources (CLIR) has sponsored work on possible solutions to this problem. One such solution, the development of emulators, would enable access to information created with software and hardware that have become obsolete. The merits of emulation are widely debated, and the approach has yet to be developed for broad, practical use. A more viable strategy, many argue, is migration, which the CPA/RLG Task Force on Archiving of Digital Information defines as "the periodic transfer of digital materials from one hardware/software configuration to another, or from one generation of computer technology to a subsequent generation."

This report does not argue the merits of emulation or migration for longevity; rather, it addresses the practical aspects of migration in an operating library. Migration is, in essence, a translation. With migration, as with all translations, some information is lost, no matter how skilled the interpreter. In migration, it is usually the context, rather than the data, that drops out or is improperly reconstructed in the new code. This can be crippling in dynamic formats, in relational databases, and even in simple spreadsheets. Nonetheless, given how much information already exists in digital form and the brevity of its projected life span, institutions must act now to move information forward. They cannot afford to wait for the optimal solution.

In 1998, CLIR asked the Cornell University Library to undertake a risk assessment of migrating a handful of common file formats. This report presents the results of that investigation. It is intended to be a practical guide to assessing the risks associated with the migration of various formats and to making sound preservation decisions on the basis of that assessment. The authors start from the premise that migration is prone to generating errors, and they provide practical tools to quantify the risks. They organize migration into a sequence of discrete steps and offer assessment tools to manage each of those steps. The process is presented in a workbook that can guide digital preservation specialists in their day-to-day operations. The authors also present two case studies, one for image files and another for numeric files, that demonstrate their approach.

The goal of any risk assessment is to identify, as unambiguously as possible, the risk of loss over time and the measures that can be taken to mitigate that loss. This is what the tools are designed to do. The difficulty, of course, is determining when risk is acceptable and when it is not. The authors underscore the importance of experience and judgment in practicing the art of preservation.

Abby Smith
Director of Programs