The Cornell / Xerox / Commission on Preservation and Access Joint Study in Digital Preservation
A. New Preservation Method
Digital image technology provides an alternative to photocopying, of comparable quality and lower cost, for preserving deteriorating library materials. Subject to the resolution of certain problems, digital scanning technology also offers a cost-effective adjunct or alternative to microfilm preservation.
Digital technology represents a new preservation technique that can be used in place of or in combination with analog processes, such as photocopy or microfilm. This study has demonstrated that the quality of 600 dpi scanning is comparable or superior to that achieved in preservation photocopy, a standard preservation option in most libraries. The digitally produced paper product is also superior to the paper printout from microfilm produced on a standard reader/printer. An evaluation of quality is described below. The production of microfilm from the digital files and its comparison to the quality achieved via standard light lens film will be considered in the next phase of Cornell’s continuing investigation into the use of digital technology.
The cost study, also presented below, indicates that the costs of production, maintenance, and duplication are competitive with those of photocopying and microfilming and, over the course of the next decade, will become significantly lower. The print duplication capability and the advantages of network access associated with the use of digital technology should enhance the national and international preservation effort.
Digital technology’s practical utility, however, is dependent on the successful resolution of a number of issues, including: the development and application of standards and protocols for creation, storage, preservation, and use; the development of standards for technology refreshing; the growth of service bureaus or regional centers that can provide preservation scanning services for libraries unable to establish in-house programs; and the recognition of digital technology as a legitimate preservation technique by federal, state, and private funding agencies.
1. Quality Evaluation
Six hundred dpi binary scanning represents a viable preservation alternative to light lens processes for creating paper replacements of deteriorating originals. In reaching this conclusion the study compared the 600 dpi paper output to photocopies and to paper printouts produced from microfilm, as well as to paper copies produced via lower resolution scanning.
There are several advantages to copying a page digitally. Because the resulting image is digitally encoded, it can be reproduced and transmitted with no loss of quality. With analog processes, there is a discernible difference between the second and subsequent generations of an image. This means that preservation in the digital world will be based on maintaining “information” in an accessible form. In the analog world, preservation is based on maintaining the physical medium (e.g., paper, film).
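The claim that a digitally encoded image can be reproduced with no loss of quality can be illustrated with a short sketch. The byte string below is a hypothetical stand-in for a scanned page file, not data from the study; the point is only that a tenth-generation copy has the same checksum as the master.

```python
import hashlib


def copy_generations(data: bytes, generations: int) -> bytes:
    """Copy a byte string repeatedly, each copy made from the previous one."""
    current = data
    for _ in range(generations):
        # Round-trip through a mutable buffer to force a fresh copy.
        current = bytes(bytearray(current))
    return current


# Hypothetical stand-in for the contents of a scanned page file.
master = bytes(range(256)) * 64
tenth_generation = copy_generations(master, 10)

master_digest = hashlib.sha256(master).hexdigest()
copy_digest = hashlib.sha256(tenth_generation).hexdigest()

# Identical digests mean the tenth-generation copy is bit-for-bit the master.
print(master_digest == copy_digest)
```

No comparable check exists for an analog copy, where each generation is a new physical image of the previous one.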
Digital image scanning can also lead to improved image capture. For instance, Xerox has developed a windowing application that segments a page containing both text and illustrations in a manner that enables different settings to be used and optimizes the reproduction of both.
A digital image can also be edited and density levels adjusted to remove underlining, foxing, and stains or to increase legibility, options which are especially valuable when paper containing high levels of lignin has darkened considerably. A page may be cropped so that black borders common in photocopying are eliminated. Obviously, in producing a replacement copy, decisions must be made as to how much enhancement is desirable or affordable, but the capabilities exist for producing a copy that meets or exceeds the quality of one produced with photocopy technology.
The Joint Study compared the image quality obtained using light lens processes to that of the prototype scanning system. Copies of a standard facsimile test chart were produced on a Canon 8580 photocopier, a Minolta RP605Z reader/printer, and the Xerox scanner, which was used to produce 200, 300, and 600 dpi versions. The IEEE Facsimile Test Chart provides a means for evaluating the capture of text, gray scale, line art, photography, and resolution. The results of this comparison are included in Appendix I. The sample pages located there offer one illustration of the advantage of 600 dpi scanning and printing over lower resolution image capture and also demonstrate the current system’s superiority to light lens print processes in capturing illustrated text.
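To give a sense of what the three scanning resolutions mean in terms of captured data, the following sketch computes the size of an uncompressed binary (one bit per pixel) page image at 200, 300, and 600 dpi. The 8.5 by 11 inch page size and the padding of each scan line to a whole byte are assumptions made for illustration; the study does not state page dimensions or file sizes here, and stored files would normally be compressed.

```python
def bitonal_page_bytes(dpi: int, width_in: float = 8.5, height_in: float = 11.0) -> int:
    """Uncompressed size in bytes of a 1-bit-per-pixel page image.

    The default page dimensions are an assumption, not a figure from the study.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    row_bytes = (width_px + 7) // 8  # pad each scan line to a whole byte
    return row_bytes * height_px


for dpi in (200, 300, 600):
    print(dpi, bitonal_page_bytes(dpi))
```

Because pixel count grows with the square of the resolution, a 600 dpi image holds roughly nine times the data of a 200 dpi image of the same page.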
2. Cost Study
Digital technology represents an affordable alternative to light lens processes for reformatting brittle books. In evaluating the various reformatting options, it is important to consider not only the costs of the initial copying, but also the costs (and value) over time of providing subsequent copies, access, storage, and maintenance.
Since each preservation process has different objectives or advantages, comparison of equivalent costs is difficult. To the extent possible, our cost comparison attempts to relate similar objectives, but this is not always feasible. Our most general finding is that when similar objectives are compared, the costs of using digital methodologies are competitive with the costs of using traditional preservation reformatting techniques. Moreover, there is a greater likelihood that costs of digital processes will decrease over time than is the case with other reformatting options. Thus, cost alone should not be the determining factor in the choice of a preservation format.
* On average a book will be refreshed twice in a decade. For instance, a book created in 1992 will be refreshed in 1996 and 2000.
** This weighted average binding cost assumes 20% library binding, 40% in-line, and 40% unbound/stapled.
*** This is highly dependent on the choice of technology.
Table A enumerates the component costs associated with the use of digital technology to create paper facsimiles, to maintain a master in the digital library, and to produce subsequent printed copies. These figures are based on a time and cost study conducted in the last three months of 1991. They are also based on a number of assumptions and projections. For example, rates of change of component costs (increases or declines) are assumed and projected into the future, as indicated in Table A. A description of the cost study is contained in Appendix II. Details of the measurements and assumptions underlying Table A are contained in Appendix III.
It must be noted that the scanner used in this study is not yet available on the market. A cost has been imputed to this scanner based on the costs and capabilities of comparable scanners. Within reasonable limits, the findings of this study are not sensitive to this figure, since scanning costs are dominated by labor rather than equipment costs.
These component costs can be combined in a number of ways to facilitate various comparisons with other methods. We illustrate, as examples, comparisons with photocopying for the production of one paper facsimile; with microfilming as a potential means of long-term storage, including the costs of “technology refreshment”; and with photocopy for the production of subsequent printed copies of the book. These are summarized in the form of “findings.”
Finding 1: Digital technology offers an economic alternative to photocopy to produce a paper replacement.
Traditionally, photocopy is chosen as a preservation technique when a paper copy of a book must be produced and returned to the shelf. Table B compares the cost of producing a paper facsimile via photocopy and digital imaging. To ensure valid comparisons, in each case it is assumed that no copy other than the paper replacement is retained.
The costs indicate that producing one copy of a book using digital technology has economic advantages, even at this early stage in the development of the technology. The investment of labor to handle the deteriorating book is the largest component of cost in each case. Labor will increase with inflation, and therefore the costs of each option will increase over ten years. With digital technology, however, costs will rise more modestly as the costs of technology decline to partially offset increases in labor and finishing. Because of its lower costs and other advantages, digital scanning and printing could replace photocopying once widespread production capabilities can be established.
| | 1992 Cost | 10 Year Average |
| Scanning: Labor and Equipment | $25.56 | $29.17 |
| Overhead – 30% | $10.14 | $10.94 |
| Library Binding | $7.00 | $8.80 |
| Total Printed Copy from Digital Photocopy | $50.95 | $56.21 |
Finding 2: The production and long-term storage costs for digital technologies are competitive with those of microfilm. Subject to the resolution of certain problems, digital scanning technology will offer a cost-effective adjunct or alternative to microfilm preservation.
If a paper replacement is not required, microfilm is currently the preferred format due to a high degree of permanence when properly processed and stored. The high bandwidth of film also enables the capture of the finest details of the text, although production microfilming may not always capture halftone images effectively. With today’s technology, production digital scanning, while adequate for most practical user purposes, is of lower capture resolution than microfilm.
These differences must be taken into account. However, when only costs are compared, including the costs of technology refreshing (see Section IV-I, Technology Refreshing), digital technology is competitive today with microfilming for the creation, storage, and maintenance of a duplication master. Table C compares the relative costs of a duplication master over ten years for both microfilm and digital technology. The duplication master in the context of digital technology is the file maintained by the creating institution. The costs for both one-up (one page per frame) and two-up (two pages per frame) microfilming are included. Although most current microfilming projects use the latter, it may well prove that one-up microfilming will provide better results if and when the microfilm is converted to digital files. The Yale study, mentioned earlier, should provide information on the best method for creating microfilm that will subsequently be scanned. In Table C, we show only the comparative costs of capture in 1992 and maintenance for ten years. However, the digital costs for capture in subsequent years will rise more modestly than the microfilm costs because of the declining costs of digital technology. Thus, although digital costs in 1992 are higher than those of two-up microfilming, digital technology will have a steadily increasing cost advantage. Digital technology already shows significant cost savings over one-up microfilming.
| Scanning: Labor and Equipment | $25.56 |
| Storing: Optical Jukebox & Media [4 & 5] | $6.96 |
| Refreshing – 10 years | $2.87 |
| Overhead – 30% | $10.62 |

| Creating Archival Master | $58.50 | $29.25 |
| Creating Print Master | $5.00 | $2.50 |
| Storing 2 Generations for 10 Years | $6.66 | $3.33 |
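The component figures above can be summed to make the comparison in Finding 2 concrete. The sketch below adds up only the rows that appear in this text; the totals it prints are derived here and are not quoted from the study. It assumes that the first microfilm column is one-up and the second is two-up (an inference from the first being exactly twice the second), and it does not know whether the microfilm rows already include overhead.

```python
from decimal import Decimal

# Digital component costs as they appear in the rows above.
digital = {
    "Scanning: labor and equipment": Decimal("25.56"),
    "Storing: optical jukebox and media": Decimal("6.96"),
    "Refreshing, 10 years": Decimal("2.87"),
    "Overhead, 30%": Decimal("10.62"),
}

# Microfilm rows; the (one-up, two-up) column order is an assumption.
microfilm = {
    "Creating archival master": (Decimal("58.50"), Decimal("29.25")),
    "Creating print master": (Decimal("5.00"), Decimal("2.50")),
    "Storing 2 generations for 10 years": (Decimal("6.66"), Decimal("3.33")),
}

digital_total = sum(digital.values())
film_one_up = sum(cost[0] for cost in microfilm.values())
film_two_up = sum(cost[1] for cost in microfilm.values())

# Check that the overhead row is 30% of the other digital components.
subtotal = digital_total - digital["Overhead, 30%"]
overhead_check = (subtotal * Decimal("0.30")).quantize(Decimal("0.01"))

print("digital:", digital_total)
print("microfilm, first column:", film_one_up)
print("microfilm, second column:", film_two_up)
print("overhead consistent:", overhead_check == digital["Overhead, 30%"])
```

Under these assumptions the digital sum falls between the two microfilm sums, which agrees with the statement that digital costs in 1992 are above two-up microfilming and below one-up microfilming.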
Finding 3: Digital technology represents an economic means for producing subsequent printed copies.
Subsequent copies of a book can be printed on demand from the stored digital files at a fraction of the cost of the first copy, because labor, the dominant cost, is required only once, in the initial capture. Table D presents the costs of printing a book in this way compared with photocopy. Photocopying, of course, suffers from the disadvantage that the original or, if the loss of quality can be tolerated, a photocopy has to be recopied each time another copy is required. Subsequent copies will be at least as expensive as the first. Table D demonstrates that the cost to create subsequent copies using digital technology is extremely competitive. The storage costs are assumed to have been combined already with the capture costs (see Table C). A paper copy can also be produced directly from microfilm; however, the cost would be higher and the quality would suffer. Due to the complexity of comparing user preferences among microfilm readers, desktop workstations, and paper copies, we have not compared the costs and benefits associated with these access technologies.
|1992 Cost||10 Year Average|
Summary of Findings
These findings indicate that when the need is to replace paper with paper, the use of digital technology is economically preferable. In addition, it is considerably less expensive to produce subsequent copies from a digital file. In the future, digital technology should replace photocopy as a preservation format, once production facilities are established.
These studies also indicate that the costs of digital technologies are competitive with microfilm, including the cost of technology refreshing. For digital technology to compete with microfilm technology, however, refreshing must become institutionalized. Furthermore, the resolution of the stored image using today’s technologies is not as high as that of microfilm, although it offers advantages for capturing illustrated material and may be adequate for many purposes.
Paper copies can be produced more cheaply with digital technologies, and the quality is superior to that produced by most microfilm reader-printers. Of course, the primary means of access to microfilm is the microfilm reader normally located in the library. Further work needs to be done to compare the added value of providing desktop access at the researcher’s workstation to stored digital books.
B. New Access Method
The network-connected digital library offers a new access method. In the future this technology will permit viewing books on workstations, browsing collections from several institutions at the same time, and producing print-on-demand facsimiles for use. New forms of indexing will be needed to navigate this information resource.
1. Network-Connected Digital Library
Data communication networks provide instant connection to information resources. The growth of the Internet in the past five years is an indication that an increasing number of individuals are using networks for communication and information delivery. In 1991, the United States Congress passed the High Performance Computing Act, intended to support the creation of the National Research and Education Network (NREN), a national network system that will support the bandwidth needed for the rapid transmission of digital image files.
The digital library can be viewed by researchers from home or office workstations connected to the network. Browsing virtual library “shelves” from the workstation introduces a new kind of access to library collections. Digital technology now provides the ability to generate print-on-demand facsimiles of library books. The use copy from the digital library can be printed in response to a request submitted at the workstation. The network infrastructure connects all the components of the system: the request for a printed copy travels to the library image storage system, then to the printer, producing a cost-effective paper copy for use. Network protocols regulate the transmission of requests, and the transfer of data files to satisfy the request.
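The request path described above (workstation request, image storage system, printer) can be sketched in a few lines. All class and function names below are hypothetical and chosen for illustration; they do not describe the actual CLASS software or its network protocols.

```python
from dataclasses import dataclass, field


@dataclass
class ImageStore:
    """Hypothetical stand-in for the library image storage system."""
    books: dict[str, list[bytes]] = field(default_factory=dict)

    def fetch(self, book_id: str) -> list[bytes]:
        return self.books[book_id]


@dataclass
class Printer:
    """Hypothetical stand-in for the networked printer."""
    jobs: list[tuple[str, int]] = field(default_factory=list)

    def print_pages(self, book_id: str, pages: list[bytes]) -> int:
        # Record the job and report how many pages were printed.
        self.jobs.append((book_id, len(pages)))
        return len(pages)


def handle_print_request(book_id: str, store: ImageStore, printer: Printer) -> int:
    """Route a workstation request to storage, then to the printer."""
    pages = store.fetch(book_id)
    return printer.print_pages(book_id, pages)


store = ImageStore(books={"demo-book": [b"page-1", b"page-2", b"page-3"]})
printer = Printer()
printed = handle_print_request("demo-book", store, printer)
print(printed, printer.jobs)
```

In this arrangement the workstation needs to know only a book identifier; the page images never have to pass through it on the way to the printer.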
2. Navigating the Digital Library
With new and more complex sources of information on-line, improved indexing is needed so that the researcher using a workstation can find the resources that are available. New indexes in no way obviate the need for traditional catalogs. In fact, the catalog needs to be updated with records to locate library material stored in digital format, and new links between the catalog and this material need to be developed. The most important reason to add information to the catalog is so that these sources become a part of the total collection of the library. The library catalog brings sources together. But we also need to change the concept of the catalog (i.e., records of items in a particular collection) as the concept of the “collection” changes.
New indexes will link the on-line catalog to the digital library, and one digital library to others. Further indexing is required to represent such detail in the digital library. Traditional bibliographic records bring the user to the whole book. In the digital world, the user will want to be able to access parts of a document, such as a chapter or index of a book.
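One way to picture indexing below the level of the whole book is a small table that maps each named part of a volume to the range of page images that contains it. The sketch below is an assumption about how such an index might be represented, with hypothetical part labels and image numbers; it is not a description of an index built in this project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    """A named part of a scanned book and the page images it spans."""
    label: str
    first_image: int
    last_image: int


def images_for(parts: list[Part], label: str) -> range:
    """Return the page-image numbers that make up the named part."""
    for part in parts:
        if part.label == label:
            return range(part.first_image, part.last_image + 1)
    raise KeyError(label)


# Hypothetical structure for a single volume.
volume = [
    Part("Table of Contents", 5, 8),
    Part("Chapter 1", 9, 40),
    Part("Index", 301, 312),
]

print(list(images_for(volume, "Table of Contents")))
```

A catalog record that points to the whole book could link to such a table, allowing a view station to open directly at a chapter or at the index.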
A very preliminary experiment took place in June 1990 using the table of contents and the index for one of the 19th-century volumes in this project. The pages were scanned and run through a Kurzweil Optical Character Recognition program, which yielded an error rate of 3%. It was concluded that this error rate was too great for representing those parts of the book to the reader, particularly given the nature of the material in this project, which included a heavy preponderance of mathematics texts and volumes in non-Roman languages. A second pilot project involved keying in the tables of contents for some of the math books using TeX software. This was a very time-consuming process, and only a few titles had been completed by the project’s end.
C. Applications Beyond Preservation: Electronic Publishing at Cornell
Library preservation operations have only limited resources to devote to system development. If a digital solution to the preservation crisis is to be achieved, that solution needs to have commercial application beyond preservation alone. The library can then leverage other applications that have commercial viability, and benefit from the development that they fund. In the case of the CLASS system, electronic publishing applications are emerging that will use the same technological infrastructure as CLASS, and preservation can reap the benefits of developments that were created for other purposes.
Cornell University is among a number of universities engaged in projects that will define the boundaries of electronic publishing for the future. Cornell expects to participate in a multi-institutional project with Elsevier, the largest commercial publisher of technical journals, to experiment with the electronic distribution of materials science journals. At Cornell, these journals will become a part of the digital library, and the view stations that support browsing of digitally preserved books will also support the browsing of Elsevier journals.
The Synthesis Coalition (a coalition of engineering colleges from several universities, headquartered at Cornell) is developing network-based tools for teaching engineering. As a part of this work, the Coalition is engaged in a joint study with John Wiley and Sons, Publishers to experiment with the electronic viewing and use of engineering textbooks. Approximately forty Wiley texts have been included in the experiment. These books will also become part of the digital library. Engineering students will use the view station software from the CLASS project to read and select from these volumes, which will be integrated with the navigator tools provided by the Synthesis Coalition.
Customized publishing that combines material from various sources to meet the needs of a particular course is already an important Cornell program. The Cornell Campus Store, under the direction of Rich McDaniel, is pioneering the use of electronic publishing to produce customized coursepacks composed of selected published material combined with faculty-prepared selections, and to print them on demand. The publishing system used for this application is an extension of the CLASS system. Customized publishing depends on efficient procedures and systems to manage the clearance of copyright. In response to the new pressure implicit in the electronic arena, new copyright clearance services are being offered by such organizations as the National Association of College Stores (NACS) to meet the needs of academic organizations.
An application of electronic publishing currently being explored at Cornell involves a potential collaboration between the Cornell University Press and the Campus Store. Cornell Classics on Demand is a proposed experiment in which out-of-print books from the Press could be scanned and offered for sale on a print-on-demand basis through the Campus Store. These books might never again be out of print. Short print runs, even one or two copies, can be done to meet customer demand. The system that could run Cornell Classics on Demand is essentially the same CLASS system used for library preservation.
These are by no means all of the projects in electronic publishing that are now being conducted at Cornell. Library preservation has some requirements that are special and will not be met by any of these applications, but it also has much in common with them. The common aspects of each project should result in significant progress that can be of benefit to preservation.