The Cornell / Xerox / Commission on Preservation and Access Joint Study in Digital Preservation
Appendix II–Cost Study Description
From October through December 1991, Cornell conducted a time and cost study to determine the costs of using digital technology to reformat brittle books. In addition to tracking technician time, the study calculated the cost of equipment (amortized over four years), the cost of storing the digital files and refreshing them every four years, the cost of printing and binding the paper facsimile, and an additional 30% for overhead.
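The cost components listed above can be combined into a per-book estimate. The sketch below is a minimal Python illustration, not the study's actual costing method. The four-year amortization period, the four-year refresh interval, and the 30% overhead come from the text. Everything else is a hypothetical assumption: the field names, straight-line amortization, the rule of one initial write plus one refresh per completed four-year interval, applying overhead to all direct costs, and any input values.

```python
from dataclasses import dataclass

OVERHEAD_RATE = 0.30          # 30% overhead (from the study)
EQUIPMENT_LIFE_YEARS = 4      # equipment amortized over four years (from the study)
REFRESH_INTERVAL_YEARS = 4    # files refreshed every four years (from the study)


@dataclass
class BookCostInputs:
    # Hypothetical inputs; none of these values are study figures.
    labor_hours: float             # technician hours per book
    hourly_wage: float             # dollars per technician hour
    equipment_price: float         # purchase price of the scanning system
    books_per_year: float          # annual output over which equipment cost is spread
    storage_cost_per_write: float  # cost to store or refresh one book's files once
    horizon_years: int             # period over which storage is costed
    print_and_bind: float          # cost of the paper facsimile


def cost_per_book(c: BookCostInputs) -> float:
    """Estimate the fully loaded cost of reformatting one book."""
    labor = c.labor_hours * c.hourly_wage
    # Assumed straight-line amortization: yearly equipment cost divided by yearly output.
    equipment = c.equipment_price / EQUIPMENT_LIFE_YEARS / c.books_per_year
    # Assumed: one initial write plus one refresh per completed four-year interval.
    writes = 1 + c.horizon_years // REFRESH_INTERVAL_YEARS
    storage = writes * c.storage_cost_per_write
    direct = labor + equipment + storage + c.print_and_bind
    # Assumed base: overhead applied as a percentage of all direct costs.
    return direct * (1 + OVERHEAD_RATE)
```

Keeping each component as a separate term makes it easy to swap in a different amortization or refresh rule without touching the rest of the calculation.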
The Joint Study focused on developing, testing, and evaluating a prototype scanning system for preservation. Because much time was devoted to product development, a stable production environment was not achieved for most of the study. Nonetheless, an average production rate of 5,000 images/week was sustained over the last year. This figure represents the total number of images scanned in a week by two technicians who also performed all of the non-scanning activities (collation, disbinding, inspection, etc.) normally associated with a preservation reformatting project. It also reflects staff time out for sick leave, vacation, training, and trips to Rochester for Project Development Team meetings, as well as time spent on demonstrations prompted by the high visibility of this study.[**]
Given the difficulty of obtaining reliable measurements in a production environment, staff recorded actual production figures during the last three months of 1991. For each volume, scanning technicians logged on a worksheet the time they spent on setup, production scanning, and rescanning. They did not record time spent on the other tasks associated with selecting, preparing, and inspecting material; these tasks are common to other reformatting methods such as microfilm and photocopy and were treated as equivalent across methods in this project. The worksheets were then used to calculate the average time for each task and the number of pages scanned per hour.
Although the books ranged from 100 pages to well over 700, the time spent on each function was adjusted to represent a 300-page book for comparison purposes. Total time averaged 1.72 hours/book (103 minutes), which represents a scanning rate of 175 pages/hour. Actual scanning rates varied considerably from this figure, from a low of 92 pages/hour to a high of 264 pages/hour, depending on the size of the book, the frequency and type of illustrations, the consistency of the printing, and other factors. Worksheets for volumes that involved major system difficulties (e.g., system crashes, network rollovers) were excluded, although worksheets for volumes scanned immediately after software upgrades were included.
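The worksheet procedure described above can be sketched as follows. This is a hypothetical Python illustration, not the study's actual method. The `Worksheet` fields mirror the three logged tasks, and volumes flagged for major system difficulties are dropped as the text describes. The normalization rule is an assumption, since the text does not say how the adjustment to 300 pages was made: minutes per page are pooled across volumes and scaled linearly to 300 pages.

```python
from dataclasses import dataclass

STANDARD_PAGES = 300  # comparison baseline used in the study


@dataclass
class Worksheet:
    # Hypothetical record layout for one volume's worksheet.
    pages: int
    setup_min: float
    scan_min: float
    rescan_min: float
    major_problem: bool = False  # e.g., system crash or network rollover


def normalized_minutes(sheets: list[Worksheet]) -> dict[str, float]:
    """Average minutes per task, scaled to a STANDARD_PAGES-page book.

    Assumption: each task's time scales linearly with page count, so the
    pooled minutes-per-page rate is multiplied by STANDARD_PAGES.
    """
    usable = [s for s in sheets if not s.major_problem]  # exclude problem volumes
    if not usable:
        raise ValueError("no usable worksheets")
    scale = STANDARD_PAGES / sum(s.pages for s in usable)
    result = {
        "setup": sum(s.setup_min for s in usable) * scale,
        "scan": sum(s.scan_min for s in usable) * scale,
        "rescan": sum(s.rescan_min for s in usable) * scale,
    }
    result["total"] = result["setup"] + result["scan"] + result["rescan"]
    return result


def pages_per_hour(total_minutes: float, pages: int = STANDARD_PAGES) -> float:
    """Convert total minutes per book into a pages-per-hour rate."""
    return pages / (total_minutes / 60)
```

With the study's average of 103 minutes, `pages_per_hour(103)` comes out at roughly 175 pages/hour. A purely linear rule arguably overstates setup time for long books, since setup is largely a per-volume task; averaging setup per book without scaling would be an equally defensible alternative.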
A similar time and cost study of a preservation photocopy project for Cornell’s Entomology Library materials was conducted during the same period. Photocopying a 300-page book took an average of 2.25 hours (135 minutes), 31% longer than scanning.[***]
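The 31% figure follows directly from the two per-book averages:

$$\frac{2.25\,\text{h} - 1.72\,\text{h}}{1.72\,\text{h}} = \frac{0.53}{1.72} \approx 0.31, \qquad \frac{135\,\text{min} - 103\,\text{min}}{103\,\text{min}} = \frac{32}{103} \approx 0.31$$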
The time spent in scanning was divided among the following tasks:
- SET UP. Average Time: 0.4 hours/book (24 minutes)
- Average setup time varied from one technician to another. The comparable statistic for preservation photocopy is the time it takes to set up the template, which averaged 19 minutes in the Cornell Entomology project. Setup requires considerable time to establish page size, determine front-to-back registration, and scan the production note. It is estimated that setup time would be halved if these functions did not have to be performed manually. Xerox has a “wish list” of technical improvements for the system that would reduce setup time.
- PRODUCTION SCANNING. Average Time: 1.13 hours/book (67.8 minutes)
- Straight production scanning averaged 270-300 pages/hour, although speed varied widely with text density, book size, print quality, and the number of illustrations. Pages densely packed with text or containing illustrations produced larger files and took longer to scan. The length of a book also affected scanning time: for a 700-page book, the time to scan one leaf (front and back) increased from 20.23 seconds at page 200 to 23.91 seconds at page 400 and 26.95 seconds at page 700. The delay was caused principally by an increase in the time needed to save a leaf and build the document structure for the book. The occasional need for quality control scanning of faint text or illustrations slowed production scanning significantly. Finally, fatigue from scanning for more than two hours at a sitting led to a noticeable drop in production, so technicians were encouraged to alternate scanning with non-scanning functions such as quality control or material preparation.
- RESCANS. Average Time: 0.1 hours/book (6 minutes)
- The final step in scanning was rescanning images that were missing or of unacceptable quality, as found during inspection of the paper copy. Fortunately, rescans were few, averaging less than 1% of all pages scanned, and the rescan rate dropped as technicians became familiar with the system’s image capture capabilities.
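The leaf timings reported for the 700-page book can be used to estimate how scanning time grows with book length. The following Python sketch is illustrative only and rests on three assumptions not stated in the study: piecewise-linear interpolation between the three measured points, holding the nearest measured value constant outside them, and treating each leaf as two pages. The estimate covers only per-leaf capture and save time, not setup, quality control scanning, or breaks.

```python
from bisect import bisect_right

# Seconds per leaf (front and back) at given page positions,
# taken from the 700-page example in the study.
MEASURED = [(200, 20.23), (400, 23.91), (700, 26.95)]
PAGES = [p for p, _ in MEASURED]


def seconds_per_leaf(page: float) -> float:
    """Piecewise-linear interpolation between the measured points.

    Assumption: outside the measured range, the nearest value is held constant.
    """
    if page <= PAGES[0]:
        return MEASURED[0][1]
    if page >= PAGES[-1]:
        return MEASURED[-1][1]
    i = bisect_right(PAGES, page)  # index of the first point beyond `page`
    (p0, t0), (p1, t1) = MEASURED[i - 1], MEASURED[i]
    return t0 + (t1 - t0) * (page - p0) / (p1 - p0)


def estimated_scan_minutes(total_pages: int) -> float:
    """Sum per-leaf times, indexing each leaf by its even (back) page number."""
    seconds = sum(seconds_per_leaf(p) for p in range(2, total_pages + 1, 2))
    return seconds / 60
```

Because the per-leaf time rises with position in the book, the estimated time per page for a long book is higher than for a short one, which matches the slowdown described for saving leaves and building the document structure.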