The Commission on Preservation and Access
Inside This Newsletter
“Update on Digital Techniques,” a report from the collaborative project of Cornell University and Xerox Corporation to test a prototype system for recording deteriorating books as digital images and producing high-quality, archivally sound paper facsimiles.
Also, a Special Report, “Some Thoughts on Paper as an Information Storage Medium,” prepared by Peter G. Sparks at the Commission’s request as part of its scientific research initiative.
Archival Task Forces to Explore Appraisal & Documentation Strategies
To move forward with the development of a collaborative strategy for preservation of and access to archival manuscript and photographic collections, the Commission has formed two task forces–one to examine documentation strategy and one to examine appraisal theory and practice.
At the present time, it is difficult to do national planning or even to evaluate the relative merit of individual projects to preserve archival materials because there are no objective measures by which to judge the strength of a collection. It is hoped that the task forces can help develop such measures.
The groups are charged to examine existing guidelines, theory and practice in order to determine their applicability to the selection of important collections for preservation. One group will consider how best to modify and/or expand the application of documentation strategy so that it will assist archivists in making rational preservation decisions. The other group will consider how to include issues related to current or eventual preservation needs in the appraisal of new acquisitions, and how to address the reappraisal of existing holdings to determine priorities.
Each task force is being asked to produce a formal written report of its findings by May 30, 1992. The reports will then be used as the basis for discussion at a combined meeting of the two groups some time during the summer.
Timothy L. Ericson, archivist of the University of Wisconsin at Milwaukee, who recently served as interim executive director of the Society of American Archivists, is chair of the documentation strategy group. Robert E. Sink, director of archives and records management of the New York Public Library, is chairing the appraisal group. The two task forces are staffed by Margaret Child, consultant.
This past decade has been witness to a stunning proliferation of new information technologies and the widespread use of computers in all sectors of society. For that reason, archives must quickly develop the capacity to preserve the record in an increasing variety of formats–paper, audio-visual, computer tapes, and disks… –Don W. Wilson, Archivist of the United States, in For the Record, The Newsletter of the New York State Archives and Records Administration and the New York State Historical Records Advisory Board, Vol. 9, No. 1, 1991.
Annual Report Highlights Users of Preserved Knowledge
This year the focus of the Commission’s Annual Report is on the present and future users of the knowledge that is preserved. The 1990-91 report features a special section titled, “The Agony of Choice: Strategies for Preservation and Scholarship.” The last two annual reports have highlighted the “keepers” of knowledge–the library and archival communities.
Developments in technology, the International Project, education and training, archives, the Brittle Books program, scientific research and improvement of materials, and institutional initiatives are also covered. The 53-page report has been distributed to all those on the mailing list. Additional copies are available while supplies last.
Board Receives Review and Assessment Report
The external Review and Assessment Committee presented its report to the Commission at the annual board meeting September 26, 1991. The review and assessment committee, charged by the board in December 1990 with conducting a three- to six-month analysis of the Commission’s past, present and future mission, operated as a consultant to the board. Its charge was as follows:
- Assess the progress in preservation in the nation over the past five years.
- Assess the continuing need for preservation activities.
- Identify the major issues for the future: Which are most tractable? Which are most essential?
- Within this context, review and assess the role of the Commission with particular attention to identifying those areas of preservation in which the Commission can be most effective in promoting the interests of the national library and archival community.
- Recommend directions for future Commission activities.
A cover letter to Billy E. Frye, board chairman, from committee chair David H. Stam, University Librarian, Syracuse University, states, “We have found the task to be both demanding and satisfying and hope that it will prove helpful to the Commission as it pursues its own most demanding tasks. Please note that the organization of the report adheres fairly closely to the outline of your original charge to the Committee, a structure which we found useful in organizing the fairly massive materials which we accumulated.”
In addition to Stam, the committee includes William D. Schaefer, former executive director of the Modern Language Association and executive vice chancellor at the University of California, Los Angeles; L. Yvonne Wulff, Assistant Director for Collection Management, University of Michigan Libraries; and Arthur L. Norberg, Director, Charles Babbage Institute, University of Minnesota.
The board has disseminated the 37-page report to the Commission’s U.S. and Canadian mailing list, together with a letter from the chairman requesting comments from readers. Board member Penny Abell is presenting the report to the National Advisory Council on Preservation at its annual meeting in November 1991, and the board will consider the recommendations and comments at its January 30, 1992 meeting. Additional copies of the report are available from the Commission while supplies last.
Notre Dame Publishes Papers on Medieval Studies Preservation
Preserving Libraries for Medieval Studies–Working Papers from the Colloquium at the University of Notre Dame, March 25-26, 1990 has been issued by the University Libraries, University of Notre Dame. The 68-page publication contains essays contributed to the colloquium, which was supported by the Commission, the Medieval Academy of America, and Notre Dame’s College of Arts and Letters. The event’s purpose was to begin to organize medievalists for a national effort in library preservation. According to the introduction written by colloquium organizer Mark D. Jordan, that purpose is being met, since the final recommendations have led to the forming of a very active national committee for library preservation within the Medieval Academy. The essays, notes Jordan, are worth reading to show “how a group of scholars in a broadly interdisciplinary field began to work out among themselves a practicable program for saving what they count the essential contents of research libraries.” In addition to the essays, the publication includes lists of participants, topics of discussions, and recommendations. The Commission has mailed copies of the report to portions of its mailing list. A limited number of additional copies are available upon request.
Mass Deacidification Update
Library of Congress Turns Down Bids
The Library of Congress (LC), which was seeking an industrial firm to deacidify millions of books, has turned down three vendors’ offers because none could meet all technical and business requirements. LC issued a request for proposal (RFP) in September 1990 after a year of consultation with conservators and preservation scientists from around the world. The RFP contained requirements for toxicological and environmental safety, process efficacy and other preservation needs, and the aesthetic appearance of treated books. LC intends, with Congressional support, to continue its search for a suitable mass deacidification technology. The three vendors have given LC permission to release their test data. More information is available from Gerald Garvey, Preservation Projects Officer, Library of Congress, Washington, DC 20540.
A book is the only place where you can examine a fragile thought without breaking it, or explore an explosive idea without fear that it will go off in your face. –Edward P. Morgan
Intern’s Report on Cooperation Accepted for 1992 Book
Working Together–Case Studies in Cooperative Preservation by Intern Condict Gaye Stevenson has been accepted for publication in Advances in Preservation and Access, Volume 1 (Westport, CT: Meckler Corporation, 1992). Stevenson compiled the report during a three-month internship with the Commission as she was completing her degree at the School of Library and Information Science at the Catholic University of America. The Commission has distributed the report to its sponsors and colleagues. A limited number of additional copies are available.
College Libraries Committee Discusses Future of Management Training, Digital Preservation
A Preservation Management Seminar for College Libraries and a proposed digital technology project provided the major agenda items for the College Libraries Committee at an October 1991 meeting held at Commission headquarters. The first one-week preservation management seminar, developed jointly by the Commission and SOLINET, Inc. (Atlanta, GA), was held July 20-27, 1991 at Washington & Lee University. According to Lisa Fox, SOLINET Preservation Officer, positive evaluations of the seminar were due in large part to the high motivation of attendees and to the ability of the faculty to adjust to learners’ needs. Participants’ awareness of and confidence in institutional support also contributed to a successful event. The committee and SOLINET will conduct follow-up evaluations with participants and library directors in February 1992, focusing on the seminar’s impact on institutions’ preservation programs. The committee agreed to move ahead in planning a second seminar in the northeast in Summer 1992, with tuition held at current rates. Further information on application procedures and location can be obtained from SOLINET, phone 800-999-8558.
Recognizing that there are many issues to address, including selection of materials and copyright, the committee agreed to consider the feasibility of a project to use digital imaging equipment to cooperatively scan out-of-print books needed for current college instructional programs. A benefit of digital scanning equipment is that it can be used by many departments on a campus. By taking advantage of existing equipment and an existing network, a library can gain scanning capability for incremental costs. A technology sub-group will examine the applicability of this approach, to be discussed at the next meeting scheduled for March 20, 1992.
California Actions Advance Statewide Preservation
California has recently issued the proceedings of a March 1991 conference–Toward a California Preservation Program; prepared packages of original software and instructions for institutions to assess their preservation needs; and named a 30-person Preservation Task Force to participate in a three-day retreat in February 1992 to draft a statewide preservation cooperation plan.
The March 1991 conference, which marked the beginning of the statewide preservation program, was attended by 150 librarians, historians and archivists. The state financed the conference proceedings with Library Services and Construction Act Title III funds administered by the state librarian. The state now is distributing new needs assessment software packages that enable institutions to discriminate among competing preservation needs and establish priorities. The upcoming February 1992 retreat will use previous conference outcomes and statistical data to draft a cooperative plan, which the state expects to implement under the emerging California Multitype Library Network. Further information is available from Barbara Will, Networking Coordinator, at the California State Library, Library Development Services, 1001 Sixth Street, Suite 300, Sacramento, CA 95814.
Preprint on Technology & Ethics of Future Preservation
“Mixed Microform and Digital”, a preprint of an article by Rowland C.W. Brown appearing in the October 1991 AIIMINFORM, is available upon request from the Commission. The four-page article, prepared at the request of the Association for Information and Image Management (Silver Spring, MD), discusses ethical and technical considerations of the preservation of knowledge as we enter an environment in which electronic technologies may well dominate traditional print options. “Somehow our acceptance of impermanence in society may have dulled our efforts to seek immortality, to record for all time the fruits of our creativity, our life and times, our accomplishments and beliefs,” Brown writes, suggesting that multiple solutions will be required for the future. Brown is a consultant to the Commission and chair of the Technology Assessment Advisory Committee.
Board Member Avram Honored
Henriette D. Avram, associate librarian for collection services at the Library of Congress (LC) and a Commission board member, retires from LC at the end of December 1991. Ms. Avram has served LC for over 25 years, the last eight as director of the library’s largest service unit. Ms. Avram’s achievements are termed “formidable” by the library profession, and a recent tribute tagged her the “quintessential librarian.” “She leaves a legacy here not only in the structure and standards of our bibliographic operations, but in the many library managers and specialists nurtured under her stewardship,” commented James Billington, Librarian of Congress.
ABOUT YOUR NEWSLETTER SUBSCRIPTION
On August 26, 1991, we mailed a SPECIAL NOTICE to all newsletter subscribers, asking for a reply by October 31, 1991. The notice asked subscribers wishing to continue receiving this complimentary newsletter to return the form, making any necessary address corrections. The notice explained that persons who did not return the form would be deleted from the mailing list.
As of the October 31, 1991, deadline, we have received replies from half of our mailing list of nearly 1,000. For those of you who did not return the form, this is the last newsletter you will receive. (NOTE: Commission sponsors will continue to receive the newsletter, as will persons who were added to the list after August 26, 1991.)
We instituted this “purge” to help us contain our newsletter printing and mailing costs. We appreciate your cooperation in helping us maintain this newsletter as a service of the Commission.
Special Report Paper
Some Thoughts on Paper as an Information Storage Medium, by Dr. Peter G. Sparks, Preservation Consultant*
Paper as we know it was first made around 105 A.D. in China and has been serving very well in many ways for almost 2000 years. Only in the last 150 years or so has its lack of permanence created the challenges of preserving our written and printed intellectual heritage. Under the heading of the “brittle paper problem,” these challenges take up a considerable amount of our time, fiscal resources, and intellectual energy in the search for acceptable solutions. At this time, then, it might be helpful to reflect briefly on how paper has behaved as a medium for the long-term storage of information and what we know about its properties. Perhaps a reminder of what we know about paper can help us choose more wisely where we should be going with the preservation of and access to library materials on paper.
During this century a considerable body of scientific knowledge has accumulated that attempts to explain how paper ages, both naturally and in an accelerated mode. For example, between 1963 and 1985 at least eighteen research papers were published on the subject. Moreover, there are volumes of processing, engineering, and chemical information on how modern paper is made and how its properties relate to various end uses. There are also a number of technical preservation studies done in the last fifteen years that relate to paper preservation. It is not within the scope of this article to review this extensive array of information. Suffice it to say that there is a great deal of technical information about paper that can be used to help us understand its natural and accelerated aging, its manufacture, and its preservation. If one takes the time to review the results of some of these studies, useful facts turn up that are relevant to the preservation decisions the field is making. For instance:
- There are many different grades of paper made for printing purposes, and the properties of these papers are different. Furthermore, under certain conditions a paper’s properties can change over time.
- The properties of paper appear to change at different rates for different papers. For example, the aging rate is very dependent on what pulp and additives are used in making the paper.
- In accelerated laboratory aging experiments, acid paper loses about 50% of its strength during the first 10% of its life, and papers that became weak from aging lose their remaining strength very slowly over a real time equivalent of many decades. These weak papers do not fall apart on their own. They must have an external stress applied to them to initiate failure.
- A 1980 study showed there is good correlation between changes in paper properties after 35 years of natural aging and accelerated aging (72 hours at 100°C) done 36 years earlier on the original samples. The best correlation occurred for paper made from pure cellulose raw materials, where the acid decomposition reaction is the most significant process.
- The inevitable destiny of paper made by the “acid process” is acid-induced decomposition, enhanced by oxidative decomposition, autocatalyzed by the presence of trace metals, and speeded along by moisture and temperature, which together change it irreversibly into a physically weak and brittle state.
- An increase in the moisture content of paper (due to a higher relative humidity environment) and in its temperature increases the aging rate of paper. The effect becomes measurable in the laboratory at relative humidity values above 40% and temperatures above 50°C (122°F).
- Moisture cycling induces stress relaxation in paper, which can lead to irreversible and deleterious property changes.
- Alkaline papers, machine-made or deacidified, exhibit a pronounced decrease in their rate of decomposition as measured in the laboratory. For deacidified paper, the magnitude of this effect corresponds to an estimated life expectancy 3 to 5 times that of the original paper with acid present. Different types of papers behave differently when deacidified, but all gain some additional lifetime benefit. High-quality machine-made papers that have been deacidified can approach laboratory-estimated lifetimes in the 400- to 500-year range.
- Coated papers use alkaline pigments, e.g., calcium carbonate, as a coating color. Although little investigation has been done on the accelerated aging of coated papers, the presence of these pigments should have a positive effect on the papers’ aging stability. A recent study at the University of Pennsylvania Dental School Library, which has many historic volumes printed on coated paper, showed markedly fewer brittle papers in this collection.
- Collection condition surveys done in the last ten years on a number of major library collections show time after time that 25 to 30 percent of the paper in these collections is already brittle and 70 to 75 percent has some strength remaining. Moreover, these data also point out that 95% or more of the paper in these collections is acidic. Although much quoted, these data should not be set aside as old information, since they form the basis for documenting the magnitude of the problem and for making preservation decisions.
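The accelerated-aging equivalences cited above (for example, 72 hours at 100°C standing in for roughly 35 years of natural aging) follow from Arrhenius-type reaction kinetics. The short Python sketch below shows the arithmetic; the activation energy (~95 kJ/mol, a typical literature value for acid-catalyzed cellulose hydrolysis) and the 20°C room baseline are illustrative assumptions, not figures taken from the studies cited.

```python
import math

# Arrhenius-style back-of-the-envelope: how much natural aging does a
# 72-hour oven test at 100 °C roughly represent? The activation energy
# (~95 kJ/mol, typical of acid-catalyzed cellulose hydrolysis) and the
# 20 °C room baseline are illustrative assumptions.
R = 8.314          # gas constant, J/(mol·K)
EA = 95_000.0      # assumed activation energy, J/mol
T_ROOM = 293.15    # 20 °C in kelvin
T_OVEN = 373.15    # 100 °C in kelvin

# Ratio of reaction rates at the two temperatures
accel = math.exp((EA / R) * (1.0 / T_ROOM - 1.0 / T_OVEN))
years = 72 * accel / (24 * 365)   # 72 oven-hours expressed as room-temperature years

print(f"acceleration factor: {accel:,.0f}")
print(f"72 h at 100 °C is roughly {years:.0f} years at 20 °C")
```

With these assumed values the factor comes out near 4,000, i.e., about 35 years, which is consistent with the 35/36-year correlation study mentioned above; a modestly different activation energy would shift the equivalence considerably, which is one reason accelerated-aging estimates carry wide error bars.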
What do we know from the real time observations that we make in our own collections of books and documents? In addition, what real time observations have been made outside our libraries that can give us a picture of the long term stability or instability of paper? These data are very important because they represent a measurement of how paper has changed over a known amount of time.
We look with wonder at the 15th-, 16th-, 17th- and 18th-century handmade papers used in the rare books in our collections. For those fortunate enough to have very early Chinese materials, seeing paper approaching 1,000 years of age is reassuring evidence that these early papers, when properly cared for, are very stable materials. Similar observations can be made in any major library in most parts of the world, and there is no doubt that early handmade papers have been and still are an archival medium.
On the other hand, papers made in the mid to late 19th century are, with few exceptions, badly degraded to a weakened and brittle condition. This of course is no big secret to any librarian or preservation officer. It is, however, overwhelming proof that the “inevitable destiny” of any paper made by the acid process is to eventually become brittle.
Observations of historic coated papers are very interesting. Many plates in late 19th century books are coated papers. I have seen a number of plates in perfectly fine condition sandwiched between weak and acidic pages in the same book. The coated paper appears to have held up well under rather adverse conditions for 75 to 100 years.
We have all seen brittle book papers that begin to fall apart during normal library use. The first locus of breaking is usually on corners and along the book spine where the paper can be easily bent through a small radius. If a piece of brittle paper is handled very carefully or not at all, it does not break and will tend to remain in one piece unless put under an externally applied bending or tensile stress. When a brittle sheet is put in a Mylar folder or Mylar encapsulated, for example, the paper is prevented from bending through a small radius by the supporting Mylar film and this allows the paper to be handled without breaking.
A machine-made alkaline paper produced in 1901 by Edwin Sutermeister at the S.D. Warren Paper Company has been under continuous observation since then and is reported to be holding up very well. This is perhaps the oldest piece of machine-made alkaline paper whose condition has been documented at frequent intervals.
Where does all this leave us with respect to paper and its future fate as a medium for storing information? How can this type of information be interpreted for use in shaping decisions? Here are a few examples:
- Given the observations in the laboratory, from a materials behavior standpoint, brittle book paper (1 fold or less) will not fall apart if the volume is left unused on the shelf year after year. The minute changes in paper strength will be hard to measure in the laboratory and undetectable to the touch. On the other hand, that same brittle paper will start to break up during normal library usage, handling and processing.
- Large diversified research collections contain a broad range of paper types from different time periods and from all the corners of the world. Research tells us that the aging behavior of these papers will be different. Therefore, the condition of these papers at any given time can be widely different. As a result, deacidification of a diverse collection will probably not impart uniform stability and similar extended lifetimes to all grades of paper represented in that collection. Deacidification can, however, increase the stability and lifetime to a varying degree of all the papers in that collection that are not brittle. Lastly, deacidification will probably not make all papers in that collection “archival” preservation media.
- Handmade papers from many cultures and earlier periods have exhibited excellent archival preservation behavior over many centuries and will continue to do so if we can continue to shield them from specific physical, chemical and biological dangers. It is also probably true that machine-made papers made with high quality cotton pulp, non-acid sizing systems and loaded with 1 to 2% of a slightly alkaline filler will behave as an archival material. These stable papers will probably not need to be reformatted in order to create a preservation master copy. However, wide access to these materials may require a new format for distribution to other parties.
I will leave it to the readers of this article to further weigh how the modest sample of technical information presented above can help them in their own programs. We are fortunate to have a wealth of technical information on paper–perhaps a better data set than we have on any other medium–and we can use this information to define the role of paper in the library and archive collections of the future. We also need a continuing effort to carefully document past and recent scientific findings about paper behavior so the library and archive preservation field can use those findings in making informed decisions on how and when to retain and preserve or replace the paper in our current collections.
*[The author served as Director for Preservation at the Library of Congress for eight years before becoming a consultant in 1989. His education is in the physical chemistry of polymers.]
SPECIAL REPORT – DIGITAL
Excerpted from an article to appear in Advances in Preservation and Access, Vol. 1 (Meckler, 1992). For a discussion of terms associated with the technologies of document preservation, see Preservation and Access Technology, The Relationship Between Digital and Other Media Processes: A Structured Glossary of Technical Terms, by M. Stuart Lynn and the Technology Assessment Advisory Committee.
Update on Digital Techniques, by Anne R. Kenney and Lynne K. Personius
Cornell University and Xerox Corporation, with the support of the Commission on Preservation and Access, have been collaborating in a project to test a prototype system for recording deteriorating books as digital images and producing, on demand, high quality and archivally sound paper facsimiles. The project goes beyond that, however, to investigate some of the issues surrounding scanning, storing, retrieving, and providing access to digital images in a network environment.
The project has involved the collaborative efforts of two Cornell divisions, the University Library and Cornell Information Technologies (CIT). It is co-managed by the Assistant Director, Department of Preservation and Conservation, and the Assistant Director of CIT for Scholarly Information Sources (the authors of this article). While the two divisions have worked closely in the past, most notably in the conversion to an on-line catalog, this co-sponsorship serves as a model for future projects involving the library and information technology organizations in the use and control of electronic technologies. Within Xerox Corporation, the College Library Access and Storage System (CLASS) Project has been assigned to a group of engineers, with liaisons from marketing, system support, networking, and other projects. Representatives from these units participate in project development team meetings where the management from Cornell and Xerox discuss problems and possible solutions, share information, and chart future directions. The site for these meetings alternates between Xerox headquarters in Rochester and Cornell University in Ithaca.
This collaborative relationship has resulted in the development of workstation hardware and software specifically designed for use in a scanning environment where high-speed, high-resolution scanners are controlled by technicians. In the course of this project, which runs through December 1991, Cornell is scanning 1,000 brittle volumes at a resolution of 600 dots per inch (dpi) at a workstation located in Olin Library, the main graduate library. Scanned images are being created as TIFF images and compressed prior to storage using Group 4 CCITT compression.1
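To give a sense of the storage these settings imply, here is a back-of-the-envelope sketch in Python. The page dimensions, the 300-page volume, and the 15:1 Group 4 ratio are illustrative assumptions (Group 4 compression of printed text typically runs on the order of 10:1 to 20:1), not measurements from the Cornell/Xerox Project.

```python
# Rough storage estimate for one bilevel page scanned at 600 dpi.
# Page size (6 x 9 in trimmed), the 15:1 Group 4 ratio, and the
# 300-page volume are assumptions for illustration only.
DPI = 600
WIDTH_IN, HEIGHT_IN = 6, 9

pixels = (WIDTH_IN * DPI) * (HEIGHT_IN * DPI)   # 3600 x 5400 pixels
raw_bytes = pixels // 8                          # 1 bit per pixel, bilevel
g4_ratio = 15                                    # assumed CCITT Group 4 ratio for text
tiff_bytes = raw_bytes // g4_ratio               # compressed size per page

per_volume = 300 * tiff_bytes                    # hypothetical 300-page volume
print(f"raw: {raw_bytes/1e6:.2f} MB/page; "
      f"Group 4: ~{tiff_bytes/1e3:.0f} KB/page; "
      f"~{per_volume/1e6:.0f} MB per 300-page volume")
```

Under these assumptions a page shrinks from about 2.4 MB raw to well under 200 KB, which is what makes storing and transmitting whole volumes over a campus network practical with the optical storage and bandwidth of the day.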
While this project is still in an experimental stage, and the initial costs incurred with “ramping up” for production are high, some preliminary findings of the Cornell/Xerox Project suggest that the use of scanning technology represents an affordable alternative to microfilming for reformatting brittle material. The time spent in actual scanning is comparable with microfilming production rates once all of the post-processing testing and quality control required of microfilming service bureaus are taken into account. In the Cornell/Xerox Project, scanning rates of over 1,500 images per day have been sustained for periods of three weeks or more. However, since the project is developmental, and production is frequently interrupted for software and hardware upgrades and testing (not to mention visitors!), production measurements over longer time periods have not been possible. These scan rates include time spent in initial setup and on-screen inspection, scanning, storing to optical disk, rescanning, and transmittal for printing.2
At this stage, if one were to compare the actual and projected costs associated with producing microfilm via production scanning to costs associated with service bureau microfilming, the two processes are competitive. The production-related costs of scanning to produce microfilm include labor, equipment, overhead, and conversion to microfilm. Today, these costs are comparable to, and within two years are projected to be less than, the costs of using conventional microfilming methodologies.
It appears that the cost feasibility of using digital technology for preservation reformatting is present today. In addition to fulfilling a preservation need by the creation of microfilm, digital technology offers value-added access and distribution benefits. The possibilities exist for converting microfilm to digital imagery as well, and Yale University has prepared a report on the feasibility of a project to study the means, costs, and benefits of converting large quantities of preserved library materials from microfilm to digital images.3
If one were to look at the costs of producing both microfilm and digital masters, the costs would include initial capture, conversion (to film in one instance, and to digital in the other), plus costs of storage and refreshing of the digital masters to keep them compatible with upgrades in the technology (which are identical for the two processes). The costs for scanning first versus microfilming first appear to favor the former. Over time the gap will continue to widen, with scanning first becoming significantly cheaper due to the anticipated decline in costs associated with the use of electronic technologies as compared with photographic technologies over the next decade.4 Cornell is preparing a detailed cost study on the use of digital technology that will be made available at the end of this project through the Commission on Preservation and Access.
If the costs associated with producing both microfilm (for preservation) and digital files (for access) were no longer a factor in determining which method of capture to use, the next concern would be quality. While newer continuous-tone films may soon be available, the high-contrast film currently used is not wholly acceptable for reformatting a large percentage of illustrated material. With digital technology it is possible to capture halftone images as gray scale and surrounding text as high-contrast black and white. Thus capturing an image digitally first may result in a higher quality microfilm copy–or hard copy replacement or on-screen representation–than is currently achievable with conventional microfilming. Clearly this issue warrants further investigation. A great deal of brittle material has already been microfilmed, and it is clearly desirable to scan and digitize some of it for access purposes. Yale University’s proposed project to convert large quantities of preserved library materials from microfilm to digital images will provide valuable data on the means, costs, and benefits involved. Here too, the issue of quality should be studied carefully. However, as will be discussed below, much work needs to be accomplished before scanning technology can become a true alternative to microfilming.
The digital files created at the scanning workstation in Olin Library may be viewed on screen during the initial setup for each book, using the interface delivered as part of the Xerox product. The scanned images are then transmitted over the Cornell TCP/IP network for printing on a Xerox DocuTech printer located in the computer center one-half mile away. This recently released Xerox product prints 600 dpi pages from scanned images at a speed of 135 pages per minute.5
In the Cornell/Xerox Project, a primary goal is to evaluate the paper output from the DocuTech. Copies are being made of each of the 1,000 books included in the project. The quality of the paper copy is extremely high: there is less than 1% variation in print size from the original; skew results only when the edge of the original text is not parallel to the page trim; front-to-back registration is reproduced within 1/100th of an inch of the original; the contrast between text and background is sharp; and the 600 dpi resolution compares favorably with the capture capabilities of photocopying. While lower resolution scanning devices can produce satisfactory copies from crisp, high contrast modern documents, many of the 1,000 deteriorating volumes in this project contain irregular features typical of the typography and printing techniques of the past century and a half. The 600 dpi copies successfully capture these printing nuances to produce faithful and legible reproductions of the originals. Because the paper copies are printed on permanent/durable paper that meets the ANSI standard for permanence, and the DocuTech printer meets the machine and toner requirements for proper adhesion of print to page, the paper product is considered the archival equivalent of preservation photocopy.6
Microfilm also can be produced directly from the digital files. The Cornell/Xerox Project has produced microfilm on a test basis with a company that has developed the capability of writing high resolution and gray scale digital images to microfilm. The company’s sample microfilm output from 600 dpi images has very high resolution, and the image is crisp, with sharp contrast between text and background. Cornell will use microfilm output as its primary backup for the digital files and as the preservation copy of the originals, to meet national standards.
Xerox has announced its intention to enable the DocuTech to receive documents for printing directly from remote electronic devices. As of this writing (summer 1991), however, Cornell has the only networked version of the DocuTech outside of Xerox. This configuration allows Cornell not only to separate physically the scanning function from printing but also to store the digital files in an image management system for subsequent use and dissemination. The digital files are being cataloged in both the Cornell on-line catalog (NOTIS) and the Research Libraries Group, Inc. database (RLIN). These records will provide researchers around the country with their initial access point into this developing digital library.
During this last year, Xerox, with the involvement of Cornell, has designed and is in the process of developing the architecture that provides the means for creating, organizing, storing, printing, and accessing digital images in a network environment. The CLASS system is composed of a software application that controls the scanning workstation, a flexible document structure architecture, a storage system, and a user interface designed for the public, all connected by a network to the DocuTech printer.
In the autumn of 1991, the digital images will be transmitted to an image storage facility, consisting of an image server and an optical jukebox, also located in the computer center. The digital images will be stored on 12-inch optical platters and ultimately made accessible over the Cornell network via a request server that is in the final stages of development. Within the scope of the current project, researchers will be able to generate a print-on-demand request for a book, or any portion of a book, via this request server.
Critical to all of the system design and networking configuration is the document structure information. Xerox has produced detailed specifications, described in internal Xerox project reports, for the software and database that implement the document structure architecture. The document structure defines the organization of each book: it orders the individual images captured during the scanning process into a logical arrangement for presentation to the user. The “official” document structure is built by the technicians at the time of scanning. It will describe the original text as accurately as possible, and will be stored with each digital book.
The overriding principle guiding the definition of the official document structure is that the correlation between images and the page numbers printed on the originals will be retained. This ensures that a request to view a particular page number from the text recalls the image with that number printed on it. Once this has been done, all of the self-referencing components of the original can take on real meaning in the digital version.
The second principle is that the user must have easy access to the self-referencing portions of the original. The table of contents, the index, the list of illustrations, and other pages that provide references within the text should be tabbed for easy use. Once these pages have been recalled on the computer screen, the user will be able to request specific pages or page ranges within the book. For instance, a person viewing the image of the table of contents can create a request for a chapter or a set of chapters to be located for on-screen viewing or printing.
The creation of the official document structure file for a given book should be kept as simple and straightforward as possible. The two principles–retrieval by original page numbers and access to the self-referencing sections of the original–are relatively easy for the scanning technicians to implement. More complete indexing would require additional time and a higher level of subject expertise, thus increasing the cost of initial capture.
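In modern terms, such a document structure amounts to a simple mapping. The sketch below is purely illustrative–the actual software and database specifications exist only in internal Xerox project reports–and every name in it (the title, the file names, the fields) is hypothetical. It captures the two principles in miniature: retrieval by the page numbers printed on the original, and tabs marking the self-referencing sections.

```python
# Illustrative sketch only: a minimal "official" document structure that
# maps printed page numbers to scanned image files and tabs the
# self-referencing sections. All names are hypothetical; the real CLASS
# specifications are described in internal Xerox project reports.

document_structure = {
    "title": "An Example Treatise",        # hypothetical title
    "images": {                            # printed page number -> image file
        "iii": "img_0001.tif",             # front matter keeps roman numerals
        "iv":  "img_0002.tif",
        "1":   "img_0003.tif",
        "2":   "img_0004.tif",
    },
    "tabs": {                              # sections tabbed for easy recall
        "table_of_contents": ["iii", "iv"],
        "index": ["1", "2"],
    },
}

def page_image(structure, printed_page):
    """Return the scanned image that carries a given printed page number."""
    return structure["images"][printed_page]
```

A request to view page “1” simply recalls `img_0003.tif`, preserving the correlation between images and the page numbers printed on the original.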
In the digital environment, images may be viewed as part of more than one structure. Within the preservation context, there are at least two applications of this capability. First, a new document structure will be used to combine the text of books with microfilm targets that have been scanned in preparation for the creation of a microfilm copy from the digital files. Second, the digital files for a volume with damaged or missing material may be combined with images for substitute pages located elsewhere.
However, it is in the area of use that personalized document structures will prove most valuable. Ultimately, for example, anthologies or reserve reading packets can be assembled and annotated by defining a new document structure record describing text files and digital images that were originally part of several books. In fact, these originals may be located at different institutions. The ability to produce customized documents offers scholars new opportunities for research and publication but will pose challenges in the area of copyright and authenticity.
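A personalized document structure can be sketched the same way, with the same caveat: everything below is hypothetical illustration, not the project’s actual format. The point is that an anthology is just a new structure referring to pages of existing digital books; the image files themselves are never copied.

```python
# Illustrative sketch only: a "personalized" document structure (here, a
# reading packet) that reuses page images belonging to several digital
# books. All identifiers are invented for illustration.

anthology = [
    # (source book, printed pages drawn from it)
    ("book-A", ["12", "13", "14"]),
    ("book-B", ["201", "202"]),
]

def resolve(anthology, libraries):
    """Expand an anthology into the ordered list of image files it needs.

    `libraries` maps a book identifier to that book's page-to-image table;
    in practice the source books could be held at different institutions.
    """
    images = []
    for book_id, pages in anthology:
        table = libraries[book_id]
        images.extend(table[p] for p in pages)
    return images
```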
Another component of the Cornell/Xerox project is the development of a public viewstation. A prototype has been installed in the Mathematics Library, where users can access the full digital images of fifty mathematics monographs.7 Xerox has provided the workstation and a software system, designed with input from a committee of Cornell librarians and computer professionals, that is tailored to the library patron browsing the collections to choose material for use. From the workstation, researchers can select a book for review from a list of search results displayed in one window. A book icon is then moved to a second window known as the personal bookshelf. After choices are made, opening the icon causes the actual pages of the book to be displayed on a high resolution screen. Screen output is delivered as 200 dpi images that are derived from the 600 dpi images and resized to fit two pages on an 11 x 14 inch screen. Images can be enlarged for reading fine text; pages can be selected for viewing; tabs can be placed in the book for ease of movement through the text; and a request to print all or part of the book can be issued by the user.
The quality of the on-screen image is quite acceptable, particularly since it is anticipated that screen viewing will be used for rapid browsing and retrieval. For extended reading, a user will soon be able to initiate a print-on-demand request of the 600 dpi digital images. This workstation represents the first step in providing a level of browsing and retrieval that approximates looking through books in library stacks. In future projects, Cornell will develop an image conversion server that will enable readers around the country to access the digital images using common computer platforms, such as the Apple Macintosh, the IBM PS/2, and Sun workstations.
Remote access to digital images presumes a national networking infrastructure that can accommodate the transmission of massive amounts of data at high speeds. The files for digital images are large. An 8 1/2 x 11 inch page stored at 200 dpi resolution may be as large as 4 megabytes in its uncompressed form, compared with the 3,000 to 5,000 bytes that an alphanumeric representation of the same page averages.8 Even though digital files may be compressed for economy of storage and transmission, the resulting compressed images are still quite large. In the Cornell/Xerox project, the compression ratio is 15 to 1 for textual materials, with compressed image files averaging 60,000 to 80,000 bytes per page. Transmitting a significant number of digital images would overwhelm moderate to low capacity networks.
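The uncompressed figure above can be checked with quick arithmetic. The sketch below assumes 8 bits per pixel (gray scale), which is one way a 200 dpi page approaches 4 megabytes; that assumption is ours, not stated in the article.

```python
# Quick arithmetic behind the quoted uncompressed page size.
# Assumption (ours, not stated in the article): 8 bits per pixel (gray scale).

def page_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Raw (uncompressed) size in bytes of a scanned page."""
    pixels = (width_in * dpi) * (height_in * dpi)
    return int(pixels * bits_per_pixel / 8)

gray = page_bytes(8.5, 11, 200, 8)     # 3,740,000 bytes -- "as large as 4 megabytes"
binary = page_bytes(8.5, 11, 200, 1)   # 467,500 bytes for 1-bit black and white
```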
The National Research and Education Network (NREN), currently pending funding in Congress, will consolidate the collection of TCP/IP networks now known as the Internet into one high speed, high capacity system. It is predicted that the increasing capacity of such a national network will keep pace with the demand for the timely transmission of an increasing volume of large digital files.9 Michael M. Roberts, in the Summer 1991 issue of EDUCOM Review, suggests that advances in the semiconductor and fiber optics industries have resulted in a communications revolution that offers networking speed and capacity at reasonable cost. He goes on to modify a quotation from Gordon Moore, chairman of Intel Corporation, who said, “Make your plans on the assumption MIPS are free.” According to Roberts, “Today an updated prediction would be, ‘Make your plans on the assumption BITS are free.'”10
The growth of networking in the United States during the 1980s has resulted in connections among many universities as well as government and industrial partners. The digital library of image information will be available to people in any of these locations. Estimates of network access at the end of 1991 include over 1,000 sites serving 2 to 4 million people involved in research or education. NREN will triple the number of sites, reaching all states and territories by 1995.11 It appears that the scanning and digitizing of deteriorating library material and the establishment of large capacity networks could coincide to produce a truly national digital library.
1. Digital image technology, for the purposes of this article, is defined as the electronic encoding of scanned documents in digital image form. The text contained in these images is not converted (for textual interpretation or indexing purposes) to alphanumeric form at the time of scanning, although the potential exists for such conversion, in whole or in part, from the digital files at some later time. The present capabilities of optical character recognition are inadequate for capturing both the information and the presentation of the original document, especially when one considers the vast number of languages, illustrations, typefaces, and printing techniques present in the collections of modern research libraries. See Stephen Smith and Craig Stanfill, “An Analysis of the Effects of Data Corruption on Text Retrieval Performance” (Thinking Machines Corporation, Cambridge, MA: December 14, 1988).
2. In a recent three-week period, the average daily scan rate was 1,548 images, which included initial setup and on-screen inspection, scanning, storing, rescanning, and transmittal for printing. This rate was achieved despite down time for system and network failures associated with the use of prototype equipment and time spent in demonstrations for visitors. Estimates for one-image-per-frame filming range as high as 2,000 per shift, but microfilming service bureaus are also responsible for density and resolution tests, visual film inspection, the preparation of three generations of film, and box labeling. The time to complete these additional tasks should be considered in calculating the microfilming production rate. Phone conversations, Anne R. Kenney with Shawne Diaz Cressman, Shift Supervisor, MAPS, The MicrogrAphic Preservation Service, and with Fred Keib, Manager, Cornell Photographic Services, August 28, 1991.
The potential for containing labor costs exists on a number of fronts with digital technology. For example, selection and preparation time often represents a large percentage of the expense of reformatting. In a microfilm project, material is normally inspected for completeness both before and after filming, because the number of retakes on a roll of film is strictly limited. Digital technology is far more flexible: a page may be inserted or replaced with ease. Thus in a scanning project, material need be inspected just once, at the end of the process, rather than twice. Similarly, items that are missing pages may be scanned at any time and the missing pages inserted as they are located. The major labor cost, that associated with the scanning of images, could also decline as institutions move from prototype to production operations, as improvements in automatic and semi-automatic feed mechanisms reduce the risk of paper jams, as automatic skew detection becomes standard, and as bound-volume scanners are developed.
3. Donald J. Waters, From Microfilm to Digital Imagery: On the Feasibility of a Project to Study the Means, Costs and Benefits of Converting Large Quantities of Preserved Library Materials from Microfilm to Digital Images (Washington: The Commission on Preservation and Access, 1991). Michael Lesk has argued that creating digital images from microfilm will cost less than scanning the originals. See Michael Lesk, Image Formats for Preservation and Access: A Report of the Technology Assessment Advisory Committee to the Commission on Preservation and Access (Washington: The Commission on Preservation and Access, 1990), p. 8.
4. Cost analysis was also a part of the National Library of Medicine study, where costs for scanning and related activities ranged between 12 and 28 cents per page. See volume 1, pages 11-15, in Document Preservation by Electronic Imaging. See also National Archives and Records Administration, Optical Digital Image Storage System, Project Report, March 1991 (Washington: National Archives and Records Administration, 1991), pp. 19-24. The latter report concluded that scanning could not be justified on the basis of cost alone. Intangible benefits were cited, such as improved image legibility, improved timeliness and accuracy of access, enhanced retrieval, reduction of space requirements, and reduced or eliminated handling of original documents.
5. Barnaby J. Feder, “A Copier That Does a Lot More,” The New York Times, Wednesday, October 3, 1990, D1.
6. Norvell M.M. Jones, Archival Copies of Thermofax, Verifax, and Other Unstable Records. National Archives Technical Information Paper No. 5. (Washington: National Archives and Records Administration, 1990). ANSI Standard Z39.48-1984, currently being revised, covers the requirements for permanent/durable paper. The Cornell/Xerox Project compared paper output from digital files that were scanned and printed at 300 dpi to those scanned and printed at 600 dpi. The 300 dpi images were found to be unacceptable for replacing (as opposed to providing surrogates for) deteriorating originals.
7. The first 500 of the 1,000 volumes chosen for the Cornell/Xerox Project were selected from the Mathematics Library and include works of significant authors and individual titles that have contributed substantially to the development of the discipline.
8. Anderson, Mitchell, Pennebaker, and Gonzales, “Image Compression Algorithms” (paper delivered at the International Electronic Imaging Exposition & Conference, Boston, MA, October 3, 1988), pp. 398-401.
9. Kenneth M. King, “Progress in Building a National Information Infrastructure,” EDUCOM Review 26, No. 2 (Summer 1991): 63-64. The Coalition for Networked Information (CNI) was formed by the Association of Research Libraries, CAUSE, and EDUCOM in March 1990 to explore the promise of high performance computers and advanced networks for enriching scholarship and enhancing intellectual productivity. It has become a recognized force in the evolution of policies and practices which will govern the networked research and education information environment.
10. Michael M. Roberts, “Positioning the National Research and Education Network,” EDUCOM Review 26, No. 2 (Summer 1991): 12.
11. Paul Evan Peters, “Connectivity,” (presentation to the Bentley Mellon Fellow Seminar on the Impact of Technology on the Research Process: Archives in the Year 2000, Ann Arbor, July 18, 1991).