European Commission on Preservation and Access, Amsterdam October 1997
Digitization as a Means of Preservation?
5. Digitizing from the original
5.1 Quality requirements
In the current state of technology, digitizing from the original gives a better reproduction quality for color material and material with weak contrasts than digitizing from film. When endangered original material is digitized, the converted form acquires the status of a preservation master which, in an extreme case, will have to serve as a substitute for the lost original. In this case, of course, the reproduction quality must be higher than is necessary in cases where the digitized secondary form exists only to improve access. A later, repeated digitization of the endangered original, even if possible, is not consistent with the aim of preservation. This means that the first digitization must be of the highest possible standard.
It follows that, in applying the quality index (see paragraph 3.1), the highest quality (qi = 8) must be guaranteed. To reproduce a lowercase “e” with a height of 1 mm at this quality, the formula requires a resolution of 615 dpi for bitonal digitization (410 dpi for 256-level gray scale).
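The two figures can be checked with a short calculation. The sketch below assumes that the quality index of paragraph 3.1 has the form QI = dpi × 0.039 × h / 3 for bitonal scanning and QI = dpi × 0.039 × h / 2 for 256-level gray scale, where h is the character height in millimeters. That formula is not restated in this section, so the constants should be read as an assumption, although they do reproduce the 615 dpi and 410 dpi quoted here. The function and variable names are illustrative only.

```python
# Assumed form of the quality index from paragraph 3.1 (not restated here):
#   bitonal:    QI = dpi * 0.039 * h / 3
#   gray scale: QI = dpi * 0.039 * h / 2
# where h is the character height in millimeters.
MM_TO_INCH = 0.039  # rounded millimeter-to-inch factor assumed by the formula
DIVISOR = {"bitonal": 3, "gray": 2}


def required_dpi(qi: float, char_height_mm: float, mode: str) -> float:
    """Solve the assumed quality-index formula for the scanning resolution."""
    return qi * DIVISOR[mode] / (MM_TO_INCH * char_height_mm)


bitonal_dpi = required_dpi(8, 1.0, "bitonal")
gray_dpi = required_dpi(8, 1.0, "gray")
print(f"bitonal: {bitonal_dpi:.0f} dpi, gray scale: {gray_dpi:.0f} dpi")
```

Under these assumptions, doubling the character height halves the required resolution, which is why larger type can be scanned at lower settings.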
A resolution of at least 600 dpi is recommended for bitonal digitization of printed text that includes line drawings. A resolution of 400 dpi is generally adequate for bitonal digitization of texts that are clear, larger (10 point and above), and, in particular, evenly spaced, and that have been produced by modern, non-impact typewriters, such as those using plastic carbon ribbon, or by ink-jet or laser printer. A 256-level gray scale and a resolution of 400 dpi should be used for the following: manuscripts, drawings in pencil or crayon, typescript produced with silk ribbons, color illustrations and other drawings with varying gray shades, and black-and-white and color photographs. These recommendations also correspond to American quality requirements for digitizing original material. The suggestions on filming technique in paragraph 2.2 and on film organization and documentation in paragraph 2.3 can contribute usefully to digitization and to the further processing of the digitized conversion form.
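For planning purposes, these recommendations can be written down as a simple lookup table. The following is only a sketch: the category keys and the `scan_settings` helper are hypothetical labels chosen for illustration, not terms defined by this report, and material that falls outside the listed categories deliberately raises an error so that it is assessed individually.

```python
# Recommendations from this paragraph as (mode, resolution in dpi).
# The keys are hypothetical labels, not terminology defined by the report.
RECOMMENDED = {
    "printed_text_with_line_drawings": ("bitonal", 600),    # "at least 600 dpi"
    "clear_evenly_spaced_text_10pt_plus": ("bitonal", 400),
    "manuscript": ("gray256", 400),
    "pencil_or_crayon_drawing": ("gray256", 400),
    "silk_ribbon_typescript": ("gray256", 400),
    "illustration_with_gray_shades": ("gray256", 400),
    "photograph": ("gray256", 400),
}


def scan_settings(material: str) -> tuple[str, int]:
    """Return (mode, dpi) for a listed category; raise KeyError otherwise."""
    return RECOMMENDED[material]


print(scan_settings("manuscript"))
```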
5.2 Criteria for the choice of system
Scanners that work like a planetary camera, digitizing the material from above, must always be used for sewn and bound volumes. Equipment of this kind is indispensable for the digitization of unique material that is fragile. Feeder scanners and flatbed scanners are not suitable for books and archives. It is especially important to follow the precautions described in paragraph 2.4 for the protection of books and volumes.
5.3 Storage format
The comments in paragraph 3.2 are applicable here. If original material that may already be damaged is to be stored long term exclusively in digital form, there is no microform to fall back on should the digital data carrier deteriorate. Additional quality tests are therefore necessary when digitized image data are stored on optical disk. The following procedure is suggested:
First, digitized copies of the material are written to optical storage disks (the primary data carrier). The data on the server’s internal magnetic disk are not deleted but kept unaltered. After the image data have been stored as pages in TIFF files on the primary data carrier, they are read back and a few of them are decompressed. The uncompressed or decompressed digital copy has a precisely defined number of image points, which can be calculated from the format of the original material and the resolution chosen for scanning. The size of the decompressed digital image (in Kb) is the product of the number of image points and the “bit-depth” with which each image point is represented. A digital copy is thus correctly reproduced when its actual size equals this calculated value, which confirms that the transferred copy has been securely stored in correctly reproducible form. In the extremely rare cases where a digital copy cannot be perfectly reproduced in this test, the logical step is to erase it from the optical data carrier and immediately store it again from the data retained on the magnetic disk.
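The size test described in this paragraph can be sketched in a few lines. The example below is an illustration under stated assumptions, not a prescribed implementation: the function names are hypothetical, the page dimensions are taken from the physical format and rounded to whole image points, and each row of image points is assumed to be padded to a whole byte, as is usual for uncompressed raster data. In practice the width and height would more reliably be read from the image file header.

```python
import math


def expected_size_bytes(width_mm: float, height_mm: float,
                        dpi: int, bit_depth: int) -> int:
    """Expected size of the decompressed image, computed from the format
    of the original, the scanning resolution, and the bit depth."""
    width_px = round(width_mm / 25.4 * dpi)
    height_px = round(height_mm / 25.4 * dpi)
    # Assumption: every row is padded to a whole number of bytes.
    row_bytes = math.ceil(width_px * bit_depth / 8)
    return row_bytes * height_px


def is_correctly_reproduced(decompressed: bytes, width_mm: float,
                            height_mm: float, dpi: int, bit_depth: int) -> bool:
    """A copy passes when its actual size equals the calculated size."""
    expected = expected_size_bytes(width_mm, height_mm, dpi, bit_depth)
    return len(decompressed) == expected


# Stand-in for a page read back from the optical disk and decompressed:
# a 25.4 mm square scanned bitonally at 400 dpi.
page = bytes(expected_size_bytes(25.4, 25.4, 400, 1))
print(is_correctly_reproduced(page, 25.4, 25.4, 400, 1))
```

A size match shows only that the full number of image points was recovered; it does not compare the content of the copy with the data retained on the magnetic disk.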
The primary data carrier, created and quality-checked in this way, is the source of copies for data preservation. These working duplicates are for day-to-day use, while the primary data carrier remains the preservation master. If need be, it serves for production of further duplicates. It is not absolutely necessary to subject the working duplicates to the same quality test as the primary data carrier. If, in the course of normal use, it becomes apparent that individual copies cannot be reproduced correctly, it is always possible to produce another duplicate or to go back to the primary data carrier for a further working duplicate.
5.4 Format and compression
As for paragraph 3.3
5.5 Requirements for image viewing software
As for paragraph 3.4
5.6 Requirements for image viewing hardware
As for paragraph 3.5
5.7 Migration
Organizational and technical measures are always advisable in the migration of digital conversion forms, both to safeguard the transfer of information and for reasons of economy. However, they become indispensable when the digital form is the only form besides the original, or when it is expected eventually to replace the original. Repeated digitization of the original should be avoided on grounds of preservation and because it would be prohibitively expensive.
The organizational and technical measures for the safe migration of digital conversion forms must be included in planning from the outset, and that planning must take account of the necessary resources. The recommendations in paragraph 3.6 apply to the planning and carrying out of migration, especially the requirement to keep adapting the losslessly compressed or, where necessary, uncompressed data to new system environments, and to safeguard adequately the data carrier created in each case.
5.8 Financial viability
Where books or archival documents are to be digitized as a whole, this should be done by commercial firms. Where only certain pages, or parts of a document, are to be digitized, this can be done by the institution itself. The cost of digitizing books and documents (page size up to A4) depends on the amount of material, the mode (bitonal or gray scale), and the resolution, but also on the contrast values of the material, its type, and the way in which it is arranged. Simple, flat work, such as single sheets, can be more efficiently digitized with flatbed or feeder scanners than books or other bound volumes, for which special book scanners need to be installed.
When working out the cost of digitization from the original, it is essential to include in the calculation the further cost of migration. In particular, it will almost invariably prove financially more advantageous, when working with threatened originals, first to make a film and then to digitize from that, thus solving the problem of migration. In exceptional cases, with difficult material, it can be advisable, in the interest of reproduction quality, to film and digitize from the original in parallel. Paragraph 3.7 is relevant on other points.
5.9 Differing recommendations on color images
With current technology, digitization of color can be done only at relatively low resolution values, or for limited quantities of material, because very large quantities of data are involved. Test runs should always be carried out to establish whether the reproduction quality is acceptable.
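The scale of the data involved can be illustrated with a rough calculation. The sketch below compares the uncompressed size of a single A4 page scanned at 400 dpi in three modes. The 24 bits per image point used for color is an assumption, since this section does not specify a color depth, and the function name is illustrative.

```python
def uncompressed_megabytes(width_mm: float, height_mm: float,
                           dpi: int, bits_per_point: int) -> float:
    """Uncompressed image size in megabytes (1 MB = 1,000,000 bytes)."""
    points = round(width_mm / 25.4 * dpi) * round(height_mm / 25.4 * dpi)
    return points * bits_per_point / 8 / 1_000_000


# A4 page (210 mm x 297 mm) at 400 dpi.
# The 24-bit color depth is an assumption, not a figure from the report.
for label, bits in (("bitonal", 1), ("256 gray", 8), ("24-bit color", 24)):
    size = uncompressed_megabytes(210, 297, 400, bits)
    print(f"{label:>12}: {size:6.1f} MB")
```

Under these assumptions a color page needs three times the storage of a gray scale page and twenty-four times that of a bitonal page before any compression is applied.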
In the interest of economical storage and processing of image data, compression processes play an even larger role in color digitization than in bitonal or gray scale digitization. At present, there is no compression process that does not involve a worsening of reproduction quality, in particular the distortion of color values.