Council on Library and Information Resources

The Data--Digital Collections Inventory

The Commission on Preservation and Access

Council on Library Resources

Digital Collections Inventory Report

By Patricia A. McClung
February 1996


The Data

As expected, the investigations turned up a hodgepodge of responses. There are innumerable projects which feature pictorial images (e.g., photograph collections, maps, drawings of some sort, or museum collections); there are documentary text editing projects for individual personal papers; there are literary and historical text encoding projects (which for the most part feature SGML encoding); there are efforts to convert entire collections or to provide a critical mass of materials in a particular subject area; and there is a wide variety of experimental projects of one flavor or another. In addition to projects which convert print-based and/or photographic materials, there are a host of mixed-media projects, as well as projects focused on additional formats such as sound recordings, microfilm, and motion picture film. There are also a number of initiatives to make materials whose original format is electronic widely available via the Internet.

Many of these projects are clearly experimental in nature; experimentation is essential in these early stages of the development of an electronic information environment. A number of other projects seem to be undertaken because of a widespread feeling that it is important to have digital projects underway in order to keep current with the trends; but even these help to increase the knowledge base in the library and academic community--something which is also extremely important to the transition. Taken all together, the inventory highlights clusters of initiatives and indicates that a critical mass of online resources is evolving--although not yet with any apparent coherence or logic.

In the course of the survey, it was pointed out more than once that a clearer definition of what was meant by 'scanned collections' was needed to help the potential respondents understand what the inventory would and would not include. While the focus of the investigation was to compile an inventory of retrospective library and archive collections that have been (or will soon be) converted to electronic form for networked distribution, the actual situation in cyberspace is that there are no clear demarcations or obvious definitions for distinguishing types of electronic resources and tools. In fact, there are many different perceptions and working definitions of what actually constitutes a scanning, a digitization, and/or an electronic conversion project. If the survey is an accurate indicator, most people use these terms somewhat interchangeably to refer to a variety of initiatives that ultimately result in electronic availability of retrospective materials converted from other formats. The reason is that although the term 'scanning' usually refers to the capture of digital page images (also referred to as bitmapped or raster images), this form of digital capture is sometimes followed by optical character recognition (OCR) conversion to fully searchable text--which then might be encoded using SGML, HTML or some other markup language.

In fact, there are very few "scanning-only" projects, other than for pictorial types of images. Furthermore, it is obvious from this survey that no two projects are exactly alike. Technical decisions are governed by many factors: the available hardware, software, and expertise; the nature and formats of the materials themselves; the anticipated use; and the budget. That said, the conversion projects usually fall into one of these general technical categories:

  1. Digital image projects that take an "electronic photograph" of pages, graphics, prints, photographic materials, maps, or other materials (with accompanying metadata for describing, structuring, and indexing the image database).2
  2. Digital imaging projects with additional text-searchable files generated from the images (these can be uncorrected text files used for indexing purposes).
  3. Full-text conversion projects involving either keyboard entry or optical character recognition (OCR).3
  4. Text encoding projects using SGML, HTML, or some other markup language.4

This survey report focuses primarily on the first two categories, both of which involve the "electronic photograph" capture method--often referred to as scanning. This method is used most often in the retrospective conversion of traditional source material, because it "can accurately render the information, layout, and presentation of original source documents." The third and fourth categories produce electronic text--that is, data that can be manipulated for searching and indexing purposes.5

While there is still considerable discussion in the field about which of these four options is appropriate, and when, the users of these electronic documents are beginning to speak up. They want to download and manipulate the text and images--not just look at a book page on a computer screen. Some projects are providing both a digital image and an electronic version of textual materials. However, economic factors rear their ugly heads at this point, because fully searchable texts are far more expensive and time-consuming to produce than bitmapped page image scans.6
