The members of the Five Colleges of Massachusetts (Amherst, Hampshire, Mount Holyoke, and Smith Colleges, and the University of Massachusetts at Amherst) are developing a depository library that will house materials from each of their collections. They are also creating a cooperative collection-development strategy that will take full advantage of the depository library.26
In deciding to take part in this joint project, each of the libraries faced the following constraints:
- a critical shortage of space to house growing collections
- a reluctance of its governing body to build separate storage facilities
- the need to implement the most cost-effective strategies for meeting users’ needs
The colleges’ effort to collaborate on collection development and management, as well as storage, can serve as an exemplar of the strengths and weaknesses of collaborative approaches to problem solving.
The cooperative depository will be built at the Amherst Library Depository facility, a former military bunker that was designed to survive a nuclear attack and to serve as a center for military operations in the event of a war. The facility was decommissioned long ago, and Amherst College purchased the bunker from its second owner, the Federal Reserve. The college retrofitted the bunker for secondary storage and began using it in 1995.
The Five Colleges, in the words of the director of one of their libraries, “having taken successful consortial approaches to ordering, cataloging, circulation, subscription database management, and materials delivery in earlier collaborative efforts, explored ways to extend their cooperation to the growing problem of finding additional space to store little-used books in their collections.” When the libraries first talked about cooperating to solve their storage needs, they considered several options, from building a new facility to renting space from a local commercial storage facility or from the nearest library storage facility (in this case, Harvard University’s facility, which is about 100 miles away). Careful cost estimates led the library directors to recommend developing something more radical and more efficient: a shared library depository at Amherst.
In 1999, the presidents of the Five Colleges approved a plan to operate a consortial library depository that would have a distinctive mission and governance, and they assigned responsibility to an existing consortium, Five Colleges, Inc., to run the depository. There would be consortial ownership of materials sent to the depository (except for the holdings of the University of Massachusetts at Amherst Library, as noted below), and duplicates would be deaccessioned. Materials would be housed by size, and the stacks would not be open for browsing. The existing Five College online library catalogs would be updated to show the new locations of materials transferred to and retained by the depository. Materials deposited by the four colleges would become the property of the Five-College Library Depository. Because the University of Massachusetts, a public research institution, is required to retain ownership of its materials, these would be shelved separately at the depository.
What obstacles did the colleges face in launching this collaborative endeavor? The first was to overcome the reluctance of some librarians and faculty members to transfer any materials off site. This opposition was not unexpected; nonetheless, in many respects, the colleges’ options were limited. Their collections were growing, but budgets were tight. None of the institutions’ governing bodies was willing to authorize funds for capital expenditures to house their libraries on site. Thus, the real issue was how the librarians and faculty could work together to make the transition equitably and with minimal disruption. In the case of Amherst College, for example, deciding which materials to move to the depository required extensive collaboration between librarians and the faculty. When conflicts arose among faculty members, usually over materials of an interdisciplinary nature, the librarians negotiated a settlement.
Another obstacle was ownership. One of the first decisions was to move to storage journals whose content is available online, such as those mounted on JSTOR. By pooling their fragmented sets off site, the Five Colleges would be able to create a complete run of any given journal. By relinquishing ownership of that journal to Five Colleges, Inc., the member libraries would be agreeing to share ownership of and responsibility for common assets. Doing this might be easier for liberal arts colleges than for large private research institutions, which feel a need to keep ownership of library materials. However, the arrangement worked out by Five Colleges was flexible enough to accommodate the specific ownership needs of a public university.
A series of logistical concerns had to be addressed, from document delivery to service on weekends and holidays. There were staffing and funding issues as well. As the collaborative effort moved forward, the colleges found it easier to solve their problems by pooling resources. For some issues, such as staffing needs, the flexibility afforded by a large body would prove to be essential.
The consortium established a Collection Management Committee to examine an array of matters, such as how to decide on acquisitions, what to do if one library decided to cancel a subscription that it alone had, and what materials should receive priority for deposit. The committee conducted research in most academic disciplines on the five campuses to determine the access needs and preferences of their faculties. It developed strategies to anticipate the recall of collections in cases of changing teaching demands. Finally, it determined that the long-term goal of the depository library would be to develop “last-resort” collections of best copies, so that no user would be without recourse to a hard copy of a title, yet the libraries would not be burdened with excessive duplication. The Librarians’ Council, which comprises the head librarian of each college and the Five Colleges, Inc., coordinator, makes decisions about the depository on the basis of the committee’s recommendations.
What can this example of cooperative collecting and storing offer to other libraries? First, libraries of all sizes have an increasing need for off-site storage. Large libraries have been moving collections off site for two decades, and several have built facilities that serve their collections alone.27 (Some of these institutions, Harvard University among them, are also willing to lease some of their underused storage space to other libraries.) Many facilities have been built to serve several institutions in a region. Nevertheless, only one facility, the Midwest Inter-Library Center (renamed the Center for Research Libraries), was founded to manage cooperative acquisitions and preservation programs as well as to serve cooperative storage needs. The Center for Research Libraries has evolved from a model intended to serve the collecting needs of Midwest libraries by focusing on rare and little-used materials to its present profile, in which it serves 221 university and research libraries in North America and provides them with access to heavily used foreign language materials, newspapers, and documentary series.
The advent of digital technology and of document delivery by fax has made the environment for collective storage far different today from what it was in the 1960s and 1970s. Scholars no longer need to fear that materials moved off site will be lost to casual use. Turnaround time at the Amherst College Library Depository is usually less than 24 hours, and an increasing number of patrons are satisfied with desktop delivery. Once the concept of secondary storage has been accepted on campus and users are satisfied with the levels of service provided, library managers can focus on how to use the shared repository to better serve the needs of both users and collections.
Whereas regionally based depository libraries have been seen chiefly as a way to handle a space problem, the opportunity now is to achieve economies of scale for a number of collection-management tasks, from acquisitions to preservation. Willis Bridegam, director of the Amherst College Library, states that even for the Five Colleges, the chief incentive to cooperate was to solve the problem of space shortages.
They understood the economies that might be realized through joint staffing of a shared off-site library storage center. They saw the potential advantage of being able to develop complete periodical backruns from fragmented sets of the five individual libraries. They supported the idea of choosing the best copy of a book or periodical volume of which there were duplicates for retention in a depository. They also thought that it would be efficient to establish one conservation service at the bunker for all the materials transferred there. Most of all, they were interested in relieving the shelving space pressures in their libraries, and they thought that a joint approach might be more likely to attract external and internal funding (Bridegam 2001, 17).
Reaching agreement about the shared storage site, in other words, was aided and abetted by the press of short-term needs. The participants’ experience of cooperation in sharing collections and cataloging, which had taken place over decades, had built trust among the libraries. The Five College presidents encouraged collaboration and supported innovative problem solving. This project bears watching, and it should be well documented and assessed regularly by the libraries and their users.
4.2 The Emperor Jones: When Preserving Means Restoring
Preserving films of historical or artistic value often entails physically and historically reconstructing what the film was in its original state, judging that state on the basis of evidence from as many authentic source materials as possible. This means that the restoration of a film may depend on the preservation of both film and non-film source materials that contain information about the film in its so-called original manifestation. These sources include documents that might reveal what the film looked and sounded like, e.g., negatives, positives, scripts, stills, publicity materials, contemporary reviews, production company records, and copyright deposits. It also means that film preservation can require good detective skills and well-informed judgments about the cultural forces that shaped the film at the time of its creation and later, as well as technical expertise in obsolete media, editing skills, and access to specialized equipment and reliable sources of funding.
In 1999, the Library of Congress undertook a restoration of The Emperor Jones as part of its contribution to the Treasures of American Film Archives project, which was initiated by the National Film Preservation Foundation with funds from the Pew Charitable Trusts. The Library staff’s first task was to determine what constituted the original film when it was released in 1933.28 There were several versions available, each of them incomplete in one way or another. The version of the film best known to the public was released by the American Film Institute in 1969-70. It had been assembled from two pre-World War II censored 16-mm prints. Another version, derived from a heavily censored source in Canada, was distributed in videotape format by Janus Films.
Preservation work began with gathering all extant versions and researching nearly every aspect of the film’s production and distribution history. In this case, that history is riven with controversy. That the film survives today in such bowdlerized form stems not only from the fact that many considered its content offensive or objectionable but also from the fact that the lead actor, Paul Robeson, became persona non grata because of his outspoken left-wing politics. Cultural artifacts associated with such controversy are particularly difficult to preserve, yet it becomes especially important to preserve them precisely for those reasons.
The film is based on an expressionist-style one-act play by Eugene O’Neill, first produced in 1920 and revived four years later. The 1924 revival starred Paul Robeson. The play was successful, if controversial, among critics and theatergoers. The subject matter and language were racially charged, but the poetic vision and expressionist staging made for compelling theater.
A small independent company produced the film in 1933. The producers secured O’Neill’s permission to make the film, but the playwright had no involvement in it. His only requirement was that Robeson again play the lead role. (Robeson’s character was Brutus Jones, a Pullman porter who becomes the lord of a Caribbean island and ultimately meets a violent death.) The film script, written by DuBose Heyward, had received O’Neill’s approval.
The film debuted in September 1933 and had a controversial opening run. White and Black audiences generally acclaimed the film a success, while agreeing it was more of an “art film” than entertainment. The African-American press voiced objections, saying that the lead character conformed to negative stereotypes of African Americans and that the language was replete with racial epithets. The film played for only two weeks before its distributor, United Artists, began cutting certain parts that the critics had found objectionable.
These cuts, which included excising the word “nigger,” used repeatedly by Jones as well as some White characters; cutting depictions of Whites subordinated to or physically intimidated by Blacks; and removing some suggestive sexual scenes, were not the first attempts at censorship of the film. Even before its general release, the producers, under advisement from censors, cut a scene in which Jones murders a White guard while serving on a chain gang and dialogue in which Jones describes guard brutality in prison (MacQueen 1990). As a result of prerelease editing, about two minutes was cut from the film.
Other immediate post-release cuts included removing from the sound track all instances of frankly offensive words, and even the word “Light!” spoken when Jones orders his White flunky to ignite his cigarette (the word was blanked out on the sound track, but the image is unaltered). Still other cuts can be inferred from rough or jumpy spots in extant versions, but there is little unambiguous evidence about which scenes or lines of dialogue were cut, and when.
Library of Congress staff inspected all known sources of the film, including elements held at the Library itself, the National Archives of Canada, and the Museum of Modern Art. Staff members contacted numerous people who might have reason to have an unknown print, including some associated with the original production. They also researched contemporary reviews to determine how long the film ran, and production company records to learn more about what had been filmed. They obtained a copy of the shooting, or continuity, script that had been sent to a censor in New York and now resided at the New York Public Library. Also examined was a video transfer of the copy held by Gosfilmofond in Moscow, an English-language version with German subtitles that the experts hoped might include missing scenes. It did not, however, have any new material in it. Staff was also able to locate the sound track on Vitaphone-style discs owned by a private collector who was willing to lend them.29 Listed in order of footage used, the sources were (1) the original picture and track negatives from the Library of Congress in the Universal Collection; (2) the incomplete studio print, also in the Universal Collection; (3) an incomplete Canadian print owned by the Museum of Modern Art; (4) Vitaphone-style sound discs owned by the late David Goldenberg; (5) the National Archives of Canada archival negative; (6) the archival picture negative owned by Janus; and (7) a pre-World War II 16-mm print owned by Douris Films and housed in the Rohauer Collection.
By September 2001, the Library of Congress had produced a copy of The Emperor Jones that experts believe to be approximately 3 minutes shorter than the original 80-minute film. Some elements of the film were still missing or had badly deteriorated. To get the film in sync with the sound track and to restore damaged frames, lab technicians doubled some frames and held others in freeze frame. The sound track was a complete version of that of the original general release.
The costs of this kind of preservation and restoration work, if done at a commercial facility, would be $35,000 or more for the film portion alone. Sound work could cost another $10,000 or more. These figures do not include the costs of research and staff time. Like rare-book conservation, film restoration is a high-investment solution for an artifact of special value that is deemed to be endangered. It is not a treatment that all films need to or should receive. Like works on paper and recorded sound, moving image materials require discrimination among objects and among treatments, whether passive care or active intervention. There is always a balancing act among the perceived intellectual or cultural value of an item at a specific time, the fragility of the physical item, and the funds and other resources available to treat it. It is fair to say that, as part of a larger national initiative to fund film preservation and restoration, The Emperor Jones was one of several works that received treatment because the funds became available.
4.3 Preserving Oral Traditions
Although the technology for recording sound is scarcely 100 years old, it has radically transformed the ways in which societies communicate, create art, document their lives, and fill idle time and empty rooms. It has also spawned large and lucrative industries that record and disseminate music, spoken word, and ambient sounds, and has introduced undreamed-of complexities into a copyright regime that was designed to manage rights for textual materials. Nearly every aspect of sound preservation is affected by these transformations: what to preserve, how to preserve it, how to negotiate and manage complex rights issues, and how to make recorded sound accessible into the future. Even for materials that are not commercially produced and distributed and that are, in essence, unpublished, rights and access issues can have a powerful inhibiting force on preservation. There is no better example of the challenges of sound preservation than that presented by recorded folklore.
The technical problems of preserving sound that is recorded on fragile media such as wax cylinders or cassette tapes, dependent as they are upon playback equipment that is quickly made obsolete by emerging technologies, may appear at first blush to be the chief obstacle to preservation and access. This is far from true. Future access to folklore resources will be equally dependent on the two other legs of the three-legged stool of access: (1) how materials are organized and described or cataloged for easy retrieval; and (2) whether or not present and future users will have legal or ethical rights to look at, cite, or reproduce the original sources. Given the challenge of physical preservation, few libraries and archives are willing to invest in preserving collections that are uncataloged or not cleared for research or educational uses. Most recorded folklore materials are technically unpublished, just as are radio broadcasts, television soundtracks, interviews, and recordings of live dramatic and musical performances.30 This means that the folklore materials are seldom well indexed (indexing being an investment in inventory control that commercial firms must make to manage distribution, but which many documenters will not do systematically). There are equally parlous problems with intellectual and moral rights: a clear audit trail of written informed consent seldom accompanies folk recordings, and the rights of both the documenters and the documented are often contested under those circumstances.
The American Folklore Society (AFS) and the American Folklife Center (AFC) at the Library of Congress recently convened a group to identify the barriers to preserving audio folklore collections and to develop strategies for overcoming those barriers. Academic specialists and curators were concerned about decaying physical collections, a problem they saw as primarily technical. “These were the familiar challenges of media degradation and format obsolescence that have eluded effective remediation for at least a generation. To capture living traditions on documentary media, field workers have been using a variety of media formats, none of which is favorable for long-term preservation and each of which has presented new problems of storage, longevity, and hardware dependencies” (CLIR 2001). These media include wax cylinders; wire recordings; aluminum, shellac, and vinyl discs; glass and acetate masters; digital audiotape (DAT); and cassette tape. Audiotape lasts for 10 to 60 years. But how can one tell exactly how long a given tape will last, and what can one do to slow the inevitable loss?
The AFS and AFC invited technologists, preservation experts, lawyers, and members of various folk communities to discuss the issues and provide guides to best practices. The answers that the technologists gave to questions about stable media were discouraging: From the technical point of view, they said, there will never be a stable recording medium. Sound recordings must be periodically copied onto newer and, one hopes, more stable, media. We must learn to manage and migrate collections regularly, and to live with impermanence.
What format is favored for future preservation? Whatever it is, it cannot be analog. The experts acknowledged that analog sound recordings have the highest fidelity. The problem with analog is that the media on which the sounds are recorded degrade. The sounds must be frequently copied onto other media, and with every instance of copying, some information is permanently lost. For preservation, the highest-quality medium for reformatting has been quarter-inch analog magnetic tape on open 10-inch reels. This has been the standard for nearly half a century. But these tapes do not last as long as do the underlying sources, such as vinyl long-playing (LP) records. Even worse, only two firms manufacture this analog tape, and its production is not considered a growth industry. The advantage of digital recording is that copies can be made with no loss of information. The disadvantage is the need for compression, which diminishes sound quality. Even with the relatively high fidelity that can be achieved with digital media, compact discs (CDs) will need reformatting and migration in the not-too-distant future.
Reformatting fragile media will require a process of ongoing assessment and triage to identify materials in need of treatment. This process will be successful only to the extent that collections and items in them are well organized and well described. While the immediate future will see management and migration of recorded sound from fragile media onto more stable media, this will be done in the knowledge that even the more stable media are not permanent. Under these circumstances, the best strategy is to develop a risk-management plan that takes into account the end purposes of preservation. By assessing future access demands, we can make informed decisions not only about what the medium or carrier needs to achieve stabilization but also about future users and their needs.
Projecting the access needs of future users is a crucial part of any preservation strategy. One cannot afford to wait and see what will be deemed valuable 25 to 50 years hence. In preserving folk heritage collections, we can judge future needs only by the present generation of scholars, researchers, and the documented communities themselves who need access to the materials.
One of the key findings of the survey of collections undertaken by the AFS and AFC is that much of what has been recorded is poorly controlled, badly labeled, and lacking critical documentation about rights to use. This situation exists in part because of the changing mores of professions such as anthropology, ethnomusicology, and folklore, which, decades ago, did not understand the importance of securing written or recorded informed consent from the peoples whom they were recording. There is no consensus on how to remediate this problem with retrospective collections; however, it is clear that scholars and field workers of the present and future must document fully the conditions under which their subjects grant access.
As with other academic disciplines, those of folklore, anthropology, and ethnomusicology are looking to the time when they might find both primary and secondary sources online. One of the major barriers is that the fields of folklore and ethnomusicology do not have standardized vocabularies for many common terms. This hinders the development of cataloging and indexing schemes for inventory control, not to mention searching across collections. It is critical that a thesaurus of terms be agreed upon, and work is currently being done on this. Issues of rights, vexing enough to sort through in the controlled environment of the reading room, become even more complex in the digital realm. Many people are calling for open and virtually unrestricted access on the Web to indigenous music and folklore. Others urge that the oral traditions of various communities be controlled fully by those communities, even if that means banning access for research purposes by those who are not members of these communities. This has become a particularly heated issue among some Native American nations. In the meantime, there is a need to assess what individuals and institutions have on their shelves, what condition the materials are in, and how quickly these recordings must be reformatted to ensure that they are preserved.
4.4 JSTOR: Online Access and Digital Archiving
JSTOR, a nonprofit journal-archiving project, has made available online to libraries the back files of 153 titles from 17 academic disciplines, a collection that, as of November 2001, exceeded 8 million pages. More than 1,100 libraries from 53 countries participate in this collaborative enterprise. Approximately 180 colleges and universities have had access to the database since early 1997. In the first six months of 2001, more than 2.7 million articles were printed from its database, more than 5.7 million searches were performed, and the database had more than 22 million user access sessions (JSTOR-NEWS 2001).
The original intent of JSTOR was to reduce storage costs for low-use back journals in the humanities and social sciences and to ease access to journal content. While there is some evidence that academic libraries that use JSTOR are now either removing duplicate copies from their shelves or moving copies from the central library stacks to the more cost-effective shelving of remote storage, the chief benefit of JSTOR to date has not been for preservation. It has been to increase access to retrospective secondary literature and to dramatically change how these materials are used. Lexicologists and reference compilers are mining JSTOR, as they have mined the Making of America database. Fred Shapiro, compiler of the forthcoming Yale Dictionary of Quotations, has been using JSTOR to track down attributions for such quotable saws as “There is no such thing as a free lunch,” and to determine that many such sayings have been incorrectly attributed for years by authoritative sources such as Bartlett’s Familiar Quotations (Hafner 2001). (It was not Milton Friedman who first said this in 1975, but Alvin Hansen in a 1952 Ethics article.) Others have been using the JSTOR database to pinpoint more precisely instances of first use of certain words (Science 2001).
It appears that the electronic articles in JSTOR are being used more frequently than are those in paper form. JSTOR studied the use of back hard-copy and electronic articles to compare access. There were a total of 692 uses of 10 hard-copy journals at five test sites over the course of the three-month survey in 1996. A study of the use of the same journals in JSTOR at the same five sites for the last three months of 1999 yielded more than 7,700 article views. In addition, although there is presumably substantial overlap in articles viewed and those printed, 4,885 articles were printed, bringing the total of articles viewed and printed during the study to 12,581. When this figure is compared with the 692 uses in the 1996 survey, it would seem that electronic access is greatly increasing use of the material. Interdisciplinary use of the journals has also risen-a trend that has not been documented in hard-copy use. Further evidence shows that older articles in certain disciplines do not lose their value; for example, in the field of economics, the average age of the 10 most frequently retrieved articles was 13 years; in mathematics, it was 32 years. This is beginning to raise questions about what constitutes research or pedagogical value in journal literature and about the relationship between citation frequency and that value. The articles that are viewed most frequently are not the same as those that are most frequently cited; in fact, frequently viewed articles may be rarely cited (Guthrie 2000).
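The magnitude of the shift reported above reduces to simple arithmetic. A minimal sketch, using only the figures from the study; the resulting factor is illustrative rather than rigorous, since the 1996 and 1999 measures (physical uses versus views plus prints) are not strictly comparable:

```python
# Figures reported in the JSTOR usage comparison (Guthrie 2000):
# the same 10 journals at the same 5 test sites, each measured over three months.
hard_copy_uses_1996 = 692          # physical uses of the hard-copy volumes
viewed_and_printed_1999 = 12_581   # JSTOR article views plus articles printed

increase_factor = viewed_and_printed_1999 / hard_copy_uses_1996
print(f"Electronic access multiplied measured use roughly {increase_factor:.0f}-fold")
```

Even allowing for overlap between articles viewed and articles printed, the order of magnitude of the increase is what drives the report's conclusion.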
JSTOR was designed to solve both preservation problems and access problems. Its goal is to become a repository in which subscribers will eventually develop so much faith that they will relinquish many of their old and unused journals. It is not intended to spur a thoughtless disposal of journals, but rather to shift the burden of retaining low-use materials to those institutions that have taken on the role of “libraries of last resort.” JSTOR has reached agreement with the Center for Research Libraries (CRL) to become the first North American repository of copies of every issue of every journal in its database. JSTOR intends to create several such repositories of hard-copy journals over time, in the belief that distributed artifactual repositories best serve the preservation needs of the research community through redundancy. Such repositories would also provide access to originals on demand. Many of JSTOR’s advocates believe that a few full runs of journals centrally located in a few repositories are more useful than are a plethora of incomplete runs widely dispersed. In its work with libraries, JSTOR has often had difficulty in assembling complete runs of journals. This reveals how challenging it has been even for those libraries that try to be comprehensive in their coverage of one discipline or another (JSTOR 2001).
A number of economic issues may affect the further development of JSTOR and of its now-developing art image counterpart, ArtSTOR. These issues go to the heart of the promise of networked resources: to reduce costs resulting from unnecessary redundancies or, looked at in another way, to avoid future costs that libraries would incur if they continued to develop collections individually. As libraries’ print collections grow, managers are forced to make decisions about where to store them: in the centrally located library stacks on campus or in the more flexible, preservation-friendly environments of offsite, high-density storage. They face two alternatives: (1) keeping full retrospective collections onsite in some subjects while sending other subjects wholesale off campus; or (2) spreading the burden more equitably across disciplines by identifying the lowest-use items within each classification and sending those offsite. JSTOR statistics reveal that in some fields, the distinction in value or use between older and more recent literature is not meaningful, and using dates of publication to predict demand may unintentionally create problems.
JSTOR has attempted to find out what it costs libraries to store back files of journals. One source has estimated that the average cost to a library is $175,000, based on what it would cost to build that storage space today (Bowen 2001). Although this figure is derived from a methodology widely used in library science, it can be misleading: it represents future construction costs avoided, not savings on current expenditures. The costs of doing business in a research library are hard to calculate, since benefits are difficult to quantify. Nonetheless, the cost of housing journals onsite versus offsite can be clearly assessed. A survey that JSTOR conducted in 1999 revealed that 20 percent of respondents already had some journals in remote storage and that 24 percent had plans to move more items offsite (Bowen 2001). Some libraries, especially small college libraries committed to serving the needs of teaching first and research second, have never had a policy of keeping journals forever. They are finding that JSTOR gives them access to older literature that they would otherwise have had to obtain through interlibrary loan. For libraries of last resort, as well as those attached to research institutions, the trade-offs between storage (onsite or offsite) and deaccessioning may differ and be harder to quantify. Few, if any, ARL libraries are divesting themselves of old hard copies of a journal title they hold, even the third or fourth copy. For them, the key advantage of participation in JSTOR to date is improved delivery of resources to patrons. These libraries are not looking to eliminate storage costs at present. With JSTOR, however, they can send the second, third, or fourth copy to offsite storage with less concern about compromising ease of access. They can also provide access to journals with a convenience that would not otherwise have been possible.
JSTOR has done something that no single library would have done on its own behalf: it has been willing to run the copyright gauntlet with publishers and arrive at an access policy that suits the needs of both publishers and researchers. ArtSTOR is aiming to accomplish the same thing with its database of art historical images.
4.5 The Rossetti Archive: Collecting and Preserving the Born-digital Scholarly Publication
The Complete Writings and Pictures of Dante Gabriel Rossetti: A Hypermedia Research Archive (the Rossetti Archive) is a comprehensive electronic edition produced and updated continually since 1993 by Jerome McGann and more than 30 others working under his direction.31 The current version, published by the University of Virginia’s Institute for Advanced Technology in the Humanities (IATH) using Enigma’s Dynaweb software, is the first of four projected installments. It includes 10,388 SGML and JPEG files, presenting material that centers on the 1870 volume of Rossetti’s Poems and outlining the structure that the completed archive will require. This material is marked up in a Document Type Definition (DTD) developed for the project at IATH-the Rossetti Archive Master Document Type Definition (RAM DTD). In addition, there are about 5,000 (offline) TIFF images, from which the JPEGs are derived; some HTML pages with introductory, summary, and navigational materials; and perhaps two dozen style sheets. The publication also includes 18 essays about the archive, by McGann and others, marked up in HTML and available from the “Resources” area of the archive. The completed Rossetti Archive is likely to contain 25,000 files and to take another 10 to 12 years (and another 30 or 40 people) to finish. The University of Virginia, private foundations, and corporations have already invested hundreds of thousands of dollars in developing this resource; perhaps as much as a million dollars will be invested by the time the project ends.
The Rossetti Archive is a valuable scholarly publication, not only in terms of the effort and money invested but also in terms of the role it has played in the migration of humanities scholarship online, in the pioneering of electronic scholarly editions, and in the history of humanities computing. For these reasons, it is worth collecting and preserving in a research library. To date, however, digital library efforts have focused on library-based production of library-owned digital primary resources. Libraries have not yet had to deal with second-generation digital library problems, where the focus is on scholarly analysis, reprocessing, and creation of digital primary resources.
In January 2000, IATH and The University of Virginia Library’s Digital Library Research and Development group (DLR&D) began a three-year project, “Supporting Digital Scholarship” (SDS), to investigate the problem that the Rossetti Archive and other originally digital scholarly publications pose to research libraries. Funded by The Andrew W. Mellon Foundation and codirected by IATH Director John Unsworth and DLR&D Director Thornton Staples, SDS aims to address second-generation digital library problems. This project is examining three digital library problems:
- scholarly use of digital primary resources
- library adoption of originally digital scholarly research
- co-creation of digital resources by scholars, publishers, and libraries
Approaching these problems requires developing technical methods and institutional policies for collecting originally digital scholarly publications. Accordingly, SDS has formed two working committees-one on technical issues and one on policy issues. The technical committee is responsible for production and implementation of the software, standards, and systems that this project requires. The goal of the technical committee is to build the systems that show what can and cannot be done, at a technical level, to support digital scholarship. The policy committee is charged with considering and proposing policies governing long-term preservation and access for digital materials in the library and policies covering the integration, dissemination, and reuse of those materials. The goal of the policy committee is to produce guidelines for collecting digital scholarship that outline what libraries can and cannot promise to do with these materials, depending on what form they take, what standards they do or do not adhere to, what functionality they have, and how they achieve it.
Technical work in this project takes place within a digital library architecture called FEDORA (Flexible Extensible Digital Object Repository Architecture). FEDORA originated at Cornell University in research done by Carl Lagoze and others; the Virginia implementation is FEDORA’s largest testbed to date and its first real-world installation. The Andrew W. Mellon Foundation has funded an extension of this research that will involve beta-test installations of FEDORA in a half-dozen research libraries in the United States and the United Kingdom.32 In Virginia’s implementation of FEDORA, objects within a digital library or repository consist of a basis (e.g., a JPEG image in a simple object, or a machine-readable text with page images, in a complex object), plus three metadata packages (administrative, technical, and descriptive). Finally, objects can be associated with one or more “disseminators”-data structures that pair a particular set of behaviors, or “signatures,” with methods to produce that behavior (in the current IATH implementation, servlets).
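The object model described above can be illustrated in a few lines of code. This is a minimal sketch, not the actual FEDORA implementation: all class, field, and behavior names here are invented for illustration, and the real system pairs disseminators with servlets rather than in-process functions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class DigitalObject:
    # The "basis": raw content, e.g. a JPEG in a simple object, or a
    # machine-readable text plus page images in a complex object.
    basis: Dict[str, bytes]
    # The three metadata packages attached to every object.
    administrative: Dict[str, str] = field(default_factory=dict)
    technical: Dict[str, str] = field(default_factory=dict)
    descriptive: Dict[str, str] = field(default_factory=dict)
    # Disseminators pair a behavior "signature" with a method that
    # produces that behavior (a servlet in the IATH implementation).
    disseminators: Dict[str, Callable[["DigitalObject"], bytes]] = field(
        default_factory=dict
    )

    def disseminate(self, signature: str) -> bytes:
        """Invoke the named behavior against this object's content."""
        return self.disseminators[signature](self)

# A simple image object with one (hypothetical) behavior attached.
obj = DigitalObject(basis={"image.jpg": b"\xff\xd8..."})
obj.descriptive["title"] = "Example Rossetti image"
obj.disseminators["getDefault"] = lambda o: o.basis["image.jpg"]
```

The separation matters: because behaviors are attached to objects rather than hard-coded into delivery software, the same stored content can acquire new disseminators as delivery technologies change.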
In October 2001, the Rossetti Archive was first collected into FEDORA as a set of XML documents with XSL style sheets that mimic the functionality of the Dynaweb publication. To accomplish this, the SGML DTD had to be modified so that it was capable of validating SGML and XML. James Clark’s SX was used to generate the XML, and mark-up in the documents themselves had to be adjusted to disambiguate some forms of references to other documents on which navigation and selection would depend. Entity declarations, held separately in a catalog file in the original SGML version, had to be distributed into the files that contained the relevant entity references. This task required the efforts of several staff members, working part-time, over several months-far beyond the effort that a library would be willing to devote to collect a single publication under normal circumstances, but far less time than has gone into creating the Rossetti Archive. The point of this experiment is not to demonstrate a cost-effective collection strategy but to develop an understanding of what characteristics an originally digital publication should have in order to be collectable at reasonable cost. Any such publication will have a better chance of being converted to a collectable form if it is highly structured, even if its form is idiosyncratic.
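One of the conversion steps above, distributing entity declarations from a shared catalog into the individual files that reference them, can be sketched as follows. The catalog contents and file format here are invented for illustration; the actual RAM DTD and SGML catalog are far larger and were processed with James Clark's SX, not this script.

```python
import re

# Hypothetical stand-in for the shared SGML entity catalog.
CATALOG = {
    "mdash": '<!ENTITY mdash "&#8212;">',
    "raw.img": '<!ENTITY raw.img SYSTEM "images/example.jpg" NDATA jpeg>',
}

def inline_entities(doc: str, doctype: str = "ram") -> str:
    """Prepend declarations for every entity the document references,
    so the file is self-contained and valid as standalone XML."""
    used = set(re.findall(r"&([A-Za-z][\w.]*);", doc))
    decls = [CATALOG[name] for name in sorted(used) if name in CATALOG]
    if not decls:
        return doc
    subset = "\n".join(decls)
    return f"<!DOCTYPE {doctype} [\n{subset}\n]>\n{doc}"

converted = inline_entities(
    "<ram><p>Rossetti&mdash;painter and poet</p></ram>"
)
```

The general point stands regardless of the details: each file must carry, or be able to resolve, everything it depends on, which is why declarations held in an external catalog had to be pushed down into the documents themselves.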
The publication’s basic content-its textual information, image data and, perhaps most important from an editorial point of view, the profusion of relationships among the texts and images that make up the archive-is part of a library collection. It has been collected in a way that should make it possible to migrate the data forward as mark-up standards, delivery mechanisms, browsers, and other elements of the digital library environment continue to develop.
Work remains to be done. The objects in the Rossetti Archive require administrative, technical, and descriptive metadata, and it is unclear how much of these metadata can be shared, how much can be automatically harvested from the existing data, and how much will have to be created by hand. There are some unsolved problems, too. Searching is one of the most difficult technical problems, because so much basic navigation (in the Rossetti Archive, but also in many SGML- or XML-based document structures) is predicated on queries, and because, until recently, there has been no standards-based way of expressing such queries. That problem may be solved for the Rossetti Archive by Tamino, a commercial XML database product that implements the full XPath standard and promises to implement XQuery, as soon as that standard is approved. If this works as advertised, there will be no part of the Rossetti Archive that cannot be expressed in a completely software- and hardware-independent way. In principle, therefore, all of its informational content or functionality will be fungible across future changes in technology.
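The query-based navigation described above can be illustrated with a small example. The element names and identifiers below are invented, not drawn from the RAM DTD, and Python's ElementTree implements only a subset of XPath; a product such as Tamino would accept full XPath (and, eventually, XQuery) expressions against the same documents.

```python
import xml.etree.ElementTree as ET

# A toy document standing in for one slice of the archive's structure.
doc = ET.fromstring("""
<workset>
  <work id="1-1870"><title>The Blessed Damozel</title></work>
  <work id="2-1870"><title>Jenny</title></work>
</workset>
""")

# Navigation as a query: select the title of the work with a given id.
# Because this is a standards-based XPath expression rather than a call
# into one vendor's API, it survives changes of software underneath.
title = doc.find(".//work[@id='2-1870']/title").text
```

This is the sense in which standards-based queries make content fungible: the expression depends only on the document structure, not on any particular delivery system.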
That optimistic assessment should not obscure the difficulties that might attend the collection of a different sort of originally digital scholarly publication. For example, if the relationships in the Rossetti Archive had been embodied in a relational database, rather than in an XML structure, there would be no standards-based way to express them. Consequently, this important aspect of the publication could be expressed only in a software-dependent way. This would make it much more difficult, perhaps impossible, for a collecting library to make a commitment to maintain, migrate, preserve, and provide access to the materials over time.
If scholars are to produce originally digital publications that are compatible with libraries’ needs and that allow libraries to collect at reasonable costs, then best practices for authoring have to be understood, established, and supported in some kind of networked, institutionalized workspace. Publishers should ultimately support this workspace, though it may need to be designed and developed, initially, by libraries themselves. Prototyping such a workspace will occupy the second half of the SDS project.
27 A forerunner of the Five-College Library Depository, the Hampshire Interlibrary Center [HILC], was established in the 1950s. HILC was disbanded during the 1970s, by which time each of the contributing libraries had erected larger libraries and had reclaimed their collections.
28 The substance of this case study was relayed by Ken Weissman, head of the Motion Picture Conservation Center at the Library of Congress, who supervised the work described here. He generously made available records that documented the work and additional background materials, and offered invaluable advice on the general matter of film preservation and restoration. James Cozart and Jennifer Dennis, of the Library of Congress, and Annette Melville, of the National Film Preservation Foundation, provided additional expertise.