CLIR Issues Number 62
March/April 2008
New Grant Program to Fund Cataloging of Hidden Collections
IN JUNE, CLIR will launch a new national program to identify and catalog hidden special collections and archives. The program is supported by generous funding from The Andrew W. Mellon Foundation.
Through a national competition, the program will award funds to institutions holding collections of high scholarly value that are difficult or impossible to locate through finding aids. Award recipients will create descriptive information for their hidden collections that will be linked to and interoperable with that of all other projects funded by this grant. In so doing, they will create a federated environment that can be built upon over time.
The Challenge of Exposing Hidden Resources
Libraries, archives, and cultural institutions hold millions of items that have never been adequately described. These items are all but unknown to, and unused by, the scholars those organizations aim to serve. A 1998 Association of Research Libraries survey of special collections at 99 North American research universities revealed that, on average, 15 percent of printed volumes in those collections are unprocessed or uncataloged. The figure rises to an average of 27 percent of manuscripts, 35 percent of video holdings, and 37 percent of materials in audio format.
"Librarians and archivists have long despaired at the huge amount of intellectually valuable information that, for lack of cataloging, is unknown or inaccessible to scholars," said CLIR Board Chairperson Paula Kaufman. "This award to CLIR underscores the importance of the hidden-collections problem and supports a coordinated, national response."
"Only a national program can effectively address the problem of hidden collections," said CLIR President Charles Henry. "The records and descriptions this program creates will be accessible through the Internet and the Web, exposing collections to a global audience of scholars, students, and teachers. It will facilitate the harvesting, aggregation, and thematic correlation of the records to advance intellectual productivity. As a cyberinfrastructure effort, the program will also build sustainable communities of complementary backgrounds and perspectives within higher education over time."
Priority to Collections of Scholarly Value
CLIR has formed a review panel to evaluate proposals and select award recipients (see list below). "The composition of the committee is meant to ensure the program's priority of making collections available that are of the highest value to research and teaching," said Mr. Henry.
The request for proposals (RFP) will ask each applicant to provide information on the scope and depth of the hidden collection, its disciplinary focus, its value to research, the type of media it includes, and other descriptive elements that will help the review panel assess the intellectual impact of cataloging and exposing these materials. The RFP will also require the applicant to respond to questions about long-term sustainability, additional sources of funding, and institutional support.
The program will initially produce two layers of information: (1) a basic registry of hidden collections and archives, created from information in the proposals, that can be found through a Web-based platform; and (2) a descriptive record of a subset of collections that are deemed most urgently in need of cataloging and documentation. The record will evolve as grantees complete their projects.
CLIR anticipates that this project will eventually lead to the creation of a third layer of information—digital versions of the special collections and archives that have been cataloged. The digitization effort will be funded by other sources.
CLIR will issue an RFP in early June. Proposals will be due in late July, and decisions will be announced in fall 2008. CLIR expects to award about $4 million in the first cycle. The program may be extended for subsequent funding cycles over five years.
More information about the award program is available at http://www.clir.org/activities/details/hiddencollections.html. Questions about the program may be directed to Amy Lucko, at firstname.lastname@example.org.
Hidden Collections Review Panel
R. Howard Bloch
Sterling Professor of French
Linda J. Colley
Shelby M.C. Davis 1958 Professor of History
Director of Programs
University of Nebraska Libraries
University of Nebraska at Lincoln
Department of English Language and Literature
University of Chicago
Digital Library Initiative
CLIR Distinguished Presidential Fellow
Publisher of Highwire Press
Ronald L. Larsen
Dean and Professor
School of Information Sciences
University of Pittsburgh
Stephen G. Nichols
James M. Beall Professor of French and Humanities
Johns Hopkins University
CLIR Distinguished Fellow for Leadership Programs
Chief Information Officer
Constance A. Jones Director of Libraries and Professor of History
Bryn Mawr College
Richard V. Szary
Director, Louis Round Wilson Library and Associate University Librarian for Special Collections
University of North Carolina at Chapel Hill
Many More than a Million: Building the Digital Environment for the Age of Abundance
HOW DOES SCALE in content, made possible by mass digitization, change humanities research? What infrastructure or systems are necessary to provide services and materials to scholars? In November, a group of scholars in the digital humanities and representatives of research and funding agencies met in Washington, D.C., to address these and related questions and to propose priorities for further work. The workshop, hosted by CLIR, was cochaired by Tufts University Professor of Classics Gregory Crane and CLIR Director of Programs Amy Friedlander. A meeting report, issued in March, and a link to a discussion forum on the topic are available at http://www.clir.org/activities/digitalscholar/index.html.
Realizing the Potential of Large Collections
When digital collections are as large as those created by mass-digitization projects, only the computer can actually "read" the material: there is simply more material than a human can process. But such scale offers an opportunity to pose questions that could not be answered in traditional scholarship. The potential is staggering. In linguistics, for example, digitization of large collections could enable patterns in morphology, syntax, and semantics to be automatically tracked across large stretches of time, space, and culture. Scholars could engage in new types of cross-disciplinary research. Researchers working with cross-language information retrieval, translation-support tools, and even imperfect machine translation could work with a broader range of linguistic materials than has ever been possible.
Realizing this potential, however, will require a range of new services, including semantic markup, multilingual services, and the conversion of raw text produced by mass-digitization projects into data that machines can act on. Humanists working with large collections will need at least four types of data conversion:
- raw optical character recognition output from page images with human-curated, book-level metadata
- curated structural metadata
- curated transcriptions
- structured data sources
Ultimately, researchers will need services that can personalize the materials with which they work, providing intellectual support that matches the background and the momentary intentions of a given user at a given place.
The workshop report poses five major questions that will influence the future of digital scholarship in the humanities.
- How do traditional archival values migrate into the environment made possible by digital data and digital tools? The digital "text" or object becomes plastic, in the sense that it may be devoid of context, may be modified directly or through the addition of automatically generated markup, or may be displayed differently in different systems, even if the "content" remains unchanged. Thus, there is a need for ways to establish the authenticity, provenance, and integrity of digital sources, as well as to manage versioning, which is already a known problem in the mass digitization of collections.
- When only the computer actually "reads" an object or a text, a new and not fully understood relationship is created among author, tools, objects, and readers (or users). Traditional paleography and criticism address such relationships among written and printed materials. But how do we model and understand the digital equivalent? Presentation of analog source material in digital form still involves mediation—whether by editors, coders, or machines. What is the shape and form of that mediation?
- What happens when large-scale, team research becomes possible or necessary, enabling interdisciplinary research? The word interdisciplinary has different meanings in different disciplines. To humanists, it has meant reading the literature outside of the core journals in the traditional disciplines rather than interaction in work groups. This raises questions pertaining to attribution of authorship and credit, to recognition of the value of digital research and its expression in digital form, and to recognition of digital scholarship that focuses on infrastructure.
- What are the infrastructure requirements? What belongs to the national cyberinfrastructure that is made available locally? What is maintained centrally on campus? What functions are appropriate at the desktop? And where are the dependencies? Every institution will have to make decisions about what it will support. These decisions can benefit from ongoing dialog on campus among the practitioners of the involved disciplines as well as within the higher education sector as a whole. Ensuring a locally appropriate level of service, system-wide efficiencies, and sufficient redundancy to protect critical resources will require a framework within which such decisions can be made.
- What are the big questions that justify the cost of managing digital information? Such questions are rarely articulated in a single meeting; they become apparent only as investigators experiment, share results, and continue the discussion.
The report recommends the following topics as priorities for future work:
- Find ways to provide analytical access to the Open Content Alliance (OCA) book data now available. Scholars should be able to pose questions that analyze very large collections. For example, what passages from Shakespeare or the Bible appear in different genres over time? What sorts of things are said about railroads in 19th-century writing? Large collections of image books are a core data set for humanists. Straightforward applications of high-performance computing could have a rapid impact on some fundamental classes of questions for many humanists.
- Apply exemplary questions to open collections such as the OCA. In cases where users do not have access to the original page images or other data in collections, articulate the access functions whereby Google, Microsoft, and others can provide end-user services and application-programming interfaces. Google and its audience would benefit from systematic feedback from the users who understand their domains best. Developing requirements (at least in a general sense) and providing feedback and evaluation may warrant substantial resources.
- Clarify and compare the costs and benefits of book scanning with those of transcription and markup of complex knowledge sources. We cannot afford to apply human labor and expertise directly to more than a fraction of the published record of humanity. Are there printed materials that would, if converted into machine-actionable form, uniquely enhance our ability to analyze vast bodies of material? We need to begin to formulate general guidelines that can serve as a basis for more-nuanced collection strategies that combine massive digitization with careful conversion of print into machine-actionable knowledge.
- Better understand how to relate high-value, domain-specific services and data structures to services and data structures that are common to all collections. Every discipline needs text searches, but some communities need different kinds of searches. Small disciplines, such as the classics, need to build on the most general system possible and to focus on their own domain-specific problems. Likewise, some communities may find that they need to fine-tune even the most generic services.
- Examine the education and training required to create the information professional of the future. Both the scholar and the information manager of the future will be well versed in information technologies, capable of adapting to a changing environment, and able to anticipate the diverse challenges of research and of the pedagogical mission of the university. Managing the infrastructure to support a seamless dialog between local and global resources will fall to the library of the future. Identifying, recruiting, and training professionals, and nurturing their career development so they will be ready to staff that future place, is our most important challenge.
Faculty Research Behavior Workshops: A Librarian's Perspective
IN FEBRUARY 2008, the University of California at Berkeley partnered with CLIR to host the fourth faculty research behavior workshop. The two-day session was led by Nancy Foster, lead anthropologist at the University of Rochester Libraries. (Articles on previous workshops at Cornell and Wesleyan universities appeared in CLIR Issues 60 and 56.)
Later, Suzanne Calpestri, the John H. Rowe Librarian and director of the George and Mary Foster Anthropology Library at Berkeley, spoke to CLIR Issues about the value of the workshop.
As the host of the workshop, how did Berkeley benefit from the experience?
We saw it as an opportunity to learn about the work practices of our own faculty through the support of a structured program and a trained "volunteer" workforce. [During one segment of the workshop, participants interview faculty on campus.] Because we hosted the event, two of our staff received free training. We hope to apply the methods they learned to many situations, not just to the study of faculty behavior.
The workshop also provided an opportunity for some ambassadorship. We had a chance to showcase various parts of our library and campus to participants from other academic institutions. At the same time, those participants provided us with a valuable perspective on what we do here. For example, during the library tour, one participant commented on how antiquated our 600 public workstations are. That really speaks to the limitations of our functionality.
How will you use the knowledge gained from the workshop?
We are in the midst of redesigning our library—not just the physical space but the full range of library services and programs. We are committed to involving all stakeholders in this process and redesigning operations around user needs. Our goal is to create a library that fosters a vibrant community of scholarship in part by supporting the range of work, study, and learning practices on our campus not only today but well into the future. The CLIR workshop provided the necessary tools for understanding those practices. As one of our participants said, "I can see reaching back into the toolkit that Nancy provided through all the project's planning stages."
Who Uses Institutional Repositories and Mass-Digitized Collections?
A NEW REPORT from CLIR examines what we know about who is using institutional repositories (IRs) and collections created by mass-digitization projects. Entitled The Seamless Cyberinfrastructure: The Challenges of Studying Users of Mass Digitization and Institutional Repositories, the report aims to help librarians understand the challenges of studying users in these environments and to help them develop ways to assess the impact of their mass-digitization and IR projects. It was written by Dawn Schmitz, 2004-07 CLIR postdoctoral fellow in scholarly information resources.
While acknowledging that mass digitization and IRs are two distinct types of initiatives, the author argues that their differences are less important to the user than they are to the librarian. The resources available through these initiatives fall on a spectrum of materials that users increasingly seek on the Web—from working papers and course outlines to images, e-journals, and digitized monographs. But "while this seamless cyberinfrastructure is a boon for users, it creates difficulties for libraries trying to understand those users and better serve their needs," notes the author.
"Users conduct research without having to authenticate or navigate through a library Web site to find a particular vendor database, and they may not be able to report, if asked later, what resource other than Google they used to find information," Schmitz continues. "This lack of recall or understanding creates difficulty for librarians who wish to study who uses IRs or digitized books, since they cannot simply ask users what types of resources they have used."
The Seamless Cyberinfrastructure begins with an overview of literature published since 2003 on users and user issues relating to mass digitization and IRs. Schmitz found few studies that examined specific patterns of use of mass-digitized collections and IRs. The literature on the latter, she reports, tends to focus on how to engage more faculty in using the IR for self-archiving.
The report then turns to studies of how users search the Internet for information for coursework, teaching, and research. Such studies, the author notes, can be a starting point for research that examines users of IRs and mass-digitized books, since the latter are increasingly found through general Internet searches. Schmitz found a wealth of studies on Internet use among graduate students and faculty. She also found several papers on undergraduate use, most of which documented that undergraduates rely on Internet search engines more than on any other source for their academic work.
In the final section of the report, Schmitz outlines some methodologies for studying electronic-resource user behaviors and preferences and discusses how these methods might be employed to better understand users of IRs and mass-digitized collections.
The Seamless Cyberinfrastructure: The Challenges of Studying Users of Mass Digitization and Institutional Repositories is available at www.clir.org/pubs/archives/schmitz.pdf.
CLIR Names 2008 Rovelstad Scholarship Recipient
KHUE DUONG, a master's degree candidate in the Information School at the University of Washington, has been named the sixth recipient of the Rovelstad Scholarship in International Librarianship. Duong's interest in international librarianship stems from his conviction that information issues "transcend cultural differences, geographical borders, and governmental policies." To complement his classroom training, he is working at both the Seattle Public Library and the University of Washington Libraries, in preparation for a career in academic librarianship. "Being a first-generation Vietnamese immigrant, I value the importance of understanding the historical, cultural, and linguistic background of the population I work with," writes Duong.
Duong has a bachelor of science degree in chemistry and English from the University of California, Los Angeles, and a master's degree in linguistics from the University of California, Santa Cruz.
The Rovelstad scholarship provides travel funds for a student of library and information science to attend the annual meeting of the World Library and Information Congress. This year's meeting will take place in Québec, Canada, in August.
2008–2009 Mellon Dissertation Fellows Named
Editor's Note: The online version of this article was altered to remove a grantee who declined the award after the newsletter went to press.
NINE GRADUATE STUDENTS have been selected to receive awards this year under the Mellon Fellowship Program for Dissertation Research in the Humanities in Original Sources, which CLIR administers.
The fellowships are intended to help graduate students in the humanities and related social science fields pursue research wherever relevant sources are available; gain skill and creativity in using primary source materials in libraries, archives, museums, and related repositories; and provide suggestions to CLIR about how such source materials can be made more accessible and useful.
The fellowships carry stipends of up to $20,000 each to support dissertation research for periods of up to 12 months.
From Shipmates to Soldiers: Emerging Black Identities in Montevideo, 1770–1850
University of Miami
The Reform of Popular Piety in the Closing Years of the Venetian Republic
Manuscript Separates and the Culture of Political Opposition in England, 1615–1640
New York University
The Urban History of Deindustrialization: Pittsburgh and Hamilton, Ontario
Sindhis Between Region, Religion and Nation
University of California, Santa Cruz
From Sun to Soil to Stomach: The Intellectual History of a Forgotten Link Between Ecology, Agronomy, and Nutrition, 1920–60
The Making of the Middle-Class Family: Property, Inheritance and the Urban Family Economy in the British Atlantic World, c. 1670–1780
Massachusetts Institute of Technology
After the Copy: China, Dafen Village, and the Hand-Painted Art Product