CLIR Issues Number 66
Number 66 • November/December 2008
ISSN 1944-7639 (online version)
CLIR Publishes Survey of Digital Humanities Centers
WITH THE RISE in digital humanities research have come questions about the infrastructure needed to support such scholarship. Working with very large data sets requires the use of specialized methods and tools, and an environment that supports large-scale collaboration. But this is costly. As the American Council of Learned Societies report on cyberinfrastructure notes, “When human, institutional, or technical resources become too expensive to replicate at every institution, it makes sense to provide those resources through a more limited number of national centers.”1 What should such centers look like, and what models exist?
To better understand the nature and characteristics of digital humanities centers (DHCs), CLIR in 2007 commissioned information management consultant Diane Zorich to conduct a survey of U.S.-based DHCs. She defines DHCs as physical or virtual entities “where new media and technologies are used for humanities-based research, teaching, and intellectual engagement and experimentation.” The survey investigated the scope, financing, organizational structure, products, services, and sustainability of 32 DHCs, as well as the collaborative aspects of existing models. Completed in May 2008, the survey informed discussions at the Scholarly Communications Institute (SCI) 6, held in July at the University of Virginia.
In November, CLIR published the findings in A Survey of Digital Humanities Centers in the United States, available at https://www.clir.org/pubs/abstract/pub143abst.html.
In an executive summary, Zorich highlights key survey findings as follows:
“The results show that DHCs can be grouped into two general categories:
1. Center focused: Centers organized around a physical location, with a range of projects, programs, and activities undertaken by faculty, researchers, and students. These centers offer a wide array of resources to diverse audiences. Most DHCs operate under this model.
2. Resource focused: Centers organized around a primary resource, located in a virtual space, that serve a specific group of members. All programs and products flow from the resource, and individual and institutional members help sustain the resource by providing content, labor, or other support services.
“The study findings also show that DHCs are entering a new phase of organizational maturity, with concomitant changes in activities, roles, and sustainability. There is a growing interest in fostering greater communication among centers to leverage their numbers for advocacy efforts. However, few DHCs have considered whether an unfettered proliferation of individual centers is an appropriate model for advancing humanities scholarship. Indeed, some features in the current landscape of centers may inadvertently hinder wider research and scholarship. These include the following:
- The silo-like nature of current centers is creating untethered digital production that is detrimental to the needs of humanities scholarship. Today’s centers favor individual projects that address specialized research interests. These projects are rarely integrated into larger digital resources that would make them more widely known and available for the research community. As a result, they receive little exposure outside their center and are at greater risk of being orphaned over time.
- The independent nature of existing centers does not effectively leverage resources community-wide. Centers have overlapping agendas and activities, particularly in training, digitization of collections, and metadata development. Redundant activities across centers are an inefficient use of the scarce resources available to the humanities community.
- Large-scale, coordinated efforts to address the “big” issues in building a humanities cyberinfrastructure, such as repositories that enable long-term access to the centers’ digital production, are missing from the current landscape. Collaborations among existing centers are small and focus on individual partner interests; they do not scale up to address community-wide needs.
“The findings of this survey suggest that new models are needed for large-scale cyberinfrastructure projects, for cross-disciplinary research that cuts a wide swathe across the humanities, and for integrating the huge amounts of digital production already available. Current DHCs will continue to have an important role to play, but that role must be clarified in the context of the broader models that emerge.
“When one is investigating collaborative models for humanities scholarship, the sciences offer a useful framework. Large-scale collaborations in the sciences have been the subject of research that examines the organizational structures and behaviors of these entities and identifies the criteria needed to ensure their success. The humanities should look to this work in planning its own strategies for regional or national models of collaboration.”
Evaluation of Digital Tools
The report appendix summarizes the results of an evaluation of 39 digital tools developed by the DHCs surveyed in the report. The evaluators, Lilly Nguyen and Katie Shilton, graduate students in the Department of Information Studies at the University of California, Los Angeles, defined tools as “software or computing products developed to provide access to, interpret, create, or communicate digital resources.” Working from the premise that tools must be visible, accessible, and understandable to support the research for which they are intended, the two authors looked at the clarity of intentions and functions of the tools and the ease with which users can access them from within the DHC site. They found considerable variance. On the basis of their findings, Nguyen and Shilton recommend seven best practices for tool design for humanities scholars.
Promoting Digital Scholarship
The report findings are being carried over into other work. CLIR has commissioned Ms. Zorich to write an article that draws on her survey findings and focuses on the accomplishments of DHCs. The article will be included in a forthcoming volume on current issues in digital humanities scholarship. The volume will include the proceedings of a joint CLIR-National Endowment for the Humanities workshop, “Promoting Digital Scholarship: Formulating Research Challenges in the Humanities, Social Sciences, and Computation,” held in September 2008,2 as well as white papers commissioned to support that event and an interpretive essay that contextualizes the results of the meeting with recommendations for future programs. CLIR will publish the volume in spring 2009.
DLF Report Examines Tools for Metadata Remediation
by Barrie Howard
TODAY’S INFORMATION LANDSCAPE is dotted with subsystems of collaboration that are moving toward a manifest cyberinfrastructure for the advancement of research and learning at the national, and eventually international, level. One example of such collaboration is the Digital Library Federation (DLF) Aquifer initiative, which aims to make digital content easier for scholars to discover and use through best practices and standards for resource description, federated collection development, and collaborative tool development.
Shareable metadata, which enables the federation of collections from diverse institutions through aggregations of descriptive records, is a cornerstone of this endeavor. Aggregating shareable metadata from collections distributed across the Internet into a single, local database improves the end-user experience by enabling faster processing speeds and the seamless display of complete and consistent records. The benefits to libraries include interoperability across systems, increased access points to resources, and greater visibility for the home institution’s collections and discrete resources.
Recognizing that many libraries hold legacy records of poor quality and that there are no lemon laws for metadata acquired from other sources, last summer DLF Aquifer commissioned a report that would identify and evaluate tools that could be used to tidy up and enhance metadata. The report, titled Future Directions in Metadata Remediation for Metadata Aggregators, was written by Greta de Groat, electronic media cataloger in the Metadata Development Unit of the Stanford University Libraries, and was supported by a grant from The Gladys Krieble Delmas Foundation. It will be available free of charge from the DLF Web site in February 2009; it will also be sold on Amazon.com.
The report is a thoughtful review of state-of-the-art tools, both commercial and freely available, for metadata remediation and enhancement. It is organized into 10 sections, each representing a metadata element such as ‹title›, ‹genre›, or ‹date›, drawn from standard metadata element sets like Dublin Core (DC), Encoded Archival Description (EAD), and MAchine Readable Cataloging (MARC). Each section includes a description of desired services and supporting metadata, coverage of existing tools, and characteristics of yet-to-be-developed tools for attaining the desired services. The report concludes with a glossary of technical terminology and appendixes that describe results from three experiments using several of the tools described in the report. The experiments included tools with functionality for cross-collection topical clustering, proper-name extraction, and language translation.
Intended for metadata specialists, programmers, project planners, and systems architects, the report aims to help librarians make decisions about how to allocate resources for metadata creation, enhancement, and remediation by examining the feasibility of developing tools into production services for automating the cleanup and enhancement of metadata records.
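To give a sense of what automated cleanup of a metadata record can involve, here is a minimal sketch in Python. It is not taken from the report or from any tool it reviews; the `remediate` function, the flat dictionary record, and the two cleanup rules (whitespace normalization and reducing a free-text date to a four-digit year) are all assumptions chosen for illustration.

```python
import re


def remediate(record: dict) -> dict:
    """Return a cleaned copy of a flat metadata record.

    Hypothetical illustration only: the rules below are assumptions,
    not rules taken from the DLF report.
    """
    cleaned = {}
    for field, value in record.items():
        # Collapse runs of whitespace and trim both ends.
        text = re.sub(r"\s+", " ", str(value)).strip()
        # Drop fields that are empty after trimming; lowercase field names.
        if text:
            cleaned[field.lower()] = text

    # Reduce a free-text date such as "c. 1923?" to a four-digit year.
    # If no year between 1000 and 2099 is found, the text is left as-is.
    if "date" in cleaned:
        match = re.search(r"\b(1[0-9]{3}|20[0-9]{2})\b", cleaned["date"])
        if match:
            cleaned["date"] = match.group(1)
    return cleaned


if __name__ == "__main__":
    messy = {"Title": "  Letters   from  Atlanta ", "Date": "c. 1923?", "Genre": ""}
    print(remediate(messy))
```

Real remediation tools would need many more rules and, for elements such as names or subjects, lookups against controlled vocabularies; the sketch only shows the general shape of a rule-based cleanup pass.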
The development of a twenty-first-century cyberinfrastructure for research and learning is likely to be a process of wide-scale integration, comparable in a sense to the way in which a national railroad system developed through connecting subsystems across the U.S. landscape in the nineteenth century. Supporting projects such as the DLF Aquifer initiative and others that created the tools reviewed in this DLF report is an important contribution to this process. The moment when the last spike is driven to unify these subsystems may be closer than we think.
Stephen Nichols Elected CLIR Board Chair
At its November meeting, the CLIR Board elected Stephen Nichols chair. Mr. Nichols, James M. Beall Professor of French and Humanities and Chair of the Department of Romance Languages at Johns Hopkins University, joined the CLIR Board in 2005. He succeeds Paula Kaufman, who had served as chair since 2006 and who will continue to serve on the Board. Wendy Pradt Lougee will continue to serve as vice-chair; Jim Williams as secretary; and Herman Pabbruwe as treasurer.
Stephen Rhind-Tutt Joins CLIR Board
The CLIR Board has elected as its newest member Stephen Rhind-Tutt, president of Alexander Street Press, publisher of scholarly digital collections in the humanities and social sciences. Mr. Rhind-Tutt began working in electronic publishing in the 1980s. He has been responsible for the development, sales, and management of more than 300 electronic products, including SilverPlatter’s range of medical databases and the Gale Group’s Infotrac line. Before founding Alexander Street Press, he was president of Chadwyck-Healey Inc., where he oversaw the release of 15 new Web products.
“Stephen Rhind-Tutt and Alexander Street Press are recognized as innovative leaders in digital scholarly publishing,” said Chairman Stephen Nichols. “Mr. Rhind-Tutt will bring valuable perspective to CLIR’s work, and I am pleased to welcome him to the Board.”
Sayeed Choudhury Appointed CLIR Senior Presidential Fellow
G. Sayeed Choudhury of Johns Hopkins University has been appointed CLIR Senior Presidential Fellow. Mr. Choudhury is associate dean for library digital programs and Hodson Director of the Digital Research and Curation Center at the Sheridan Libraries.
He is also director of operations for the Institute of Data Intensive Engineering and Science (IDIES) based at Johns Hopkins. Mr. Choudhury serves as principal investigator for projects funded through the National Science Foundation, Institute of Museum and Library Services, and the Mellon Foundation.
“I am delighted that Sayeed Choudhury has accepted the appointment of CLIR Senior Presidential Fellow,” said CLIR President Charles Henry. “Sayeed’s work in support of the humanities and the sciences—a rare combination of expertise and methodological scope—is exemplary. As a Fellow, CLIR will call upon Sayeed for advice and occasional participation in selected conferences and symposia that align with his interests. He will bring to CLIR a unique perspective and a reputation for building successful coalitions of diverse constituencies, something that is at the heart of CLIR’s mission.”
“I am deeply honored to be chosen as a CLIR Senior Presidential Fellow, especially given the respect that I have for CLIR and the existing Presidential Fellows,” said Mr. Choudhury. “More than anything, this recognition from CLIR reflects the excellent work of many colleagues within the Sheridan Libraries and many faculty and students at Johns Hopkins and beyond. I look forward to working with CLIR in this capacity,” he added.
Mr. Choudhury’s appointment begins January 1, 2009.
Hewlett Foundation Supports Planning for Leadership Institute in Africa
The William and Flora Hewlett Foundation has awarded Stanford University $202,813 to plan a leadership development institute for mid-level managers of sub-Saharan African libraries and information technology organizations in higher education and research institutions. The institute will be modeled on the Frye Leadership Institute, which trains librarians and information technology managers in skills needed to guide and transform academic information services for higher education.
According to the grant proposal, the aim of the institute will be to develop mid-level staff “who could become the next generation of leaders . . . who define the needs and aspirations for access to information that could assist African people, individuals as well as groups, in defining their issues, defining their needs, and defining the means by which solutions to the needs and issues might be addressed. That approach involves not just developing leaders from a cadre of middle managers, but as well connecting the community of existing library and information technology leaders as well as aspirant leaders to their counterparts in other parts of Africa and in other regions of the world.”
In organizing the project, Stanford will work closely with CLIR, Emory University Libraries, and the Bibliotheca Alexandrina in Egypt, which will host the institute. Some 20 library leaders and CIOs from institutions in sub-Saharan Africa will participate in the planning process, which will kick off with a weeklong meeting in Cape Town, South Africa, in spring 2009.
Apply Now for Zipf Fellowship
CLIR is now accepting applications for the A. R. Zipf Fellowship in Information Management. The fellowship is awarded to a student who is enrolled in graduate school, is in the early stages of study, and shows exceptional promise for leadership and technical achievement in information management. This year’s award will be $10,000. The application deadline is April 6, 2009. For more information and to apply online, visit https://www.clir.org/fellowships/zipf/zipf.html.
CLIR Publishes 2007-08 Annual Report
CLIR’s 2007-2008 annual report is now available online at https://www.clir.org/pubs/annual/annual.html. From this year forward, CLIR will issue annual reports in electronic format only.
CLIR Announces Hidden Collections Awards
CLIR ANNOUNCES THE following recipients of Cataloging Hidden Special Collections and Archives awards:
Avery Research Center for African American History and Culture at the College of Charleston
Providing Access to African American Collections at the Avery Research Center
California Historical Society
California Ephemera Project
Center for the History of Medicine, Countway Library, Harvard Medical School
Foundations of Public Health Policy
Getty Research Institute
Uncovering Archives and Rare Photographs: Two Models for Creating Accession-level Finding Aids Using Archivists’ Toolkit
Goucher College
Mapping Special Collections for Research and Teaching at Goucher College
Library of Congress
Library of Congress Multi-Sheet Map Series Collection: Africa
Litchfield Historical Society
Litchfield Historical Society’s Revolutionary Era and Early Republic Holdings
New York University
The Records of the Communist Party, USA: A Preservation and Access Project
Northwestern University Library
The Africana Posters: Hidden Collections of Northwestern and Michigan State University Libraries
University and Jepson Herbaria, University of California, Berkeley
Cataloging Hidden Archives of Western Botany and Beyond
University of Michigan Library
Collaboration in Cataloging: Islamic Manuscripts at Michigan
University of Pennsylvania Libraries
Hidden Collections in the Philadelphia Area: A Consortial Processing and Cataloging Initiative
The following collaborative project was awarded $900,000:
Archives from Atlanta, Cradle of the Civil Rights Movement: The Papers of Andrew Young, SCLC, and NAACP-Atlanta Chapter
Robert W. Woodruff Library, Atlanta University Center
Processing Voter Education Project Collection
Amistad Research Center
Working for Freedom: Documenting Civil Rights Organizations
Over the next one to three years, these institutions will create cataloging records of their special collections holdings that can be accessed through the Internet and Web. This will enable the federation of disparate, local cataloging entries with tools to aggregate the information by topic and theme.
In January, CLIR will begin building a basic registry of hidden collections and archives, created from information in the proposals, that can be searched through a Web-based platform.
For more information on the Cataloging Hidden Special Collections and Archives grant program, visit https://www.clir.org/hiddencollections/index.html.
The MetaArchive Cooperative: A New Collaborative Service Organization Providing a Distributed Digital Preservation Infrastructure
by Martin Halbert and Katherine Skinner
RESEARCH LIBRARIES ARE rapidly digitizing or acquiring local digital archives with long-term value for both scholarly and public research purposes. Many of these libraries, however, lack affordable and scalable digital preservation infrastructures, or even a consensus on what basic actions they should take to protect these resources. The dearth of tested and functional preservation solutions has meant that all too often digital archives are only backed up, not preserved. Even those few archives that are preserved are often stored in a single location, where they are vulnerable to loss for any number of reasons.
Fortunately, affordable and scalable options for geographically distributed digital preservation are now beginning to emerge. Among these efforts is the MetaArchive Cooperative, an international preservation cooperative for digital archives established in 2003 under the auspices of and with funding from the Library of Congress’s National Digital Information Infrastructure and Preservation Program. The cooperative provides low-cost services to its member institutions by means of a distributed digital preservation network that enables them to securely cache and preserve content in geographically dispersed sites. It has helped create statewide networks in Alabama and Arizona. MetaArchive has also hosted workshops and other events to support and train groups wishing to establish networks and to foster awareness of digital preservation issues, and it plans to host more such meetings in 2009.
Fundamentally, MetaArchive is a growing community of cultural memory organizations, including both libraries and research institutes, that are collaborating to preserve the digital assets that make up many research libraries’ most innovative new collections and services. In joining the cooperative, each member organization signals its intention to take an active role in the preservation of its own digital assets as well as those held by the larger community.
MetaArchive is a cooperative, not a vendor. Its members do not pay for services, but rather invest in creating and sustaining their own preservation infrastructure. All hardware and software are owned, operated, and controlled by members. The group’s cooperative structure enables this infrastructure to be shared through a coordinated group governance process, with a sustainable business model that is supported by membership fees, cooperative agreements with the Library of Congress, and funding from agencies such as the National Historical Publications and Records Commission. The MetaArchive charter, membership agreement, technical documents, and business and management plans are openly available at http://www.MetaArchive.org/resources.html.
The MetaArchive Cooperative is a leader in the development and use of open-source software for distributed digital preservation. Its network was the first major effort to build and operate a private implementation of the open-source LOCKSS software for digital preservation, an approach that has come to be termed a Private LOCKSS Network, or PLN. The cooperative has created administrative modules and layered them on top of the LOCKSS software to implement a conspectus database for members. This database provides facilities to capture collection metadata for preservation decisions and actions. The cooperative is packaging its open-source software for use by other PLNs, and plans to release it through SourceForge next year.
The cooperative is more than a technical solution for preservation. It is a learning environment in which members gain experience in developing and enacting a full preservation plan for their assets. It has three tiers of membership: contributing, preservation, and sustaining. Contributing members are smaller institutions that are interested in using the shared network infrastructure to preserve digital content but that lack the capacity to operate any technical infrastructure of their own. Preservation members are responsible for preserving digital content on an ongoing basis. At a minimum, every preservation site must have responsible staff and a minimally configured node server. Sustaining members are responsible for hosting a preservation node, participating in the steering committee, and developing the computer systems that enable the preservation network. Membership fees for each tier are the minimum required for operating the cooperative, and range from $300 to $5,000 per year, together with a fee of $2 every three years per GB of content contributed. Membership fees cover the expense of replicated storage space for the network at cost. It is unlikely that any similar replicated digital preservation service can be established at lower cost.
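For readers who want to estimate what membership might cost, the fee structure described above can be sketched as a short Python calculation. The `estimate_cost` helper is hypothetical. The article gives only the overall range of annual fees, not the fee for each tier, so the annual fee is passed in as a parameter. The sketch also assumes that the storage fee is charged in full for every three-year period that has started, since the article does not say how partial periods are billed.

```python
import math

# Figures quoted in the article.
MIN_ANNUAL_FEE = 300       # USD per year, lowest membership fee
MAX_ANNUAL_FEE = 5_000     # USD per year, highest membership fee
STORAGE_FEE_PER_GB = 2     # USD per GB for each three-year period
STORAGE_PERIOD_YEARS = 3


def estimate_cost(annual_fee: float, content_gb: float, years: int) -> float:
    """Estimate the total cost in USD over `years` (hypothetical helper).

    Assumption: the per-GB storage fee is charged in full for every
    three-year period that has started.
    """
    if not MIN_ANNUAL_FEE <= annual_fee <= MAX_ANNUAL_FEE:
        raise ValueError("annual fee is outside the range quoted in the article")
    if content_gb < 0 or years < 0:
        raise ValueError("content size and years must be non-negative")
    periods = math.ceil(years / STORAGE_PERIOD_YEARS)
    return annual_fee * years + STORAGE_FEE_PER_GB * content_gb * periods


if __name__ == "__main__":
    # A member paying the lowest fee and contributing 500 GB for three years:
    # 3 * $300 in membership fees plus one storage period of 500 GB * $2.
    print(estimate_cost(300, 500, 3))  # 1900
```

Under these assumptions, storage is a modest share of the total for small collections, and the annual membership fee dominates.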
The MetaArchive Cooperative is representative of a new generation of shared cyberinfrastructure collaborative organizations that span institutional boundaries in a common effort to develop innovative ways of dealing with the data preservation challenge. Such organizations may provide new solutions to the challenges facing cultural memory organizations as they seek to navigate the uncharted waters of the digital age.