CLIR Issues Number 58
Number 58 • July/August 2007
Digital Scholarship: What’s All the Fuss? by Stephen Nichols
Cyberinfrastructure: It’s All about Sharing by Amy Friedlander
by Stephen Nichols
THE POINT, OF course, is that there is no “fuss” about digital scholarship—or none to speak of. And that is precisely the problem. Since the late Middle Ages, the book has been the standard vehicle for expressing ideas, for announcing innovations, and for debating change. Although books continue to have that role, print can no longer claim proprietary rights to disseminating and storing information. Indeed, it has been more than a generation since many scholars wrote anything in a non-digital format. Like every other segment of society, academics have adapted their modes of scholarly research to the incredible advances in information made possible by the Internet and digital technology.
Still, while many scholars today use digital technologies and content in their research and writing, and will readily admit their advantages for their own work, most have been slower to admit—or have refused to admit—that such technology and resources are capable of totally transforming the nature and scope of scholarship. Many scholars find themselves using digital resources simply to do “analog scholarship,” that is, research that uses new technology in old ways. And while there’s nothing wrong with that, a problem arises when analog scholarship remains the standard—the only accepted gauge—for evaluating scholarship in general.
Academia’s allegiance to analog scholarship is especially pernicious for younger scholars who would like to explore the horizons of digital scholarship but are warned that their academic future lies with traditional print scholarship. The rule might as well be cast in stone: you must have several articles published in reputable, refereed journals to be hired. To have your contract renewed, you must have published still more articles and have a book manuscript nearly ready for submission. For tenure at the associate professor level, you need a published book, or at least a contract for a manuscript. Finally, for promotion to full professor, two or more books are required.
This formula has changed little for the past century. Now, however, it threatens to undermine the very concept of academic excellence it was designed to preserve. Those who adamantly insist on it fail to recognize that the Web and Internet have placed us in the midst of a revolution that has the potential for transforming how we think about, and access, our objects of study.
Large Databases Enable New Discovery
Humanists have traditionally viewed locating and compiling information, or “data,” as a basic task of scholarship. This material permits them to make discoveries and thus provide new insights in their chosen fields. Until recently, such data came in discrete blocks that the scholar himself amassed on index cards or similar “storage devices.” For these scholars, the mass of information accumulated was less important than the ability to arrange individual items to fashion a persuasive or innovative argument. The advent of huge databases, as well as the ability to access archives, works, and images preserved in geographically remote repositories, has transformed the scholar’s job. Our task is no longer to find individual items to support our research, but to discover how information in the aggregate changes the way in which we look at material.
Take as an example my own discipline, medieval literature. Medievalists have always studied literary works in critical editions—modern editions of a work edited by a scholar other than the original author—even though the works were originally transmitted by manuscripts that gave different (sometimes quite different) versions of the work. Since the manuscripts are preserved in geographically remote repositories, it was not possible to consult different manuscript versions of a work side by side. This meant that the edited text was used for research, even though it was modern rather than medieval in origin.
With the possibility of having digital libraries of medieval manuscripts online, everything changes. Now, for the first time, scholars can study authentically medieval versions of texts side by side. We can see, for example, that many of them were illuminated richly with images showing how contemporaries of the work—for the artists were first readers, then illustrators—perceived it. A text that stands by itself in a printed edition—stripped of pictures, comments, captions, or decoration—suddenly becomes both more meaningful and less isolated when represented, digitally, in an original form that included these features. The manuscript gains complexity from its layered context. We understand better how medieval readers demanded the presence of several different kinds of representation, each reflecting on and mirroring the other.
Access Prompts New Questions
Can we, as scholars, do justice to a work without taking into account its multilayered context? Are we not required to consider the materiality of the manuscript when we interpret a work? When a work comes to us in a manuscript produced by a team—scribe, artist, decorator, etc.—long after its author’s death, how does that affect our interpretation of authorship? Can we cling to our notion of a unified literary work when it is transmitted in a large number of manuscripts, produced over several hundred years in different places by dozens of scribes and artists?
Given the fact that a medieval vernacular work is not a unique text, but may exist in dozens or even scores of manuscripts, we need to formulate new questions to deal with its many versions. We need to come to grips with the consequences of our having at our fingertips scores of manuscripts for study and comparison. Access to this material will change long-held assumptions about the uniformity of vernacular literary language. It will help us better understand why pictures accompany certain kinds of works and how these images offer a critique of—or at least a commentary on—the text. Comparison across many versions of the “same” work will help us understand the plasticity of medieval narrative sequence, where the order of important scenes can vary from one version to another. Finally, we may question modern assumptions about the integrity, or unity, of a text in light of the interpolations and excisions that occur at will in different manuscripts.
Digital libraries are growing exponentially in every discipline. For scholarly researchers, they are much more than a convenient resource: they are a new frontier. To explore this frontier, we will need new methods and new theories. That’s why we should be making a fuss about digital scholarship.
Stephen G. Nichols is a CLIR Board member and professor of French at Johns Hopkins University.
by Amy Friedlander
MORE THAN A decade ago, the Corporation for National Research Initiatives (CNRI) launched a series of studies on the history of large-scale, technology-intensive infrastructures in the United States with the explicit goal of informing the development of a national information infrastructure.[1] Infrastructure was understood as a system with at least four properties: it is widely accessible, it is shared, it is ubiquitous, and it confers economic advantage. This definition has proved robust across five studies on railroads, telegraphy and telephony, electricity, banking, and radio.
The Shared Layer
Unpacking the implications of this definition exposes characteristics of infrastructure that will be useful as we advance the cyberinfrastructure. For instance, the notion of sharability implies the existence of multiple stakeholders who trust a shared system, even though their individual interests may be neither uniform nor congruent. The classic example is a market where competitors have a common framework, assumptions, values, and vocabulary, even though they compete, sometimes quite fiercely. Thus, the key characteristic of infrastructure is overall coherence, not uniformity at every level.
We can think, then, of cyberinfrastructure as a shared layer that supports multiple disciplines, activities, and scholars and that offers a platform for new kinds of scholarship and inquiry. That shared layer encompasses the organizational framework in which standards and codes of practice are developed and made available to users, enabling the shared layer to become widely accessible and hence ubiquitous. For example, a technical standard, such as the four-foot, eight-and-one-half-inch railroad gauge or the 110-volt standard for electrical appliances in the United States, is the product of a series of explicit agreements within concerned technical communities that are organized through professional associations. The infrastructure embraces the technologies, procedures, and standards, as well as the organizations, processes, and activities, that enable stakeholders to cooperate to achieve desired goals.
Libraries have long constituted an institutional element in the intellectual infrastructure of education. They make their collections available to patrons in a consistent way, using a network of services that rest on shared cataloging and indexing standards and practices. Libraries also allow users to explore information resources across individual organizations. And, like railroad or electrical engineers, librarians articulate their practices and standards through professional organizations and shared activities.
For reasons such as these, libraries and librarians have a wealth of experience to contribute to the national discussion of a cyberinfrastructure and the data centers that are envisioned as elements of it. CLIR President Chuck Henry has laid out some of the activities that CLIR is undertaking to advance the development of these centers. As his recent article in CLIR Issues [2] states, discovering the properties of that shared layer is a fundamental step. It is acknowledged that stewardship of digital data should be a function of these centers, but what range of services will they offer? Will they be independent or colocated with existing institutions, such as archives, libraries, computer centers, and museums? Or will they be hybrids with some subset of attributes of the parent organizations? And how will they change? Infrastructure is not fixed: it changes as technologies advance and as user expectations evolve.
Discovering the Properties of Infrastructure
The Berger Family Technology Transfer Endowment at the Tisch Library of Tufts University recently funded a project that provides a good example of what it means to “discover properties of infrastructure.” The Berger family, long-time supporters of the university, created a small endowment to foster information technology transfer in 1995. Through a competitive process, the endowment funds annual awards to teams of librarians and faculty that demonstrate innovative use of technology to advance teaching and learning on campus and in the community.
In the latest round of proposals, the endowment approved a project called Library Floorplans 2.0.[3] The project integrates into a GIS framework a series of separate, stove-piped technical services databases that describe the library’s collections, equipment, and other resources. It allows librarians to see a range of assets, from emergency equipment to the stacks, in a three-dimensional representation of the facility. The goal is twofold: to manage resources more efficiently and to educate staff librarians about the capabilities of GIS in a context that is meaningful to them.
In this example, the GIS capability is infrastructure, and the project itself is an exercise in enhancing the efficient management of a library. As the example illustrates, providing the capability (the infrastructure) goes beyond merely acquiring a software package. The library must also obtain the relevant licensing and, perhaps, designate a trainer or provide some other way for users to educate themselves; it also needs a setting in which the tool is used and a laboratory in which the system can be built and tested. The laboratory is critical: it includes equipment, software, and physical and virtual space, and it must be partitioned from the operating systems so that librarian-system developers can build the tool safely, without endangering ongoing library functions. Finally, there must be a path from the development tool to full operation. The initial costs of such a project may be high, but once the key components are in place, the library will have the capacity to build, test, pilot, and transition systems from prototype to production and can thus continually upgrade its services.
Cultivating both Forests and Trees
Library Floorplans 2.0 shows that providing infrastructure includes the software tool itself, an environment in which to use it, and user training. As CLIR builds its programs over the next several months, we will look for similar projects that teach us about cyberinfrastructure. For example, we have initiated work on mass digitization and preservation, in the hope of learning more about current practice—what works and what does not. Many of these findings will have immediate relevance for decisions concerning collection-development, preservation, and retention strategies; however, the results of this research can also inform future decisions about functions and capabilities of cyberinfrastructure centers.
Our strategy is the research and education corollary to the old maxim about forests and trees: professors teach their students about trees by explaining the forest; students demonstrate that they understand the forest by mastering the trees. So, CLIR will cultivate both the forest and the trees. With partners in other foundations, public agencies, and interested groups, CLIR will undertake research in relevant topics for cyberinfrastructure, convene meetings among interested parties, and share our findings in CLIR Issues. Watch this space: there will be more to come.
[1] Friedlander, Amy. CNRI Infrastructure History Series. Abstracts of the five-volume series, published between 1995 and 2005, are available at http://www.cnri.reston.va.us/series.html.
[3] Cox, Thom, et al. Library Floorplans 2.0: The Spatial Information Manager for the Library. Available at http://www.library.tufts.edu/tisch/berger/2007/Library_Floorplans2-0-revised.pdf
CLIR HAS RECEIVED a grant from The Andrew W. Mellon Foundation to assess the utility to scholars of several large-scale digitization projects. CLIR President Charles Henry and Georgetown University Provost James O’Donnell will lead the project.
Large-scale book-scanning projects are making vast collections of works easily accessible in a form that can be queried, interpreted, and reconstituted as new knowledge. These resources are a potential boon to scholars, enabling research that was previously not possible. But are these databases being organized and built to best support the methodologies and intellectual strategies of contemporary scholarship?
To answer this question, the project will analyze content that has been digitized by Google Book Search, Microsoft’s Live Book Search, Project Gutenberg, Perseus, the American Council of Learned Societies’ Humanities E-Book project, and, possibly, the Open Content Alliance. The project leaders will ask scholars from historical and literary areas of study to summarize key methodological considerations associated with conducting research in their disciplines. The scholars will assess each mass-digitization project under scrutiny from their own perspectives and report their findings. The reports will be synthesized and recommendations drawn from them.
In November 2007, CLIR will convene a larger group of scholars to discuss the findings and recommendations and to determine next steps. A goal will be to develop a strategy for working with independent and corporate database developers to improve the utility of their products. CLIR will issue a public report early in 2008.
The project is described in detail at www.clir.org/activities/details/mellonschol.pdf.
Editor’s note: the following article has been corrected since publication in print.
IN MID-AUGUST, CLIR will make available for public comment a draft white paper examining preservation issues relevant to mass-digitization projects such as those being done by Google, Microsoft, and the Open Content Alliance. The white paper is distinct from, but complementary to, another CLIR effort to investigate the scholarly utility of mass-digitized content (see article above).
Written by Oya Rieger, interim assistant university librarian of the Digital Library and Information Technologies Division at the Cornell University Library, the paper includes profiles of leading mass-digitization projects; a framework for assessing preservation aspects of such projects (selection, image-quality standards and quality control, technical infrastructure, and organizational infrastructure); and an examination of the implications of mass-digitization projects for print collections. It concludes with recommendations for addressing preservation issues relating to mass digitization.
The draft white paper will be available September 10 at https://www.clir.org/activities/details/mdpres.html. The public comment period will close October 5. CLIR will issue a final print and electronic report later this fall.
Edited by Diane Kresh, for the Council on Library and Information Resources.
A compendium of articles, facts, lists, and advice, The Whole Digital Library Handbook provides an engaging overview of digital libraries. Contributions by library luminaries, as well as the perspectives of experts from outside the ranks of library professionals, cover the state of information, issues, customers, challenges, tools and technology, preservation, and the future.
The volume is available from the ALA Store (www.alastore.ala.org).
REGISTRATION IS NOW open for CLIR’s eighth annual sponsors’ symposium. The topic of this year’s session is “The Architecture of Knowledge: How Research Programs and New Courses Are Built.” The symposium will take place Wednesday, December 12, at the Cosmos Club in Washington, D.C.
Featured speakers include the following:
- Christiane Gruber, art historian at Indiana University, will discuss her research on Islamic manuscripts and the social and communication skills needed to access them.
- Stephen Nichols, professor of French at Johns Hopkins University and a CLIR Board member, will talk about how large digital manuscript collections can enrich research and teaching in the humanities. His remarks will be based in part on his ongoing experience with the Roman de la Rose project.
- Nancy Foster, lead anthropologist for the University of Rochester’s River Campus Libraries and co-manager of the Libraries’ Digital Initiatives Unit, will offer an anthropologist’s approach to understanding the patterns and behaviors of researchers and students.
The symposium will run from 9:30 a.m. to 3:00 p.m. and will include a luncheon, at which Donald Waters, program officer for scholarly communications at The Andrew W. Mellon Foundation, will describe the foundation’s funding patterns over the past decade.
A full agenda and registration information are available at www.clir.org. Each CLIR sponsoring institution receives two complimentary registrations.
ALVIN K. CHEUNG, a doctoral student in computer science at the Massachusetts Institute of Technology, has been named the recipient of the 2007 A. R. Zipf Fellowship in Information Management. Cheung’s research focuses on the collection and processing of contextual information in a proposed system called ContextDB.
“While many types of contextual information are readily accessible from the operating system or networking layer, in current systems they are rarely collected,” says Cheung. “In ContextDB, I propose to capture this low-level contextual information and allow users to retrieve documents by running context-based queries, perhaps in concert with traditional structured or keyword queries.” Cheung holds bachelor’s degrees in electrical engineering and music, and a master’s degree in electrical engineering from Stanford University.
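For readers unfamiliar with the idea, a context-based query combined with a keyword query might look something like the Python sketch below. It is an illustration only and does not represent Cheung's design or ContextDB itself: the record fields (application, network, opened_at), the function names, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ContextRecord:
    """Hypothetical low-level context captured when a document is opened."""
    path: str
    text: str
    application: str     # program that opened the document (assumed field)
    network: str         # network the machine was on at the time (assumed field)
    opened_at: datetime  # when the document was opened (assumed field)


def context_query(records, *, network=None, application=None, after=None):
    """Return records whose captured context matches every criterion given."""
    matches = []
    for record in records:
        if network is not None and record.network != network:
            continue
        if application is not None and record.application != application:
            continue
        if after is not None and record.opened_at < after:
            continue
        matches.append(record)
    return matches


def keyword_query(records, keyword):
    """Traditional keyword search over document text (case-insensitive)."""
    needle = keyword.lower()
    return [r for r in records if needle in r.text.lower()]


# Invented sample data, for illustration only.
records = [
    ContextRecord("notes/grant.txt", "Draft grant proposal",
                  "editor", "office-lan", datetime(2007, 7, 2, 10, 0)),
    ContextRecord("notes/trip.txt", "Conference travel notes",
                  "editor", "hotel-wifi", datetime(2007, 7, 9, 21, 30)),
    ContextRecord("slides/talk.pdf", "Conference talk slides",
                  "viewer", "hotel-wifi", datetime(2007, 7, 10, 8, 15)),
]

# "Documents I opened on the hotel network that mention 'conference'":
# a context-based query narrowed further by a keyword query.
at_hotel = context_query(records, network="hotel-wifi")
hits = keyword_query(at_hotel, "conference")
paths = [r.path for r in hits]
print(paths)
```

The two queries are kept as separate functions so that either can be used alone or chained, which mirrors the "in concert with" phrasing in the quotation; a real system would presumably index the captured context rather than scan a list.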
Named in honor of A. R. Zipf, a pioneer in information management systems, the $10,000 fellowship is awarded annually to a student who is enrolled in graduate school, is in the early stages of study, and shows exceptional promise for leadership and technical achievement in information management. For more information and a list of previous fellowship recipients, visit https://www.clir.org/fellowships/zipf/zipf.html.
California Institute of Technology
Carnegie Library of Pittsburgh
University of Pittsburgh