CLIR Issues
Number 59 • September/October 2007
As We May Rethink
by Chuck Henry
Editor’s note: This is the first in a series of essays that will explore CI development in the context of current methods of promotion and tenure, models of scholarly publishing, the organization of universities, and our ways of knowing.
IN THE INTRODUCTION to his often-cited 1945 essay, “As We May Think,” Vannevar Bush waxed nostalgic as he recalled the community of scientists from many disciplines who came together to contribute to the national effort that led to the end of World War II. Bush emphasized the significance of this unusual coalition of talent and expertise and then called for a new coalescence of scientists to create the “memex”—a technology that supplements, organizes, and extends human memory.
Planning for a national cyberinfrastructure (CI) is similar in consequence to Bush’s proposed memex endeavor, but probably more complicated and of greater scale. It will require far more diverse engagement within the disciplines of higher education than anything that Vannevar Bush envisioned. As our exposure to the concept of CI increases, we gain a deeper understanding of its complexity. In a fairly short time, the term has come to represent a new environment for the conduct of research and teaching in the sciences, engineering, and the humanities.
The new cyberinfrastructure calls into question many of the methods and procedures with which we have worked for the past two decades. We have become comfortable with the technology, and execute much of our work using familiar applications on indispensable machines. CI is, in essence, an environment that facilitates sharing of data on an unprecedented scale, which in turn implies a far greater degree of federation, aggregation, and interoperable capabilities than we have heretofore experienced. It demands new kinds of expertise, which will require new forms of training and mentoring to recognize and respond to changing research behaviors. While the transformational potential of CI for higher education is not difficult to intuit, the details of this transformation have yet to be defined and remain ambiguous.
It is precisely this ambiguity that allows us to explore the multiple possibilities of developing a functional and robust cyberinfrastructure and to create this new environment in the most flexible and nuanced fashion possible. Succeeding in the evolving CI will require that we thoroughly rethink our procedures and expectations on the technical as well as the social levels, for the technical and social are deeply interrelated in cyberinfrastructure.
Consider, for example, the sheer volume of data to be supported. Many of the vast data sets are relatively new, not only in the humanities, with its large full-text and video databases, but also in astronomy and particle physics. Challenges such as data mining, semantic searches, multimedia data stewardship, and interoperability are common to all disciplines. This suggests that forward-looking researchers and scholars will need to exchange ideas and CI requirements for their mutual benefit.
There is little precedent for this kind of interdisciplinary dialogue. Past practice, characterized by a focus on traditional disciplinary purviews, silos of funded projects, and poor communication among researchers across intellectual boundaries, is at odds with the conceptual underpinnings of CI. In this respect, the past should not be prologue: our traditional methods of doing business and conducting research, as well as our systems of professional advancement, may undermine our best intentions unless we recognize the limitations of the academic procedures that have brought us to a point of new awareness.
Our basic understanding of intellectual productivity and of the means and structures that we have instantiated to achieve it may constrain us. More precisely, the terms and concepts with which we define, regulate, and assess our achievements may work against the fluid and creative approaches needed to build the CI as an environment that transcends, rather than intermittently mimics, our current circumstances. Take, for example, three accepted concepts by which we structure our research and organize our world: projects, centers, and formats. These are the building blocks of the academic enterprise.
The word project has come to refer to a research process that facilitates discoveries that contribute to an enriched understanding of the subject matter under investigation. Projects are by nature limited in duration, tightly focused, and specific to a discipline or subdiscipline. Most disciplines depend on projects for the incremental advancement of knowledge.
In many ways, the research process that a project embodies is antithetical to the broad, interdisciplinary, and often open-ended conversations that are needed to advance knowledge in the CI environment. Reconceptualizing the environment on the scale defined by cyberinfrastructure will require that existing academic boundaries become more porous. It will also require rethinking of how higher education is organized, on the basis of departments and schools that both reflect and instantiate the intellectual divisions that projects support and to which they lend authority.
Cyberinfrastructure cannot be built project by project. The comparatively limited, incremental nature of project-based knowledge acquisition is inconsistent with the more encompassing perspicuity needed for the development of CI. Projects, as currently defined, will become appropriate only after a compelling vision for CI and its practical implementation have been articulated—not as a means to attain that vision.
Thus it may be counterproductive to conceive cyberinfrastructure as a project, at least initially. It is better construed as an evolving system—a combination of physical and behavioral conditions that will have enormous consequence for higher education and society—that is not subject to the strictures that govern a project but rather is a mutable platform upon which new projects can be executed.
Our vocabulary similarly constrains us in moving beyond existing, comfortable paradigms of organized behavior. One instance is the recommendation in the American Council of Learned Societies’ (ACLS) 2007 report, Our Cultural Commonwealth, to “establish national centers to support scholarship that contributes to and supports cyberinfrastructure” (35). The authors of the report intentionally did not describe such a center. The formulation of the recommendation—to invest in cyberinfrastructure in part by establishing centers that would further support and extend CI—is somewhat recursive as well. In recent months, this recommendation has received much attention, and it has become apparent that the term center is misleading, because it implies a fixed point or well-defined local cluster of related activities.
Under the CI, the word center may become a transmuted concept, a virtual center with many federated parts, perhaps a “collaboratory” of tools and applications. This broader interpretation allows a more flexible and creative response to the needs of scholars and researchers. The ACLS report assumes that scholarship in the humanities is on an irrevocable course of change, with increasing dependence on technology. This assumption requires us to explore more rigorously the efficacy of electronic resources: what problems gave rise to the development of the digital datasets; what questions can be asked that could not be pursued in an analog environment; what implications this has for the discipline; what the effect will be on graduate and undergraduate education; and similar core issues.
A final example of the delimiting effect of a term, particularly one weighted by an accrued body of scholarship, is the weight and warrant that we give to formats—in the sense of material form or layout. This pertains especially to the humanities, but also to the library profession. An exceptional amount of thought and effort has been expended on distinguishing between printed matter and digital representations. Cyberinfrastructure is singularly digital and, in some respects, format agnostic. It will facilitate access to, and queries against, an enormous amount of data, in many different media, but it will also facilitate the reuse and unique reconstitution of these data. Without undermining the merits of discussions on format, we should acknowledge that CI will generate new forms, arrangements, and organizations of information that may revise, challenge, or even upend our traditional understanding of formats, and that will require a more expansive exploration.
The national, cross-disciplinary interest in cyberinfrastructure recalls another famous essay focused on science and its historical transitions. The paradigm shift that Thomas Kuhn identified as fundamental to the advancement of scientific knowledge in his 1962 work, The Structure of Scientific Revolutions, remains a popular trope. Kuhn’s description of the paradigm shift—a result of a conversation over time rather than of a blinding and sudden revelation—should be kept in mind as our discourse on CI becomes more extensive, encompassing, and discipline-agnostic. It remains a question, however, whether our language, organization of knowledge, and methods of progressive discovery will support or impede our aspirations.
IN NOVEMBER, CLIR will issue the final version of Preservation in the Age of Large-Scale Digitization: A White Paper. The paper examines preservation issues relevant to large-scale digitization initiatives (LSDIs) such as those undertaken by Google, Microsoft, and the Open Content Alliance (OCA). It was written by Oya Rieger, interim assistant university librarian for digital library and information technologies at Cornell University Library. A draft is available at https://www.clir.org//activities/details/mdpres.html.
The aim of LSDIs, to make more content accessible, is inseparable from the question of keeping such materials fit for use over time. The paper identifies issues that will influence the availability and usability of the digital books being created by LSDIs, and considers the relationship between these new resources and print collections. Given that the digitizing partners, as well as the participating libraries, are investing significant resources in LSDIs, how can we secure—or improve—a long-term return on this investment? Ms. Rieger addresses this question with 13 recommendations, summarized as follows.
1. Reassess Digitization Requirements for Archival Images
The prevailing digitization standards and best practices were established 15 years ago and are based on modest collection sizes and often on bitonal scanning. We need to create new digitization metrics that are based on current imaging technologies, quality assessment tools, archiving practices, and evolving user needs. Current data about the quality of images provided to participating libraries are anecdotal. To evaluate the suitability of digital objects for preservation purposes, it may be useful to conduct a systematic image-quality study based on inspection of sample images and associated metadata.
2. Develop a Feasible Quality Control Program
We need to reassess the quality control (QC) policies, tools, and workflows that were created to support small-scale digitization projects and to acknowledge that it is neither practical nor feasible to apply existing QC protocols to LSDIs. Creating good-quality images during the initial capture should be emphasized, so that QC is an assurance process whose purpose is to catch infrequent problems rather than to serve as a frontline strategy. The library community should negotiate rigorous technical specifications with digitization partners to reduce reliance on the QC stage for catching missing or unacceptable images.
3. Seek Compromise to Balance Preservation and Access Requirements
Because of the scale of LSDIs, participating institutions are finding that they cannot fully adopt existing preservation digitization practices. They are seeking compromises, such as dropping or reducing QC programs, settling for resolutions lower than 600 dpi, or switching to a different file format. Also, LSDIs are finding they must implement space-efficient digitization strategies to reduce long-term storage costs and increase transmission efficiency. Many of these compromises have resulted in practices and products that do not meet the quality standards that the community has developed. It is time to seek reconciliation and reach a resolution in our community with a clear acknowledgment of the pitfalls and virtues of such compromises, while also recognizing that the library community should continue to advocate for raising the quality bar.
4. Enhance Access to Digitized Content
Digital content that is not used is prone to loss. For this reason, archiving investments will be more worthwhile if efforts are made to improve discovery, access, and delivery. Some LSDI libraries plan to experiment with enhanced access and with discovery tools and text-mining techniques. This can be accomplished only if the libraries pool their resources and build on each other’s accomplishments. Building communities and systems for sharing and searching information about copyrights and their holders will also be important, since copyright information is a critical element in preservation and access decisions.
5. Understand the Impact of Contractual Restriction on Preservation Responsibilities
Commercial LSDI partners often restrict the sharing of full-text digitized content and, at best, stipulate that participating libraries may share copies of digitized materials only with academic institutions and only as long as they agree not to make the files available to other commercial Internet search services. Such restrictions are likely to impede some preservation strategies, such as redundancy arrangements. Having more than one search engine host the same content is likely to increase the survival of digital materials. The library community will benefit from forming a united front to address with commercial partners the limitations that they place on their copies of digital materials.
6. Support Shared Print-Storage Initiatives
With the increasing value placed on online access, research institutions will be pressured to justify investments in maintaining their legacy print collections, some of which are rarely used and redundant. Consolidation of holdings in a shared storage environment can save space and offer better environmental controls. Agreements among geographically distributed print repositories can create additional economies of scale. OCLC Programs and Research RLG Programs are undertaking research and programs in this area. National and regional shared-storage efforts demonstrating strong leadership need firm support from the library community.
7. Promote the Use of the DLF/OCLC Registry of Digital Masters
The DLF/OCLC Registry of Digital Masters (RDM) is a central place for libraries to search for digitally preserved materials.[1] By registering digitized objects with the RDM, a library indicates its commitment to preserve digitized collections, potentially reducing redundant efforts by other institutions. The registry also has the potential to record an array of relevant information that will support the preservation of content as well as the planning of future digitization efforts. Rather than relying on LSDI libraries to register digitized content, it may be more effective for OCLC to work with Google, Microsoft, OCA, and the Million Books Project to automatically ingest and record such information, with pointers to the university’s digital copies.
8. Outline a Large-Scale Digitization Initiative Archiving Action Agenda
Developing a common archival strategy is a complex process. A wide range of archival models and policies has been customized to institutional goals, resources, and content types; however, diversity of preservation strategies can also be seen as a virtue as the library community continues to learn about the various options and approaches. Although a joint archival solution is ideal, the collaboration agenda is not limited to providing a common preservation repository. Collaboration may take various forms, several of which are noted in the white paper.
9. Devise Policies for Designating Digital Preservation Levels
Organizationally and financially, we cannot keep and preserve all digital content at the same level of service and functionality. LSDI libraries must therefore determine the extent and type of their preservation efforts. Given that the library community is unlikely to have funds to redigitize the same content, digital books will inevitably be viewed as “insurance copies”—as backups for originals, regardless of the questions about quality. Because selection can be time-consuming and expensive, it is likely that the trend will be to preserve everything for “just-in-case” use. There are two options: all files can be automatically preserved at the same level; or metrics may be used to make a decision on the basis of the material’s perceived value and use. This topic is worth exploring further by means of a risk analysis of cost-efficient preservation strategies for low-use content.
10. Capture and Share Cost Information
Although digitization costs such as those of materials shipping, scanning, processing, optical character recognition creation, and indexing are covered by commercial partners, participating libraries also invest significant time in negotiating, planning, overseeing, selecting, creating pick lists, extracting bibliographic data, pulling and reshelving books, and receiving and managing digital content. It is important to document the expenses for all the partners associated with LSDIs. Often neglected or underestimated in cost analysis are the accumulated investments that libraries have made in selecting, purchasing, housing, and preserving their collections.
11. Revisit Library Priorities and Strategies
Libraries are under increasing pressure to focus digital preservation efforts on unpublished and born-digital information, where preservation concerns are most urgent. It will be challenging to balance the imperative to preserve the digital versions of already-published analog materials with the growing need to focus on born-digital materials. The costs of processing and archiving new digital material may shift the way in which funds are distributed among services at many libraries. It is important to try to define the role of LSDIs within the broader scope of library activities and midterm strategies.
12. Shift to an Agile and Open Planning Model
Traditional strategic planning and consensus models are unlikely to support the decision-making processes of research libraries in today’s fluid information technology environment. Libraries must develop scalable and flexible infrastructures that facilitate rapid execution, and they must be willing to take calculated risks. Holding out for an ideal solution is often not feasible; moreover, implementing less-than-perfect solutions can enable libraries to continue to refine their strategies as new options become available.
13. Reenvision Collection Development for Research Libraries
At the heart of many LSDI-related questions is the future direction for collection development programs in research libraries, and, especially, how future selection and acquisition decisions will be shaped in the light of increased online content and worldwide access to core collections.
Incentives to Collaborate
Many of the paper’s recommendations will require collaboration among cultural institutions. “Reading some of the recommendations in this paper,” Rieger notes, “one may rightfully ask, ‘What makes the LSDI agenda appealing enough to overcome the barriers to collaboration and what are the incentives to work together?'” Her conclusion addresses this question from the perspectives of stewardship responsibility, enduring access, cost-effectiveness, and the future role of research libraries.
1 DLF/OCLC Registry of Digital Masters: http://www.oclc.org/digitalpreservation/why/digitalregistry/.
The Frye Institute will be held June 1–12, 2008, at Emory University. The Institute is an intensive, two-week residential program in which participants study and analyze the leadership challenges stemming from the changing context of higher education. Participants will be selected competitively from among nominees and applicants who have a commitment to, and talent for, leadership within higher education.
To apply for the Institute, an individual must first be nominated by a senior institutional officer. Nominations must be submitted by November 1, 2007, using a nomination form available at www.fryeinstitute.org. CLIR will notify the nominees and encourage them to apply. Applications must be postmarked by December 3.
The Institute is supported by a grant from the Robert W. Woodruff Foundation and is sponsored by CLIR, EDUCAUSE, and Emory University. The Institute can be contacted by e-mail at firstname.lastname@example.org.
Rovelstad Scholarship in International Librarianship
CLIR awards one Rovelstad Scholarship each year to a student of library and information science to attend the World Library and Information Congress of the International Federation of Library Associations and Institutions (IFLA). The scholarship enables students who have an interest in international library work to participate in IFLA early in their careers. The 2008 IFLA annual meeting will take place in Québec, Canada, in August.
Applicants must be enrolled in an accredited school of library and information sciences. They must be citizens or permanent residents of the United States. The application deadline is January 21, 2008. Applications may be made online at https://www.clir.org//fellowships/rovelstad/rovelstad.html.
Mellon Fellowship Program for Dissertation Research in the Humanities in Original Sources
CLIR will award about 10 fellowships to support dissertation research in original source material for periods of 9 to 12 months. Each fellowship will carry a stipend of up to $20,000. Applicants must be enrolled in a doctoral program in a graduate school in the United States. They must have completed all doctoral requirements except their dissertation research and be ready to start that research between June 1 and September 1, 2008. Their dissertation proposals must have been accepted at least six months before the starting date of the fellowship.
More information on eligibility and application forms are available at https://www.clir.org//fellowships/mellon/mellon.html.
Complete applications must be submitted using CLIR’s online application form by November 23, 2007. Materials that are required to be submitted in hard copy (including transcripts and references) must be postmarked to CLIR by November 23, 2007.
Fellowship awards will be announced by April 1, 2008.
CLIR WILL JOIN the Library of Congress, the National Archives and Records Administration, and the Joint Information Systems Committee of the United Kingdom as an institutional participant in a Blue Ribbon Task Force on Sustainable Digital Preservation and Access. The Task Force is funded by the National Science Foundation and The Andrew W. Mellon Foundation.
The Task Force is charged with developing recommendations for promoting the economic sustainability of digital information for the academic, public, and private sectors. The Task Force will be cochaired by Fran Berman, director of the San Diego Supercomputer Center at University of California, San Diego, and a pioneer in data cyberinfrastructure; and Brian Lavoie, a research scientist and economist in the Office of Research of the OCLC Online Computer Library Center, Inc.
The Task Force will comprise about 15 national and international leaders, representing a cross-section of fields and disciplines, including information and computer sciences, economics, entertainment, library and archival sciences, government, and business. Over the next two years, the Task Force will invite additional experts from the academic, public, and private sectors to participate in periodic panels and discussions.
Over the course of its work, the Task Force expects to produce a series of articles about the challenges and opportunities of digital information preservation for both the scholarly community and the public. Its final report will include an analysis of issues and recommendations for catalyzing the development of sustainable resource strategies for the reliable preservation of digital information.
CLIR HAS COMMISSIONED information management consultant Diane Zorich to conduct a survey of U.S.-based digital humanities centers (DHCs). The survey will investigate the scope of the centers and their financing, organizational structure, products, services, and sustainability, as well as the collaborative aspects of existing models. Findings will be used to inform next summer’s Scholarly Communications Institute. The Institute, scheduled for June 2008 at the University of Virginia, will be devoted to assessing the needs, priorities, and challenges of national digital humanities centers. Ms. Zorich will survey about 30 DHCs between October 2007 and January 2008, and will submit a report on her work in May. An overview of her findings will be published next summer in CLIR Issues.
Ms. Zorich, who specializes in information management for cultural organizations, is author of A Survey of Digital Cultural Heritage Initiatives and Their Sustainability Concerns, which CLIR published in 2003.