
PART II: The Research Library in the 21st Century: Collecting, Preserving, and Making Accessible Resources for Scholarship

Abby Smith

(Abby Smith is an independent consultant. She was formerly Director of Programs at the Council on Library and Information Resources).

According to Samuel Johnson, “Knowledge is of two kinds. We know a subject ourselves, or we know where we can find information upon it.” Until recently, we knew where we could find information upon any given subject—in a research library. Libraries collected, preserved, and made available an array of resources needed by scholars. The bigger and more comprehensive the research library, the greater was the community’s access to knowledge, as well as access to those experts who could help patrons navigate the library’s geography of knowledge. Because scholarship has been primarily print and artifact based, the library was bound to acquire and then maintain in usable form scholarly literature and primary resources in order to make them accessible. In hindsight, it seems unlikely that between them, so many libraries would have redundantly purchased so much of the non-unique secondary scholarly literature if they could have made it accessible to their patrons in less expensive ways—ways that did not demand large and continuing investments in physical, technical, and staff infrastructure. The success of interlibrary loan gives some evidence to this surmise.

Whereas libraries once seemed like the best answer to the question “Where do I find. . . ?” the search engine now rules. Researchers—be they senior scholars or freshmen—no longer make the library the first stop in their search for knowledge. The shift from producing and consuming information in hard copy to multimedia digital form has moved the center of information gravity from research libraries to the Internet, and done so in a dramatically brief period. The preconditions for this sudden shift were laid in the 19th century by the development of audio and visual formats—still and moving images, recorded sound, and, ultimately, formats combining sound and image. A bifurcation eventually emerged between campus-based “general collection” libraries, which focused on secondary literature and a highly selective group of primary sources (both print and nonprint), and libraries not serving first and foremost a faculty and student body, and which focused on “special collections.”1

I mention this division of labor among research libraries because it is a mistake to grant exclusive agency to digital information in the shift away from the centrality of academic research libraries in collecting and preserving resources for scholarship. The academic research library has been predominant in collecting and preserving text-based scholarly literature, but it has not been the primary home for statistical data, cartographic materials, manuscript collections, prints and photographs, film, broadcast television and radio, folklore documentation, natural history specimens, and an overwhelming preponderance of primary source materials needed by scholars in the humanities, social sciences, and physical sciences. The challenges facing academic research libraries are fundamentally different from those facing nonacademic research libraries, not because of their mission (they both serve scholarship) but because of their user base. I will focus my remarks on the former because they are facing more urgent pressures to change, and so emerging trends for research libraries of all stripes may be easier to comprehend.

If we take libraries-as-first-resort in search out of the equation, what is left looks something like stewardship, loosely defined: ensuring long-term access to content in reliable, secure, and authentic form. But we already know that a significant portion of digital scholarly literature and primary resources—that is, the portion available through licensed agreements—is seldom in the possession and care of research libraries. Perhaps a preliminary answer to the question “What are the core functions of the research library with respect to collecting, preserving, and making accessible resources for scholarship?” might be that research libraries will be stewards of some sectors of the information universe, but they will not be the same sectors as before. So which sectors will they be?

Collecting, Preserving, Making Accessible: Where We Are Headed

To answer that question, we will examine six trends in the academic research environment that are likely to shape scholarship in the next decades. From these trends we may learn something about what resources scholars will use and how. First, however, I believe that one thing about scholarship will never change: scholars will demand access to information resources to examine what others have discovered and thought; to use and reuse evidence and scientific conclusions; and to publish results of their own research based on these resources. That is why their sources must be authentic, reliable, easy to find and retrieve, and easy to use and reuse.

1. Ascendance of science

The physical and life sciences are expanding their footprint on almost every Research I university campus. Science programs have become huge cost centers, consuming an ever-larger portion of university expenditures on research infrastructure. Because of the way science is funded, however, these programs are often viewed on campus as revenue centers: they are recipients of the largest federal grants and the largest philanthropic donations, in the tens and hundreds of millions of dollars. Science is where the big donors like to make their mark, comparable to the way that libraries were magnets for philanthropic donations in the 19th and 20th centuries. So science, which costs universities a great deal, will nonetheless increase in charisma; and the humanities, which neither cost so much nor bring in so much money, appear at present to be politically disadvantaged.

But that is just the money side of the equation. More significant in the long run is the influence of scientific reasoning on nonscientific domains of research. There is a general expansion of quantitative reasoning and methods into normally qualitative disciplines. For example, imaginative uses of geographic information systems (GIS) in history, archaeology, and art history, and data mining in classics and other text-driven disciplines are breathing new life into old disciplines. There is a burgeoning demand among social scientists to incorporate into their research an array of scientific data—such as epidemiological information and distribution patterns of genetic variations with health care statistics—and methods, such as GIS-based geographical analysis used to plot and examine polling or census data, consumption patterns, and so forth.

Finally, more and more scientists are recognizing that persistent data management is crucial to their research. Hence, they are developing library-like centers for the collection, curation, preservation, and access of data. The National Science Foundation has encouraged them to do so by putting out a call to develop such structures as key components of scientific cyberinfrastructure. Private foundations, including the Alfred P. Sloan and the Gordon and Betty Moore Foundations, are making equally significant investments in scholarly communication that include stewardship as well as dissemination.

2. Development of digital humanities

The accelerated development of digital humanities is an even more significant trend for research libraries, if only because humanists have been their primary clientele. Beyond the increasing use of quantitative research methods in the humanities, there is a growing demand by humanists to access and manipulate resources in digital form. With the primacy of “data-driven humanities,” certain humanities disciplines will eventually grow their own domain-specific information specialists. While perhaps trained as librarians or archivists, such specialists will work embedded in a department or disciplinary research center.

Of greater import is the emergence of digital humanists who continue to focus on narrative, discursive, and essentially qualitative ways of investigating what it means to be human. It is these scholars, interrogating new forms of discourse, narrative, communication, community building, and social networking, who will spend most of their time on the open Web and use wiki and blogging applications, social software, and other as-yet-undreamt-of applications. All these multimedia forms of discourse will present special challenges for collection development and preservation because of their inherent bias toward process over product, a bias that resists fixing expression in the canonical forms upon which analog preservation practices are dependent.

3. Emphasis on process over product (with respect to scholarly communication)

Distinctions between formal, archival publication and informal modes of scholarly communication are becoming nebulous. Among scientists, we have seen for more than a decade a preference for various types of informal, preprint-type sharing of working drafts, an informal mode of communication that has greater impact on the development of scholarship than the final, archival or formal publication does. (The latter, however, will probably continue to have a greater impact on scholarly careers, at least for the short term.) Humanists are also becoming more engaged with informal, narrative forms of communication, with graduate students and tenured professors alike using vernacular social software applications to build communities of discourse.

What does this mean for scholarly communication? I recently heard a tenured literary theorist say that she hoped never to publish a monograph again. When she gives talks, they are immediately blogged, and she finds this mode of discourse with other scholars highly productive and immediately gratifying. It has also reframed her view of the timetable of monograph production, shifting from inevitable-if-slow to arbitrary-and-obsolete. So much for the time-honored notion that humanists are immune to the pressure of time to get out their research results!

Finally, in many domains we see an erosion of the traditional distinctions between primary and secondary sources and flows of information. Many scholars now argue that publication and dissemination can and should represent evidence as well as argument, and that is precisely what they demand of new-model scholarly communication.

4. Mobile and ubiquitous computing

The headline here is that the laptop is the library. It was recently reported that a researcher at IBM is working on a storage technology that will allow an entire college library to be stored on mobile devices as small as the current iPod.2 Whether it happens two years or five years hence, whether it is IBM or some other company that realizes this goal, the handheld library is foreordained. Even without such a device in hand, we see the dominance of consumer technologies and applications, both commercial and free, in the academy. It is not only the undergraduates who arrive on campus with iPods that can stream courseware and the senior faculty who consult just-in-time Web-based references, even offline, through Zotero. It is that undergraduates can have a sophisticated command of geospatial thinking simply by opening up Google Earth; they do not have to master the intricacies of GIS available through expensive ESRI applications. It also means that graduate students do not require a high-quality but expensive (and far from ubiquitous) resource like ARTstor for creating presentations, sharing links, and drafting articles, when an astounding number of equally high-quality images are available free on Flickr. Then the question for research libraries becomes how to provide persistent access to these sources. Or does it? Does that become someone else’s responsibility?

5. Data deluge

Given the scale of information that scholars must cope with daily, opportunities to acquire skills in information management should be a key element of their education and training. The goal of professional training as a scholar is to maximize the autonomy and enhance the creativity of the scholar as an arbiter of information. We should never underestimate how carefully successful scholars manage their time; ready access to information that fits within the time frames set by the scholar is often the most important criterion in information seeking. Only some aspects of scholarship demand information meeting the rarefied benchmarks of reliability, authenticity, and persistence. That is why many scholars begin searching for information on the Web, and why they often turn to a search engine, not their local OPAC, to do a “quick and dirty” literature search.

With one more stage of breakthrough in storage, we could see significant change in the way individuals are able to manage the data deluge. The device under development at IBM, mentioned previously, “could begin to replace flash memory in three to five years, scientists say. Not only would it allow every consumer to carry data equivalent to a college library on small portable devices, but a tenfold or hundredfold increase in memory would be disruptive enough to existing storage technologies that it would undoubtedly unleash the creativity of engineers who would develop totally new entertainment, communication and information products.”3

6. Rising costs and changing funding models for higher education

Competition for funding among all units on campus means that the library must continuously demonstrate its value; it must also bring in money or lower costs simply to provide services demanded by its users. Given the financial pressures on all aspects of higher education, it is imperative to change the service model of the library. When the world was smaller, libraries strove to be many things to many different constituents. The library must now focus on specific communities. Its role in pedagogy seems clear, as pedagogy is always locally based. But an individual library’s role in research, an increasingly global enterprise, is not so clear. Each research library will need to find its niche. This is why the “special-collection” research libraries that have a tradition of being subject or format based may, in the long term, be better models for research libraries than campus-based general-collection libraries are.

Collecting, Preserving, Making Accessible: Two Roles for the Library

So what can we infer from these six trends for the research library with respect to scholarly resources? First, let us define the research library as a line item in a university budget dedicated to managing information resources for research and teaching.4 For our purposes, it matters little whether in 25 years that function will be performed by something with the discrete name of “library.” Whatever its name, that entity will need to focus clearly on two specific roles: one local, the other networked and part of a national and transnational research cyberinfrastructure.

In its local role, the library will be optimized to meet the needs of its campus community. The library is likely to provide repository infrastructure for stewardship of university-based information assets. Most of those assets will support pedagogy, administration, student life, alumni affairs, and other things vital to the school. A much smaller portion of them will support research. Research will be a far more global phenomenon than local institutions can support on their own.

In its networked role, the library will be able to support research and dissemination to the extent that it is tightly networked into the increasing cluster of inter-institutional collaborations that enable the creation and use of scholarly content. These collaborations will be key elements of research cyberinfrastructure, an infrastructure that will be a research-and-dissemination platform. In the magic phrase of the digital era, it “will scale,” be ubiquitous, and support a variety of scholarly domains, from astronomy to nanobiology, archaeology to urban design. The next-generation research library must be firmly embedded in that infrastructure, because that will be the platform to which scholars will gain access on their laptop library.

The exact models of stewardship and dissemination in the cyberinfrastructure will be determined by the evolution of domain practices. In the quantitative fields, we see domain-specific stewardship models such as genome and protein databases, the Virtual Observatory, and the Inter-university Consortium for Political and Social Research (ICPSR), among others, that look quite similar to “special-collection” libraries writ large. These entities are scaled to collect, preserve, and make accessible digital research content. They are deeply embedded within the communities of researchers that they serve. These stewardship models are optimized to handle content created by and for the academic community.

These networked efforts should also be extended to the data that are created outside the dominion of the academy, of particular value to humanists. This content comes in roughly two flavors—commercially created (usually gated) and publicly networked (ungated). So far, one organization focusing on stewardship of publicly networked content—the Internet Archive—has achieved scale. It is so successful at this that it provides vital services for numerous national libraries and government organizations seeking to archive their domains. Scarcely a decade old, it is already indispensable. While scores of university research libraries are collecting Web-based content in selected areas, none of them achieves, or even aims for, the scale or breadth necessary to collect digital content that scholars will demand. While I believe that certain research libraries can achieve comparable scale in collecting, it is unclear that any are planning to do so, or that they even see this as part of their core mission. It is equally unclear which libraries, if any, other than the Library of Congress, are contemplating large-scale partnerships with commercial content providers to ensure long-term access to primary digital resources. This is bad news. In the absence of such efforts, researchers will be forced to rely on commercial entities to preserve and make accessible their own content on their own terms.

Where academic libraries have been more effective, not surprisingly, is in joining networked efforts, such as LOCKSS, CLOCKSS, and Portico, to ensure persistent access to scholarly literature. These are important efforts and have much to teach about the challenges of forging long-term trusted relationships that can ensure access to content over time. For this is the make-it-or-break-it challenge for academic and nonacademic research libraries alike: to forge close working relationships with content providers—be they individuals, for-profit corporations, or learned societies—to ensure persistent access to that content for generations to come.


Research libraries evolved over the course of centuries to solve the problem of providing access to information. The library was the place where the artifacts of knowledge were aggregated and individuals came to consult them. The stewardship of artifacts will continue to be a collective responsibility of the research library community. As more of their content becomes available through digital surrogates, more opportunities will open for libraries to design a collective solution to preserving the artifacts.

But if we were to design a system to address the needs of digital scholarly resources, it would certainly be different from the library. The system would combine the functions of library, information technology, and scholarly publishing. Those who manage information resources for research and teaching would take it as ground truth that research is a global and distributed phenomenon. So, too, should be the infrastructure that undergirds it. These managers—be they called librarians or not—would be responsible for building and maintaining the multiple partnerships with scholars, learned societies, content creators, publishers, and, above all, with each other across the globe, that would support persistent access to high-quality research resources.


1 Separately incorporated “special format” libraries on campus share features with both types of libraries; how much varies greatly depending on how closely each is integrated with and funded by the main university library.

2 John Markoff. 2007. “Redefining the Architecture of Memory.” The New York Times (September 11). Available at (accessed November 24, 2007).

3 Ibid.

4 A characterization recently used by Kimberly Douglas, university librarian at Caltech, that distills the relationship of the library to a community of scholars and why it can command so much university money.
