Lessons in Deep Resource Sharing from the University of California Libraries • CLIR

Daniel Greenstein

The research library’s historic role is providing access to great collections of scholarly knowledge. To date, those great collections have been assembled in a single place, with a high level of professional service surrounding them, in support of research, teaching, and all sorts of civic and cultural engagements. The greatest challenge that research libraries face today is to transform themselves fundamentally so that they may continue to build and maintain those collections. I suggest that the traditional collection development model, one that gathers information resources, and the people who work with them, in physical proximity within a single organization, is no longer a functional one. Instead, the challenges we face drive us to implement a new division of labor between organizationally distinctive, layered library services that work interdependently to provide individual users with the full suite of collections and services that they require.

Layering Library Services at the University of California

The story will be told with reference to the University of California (UC), where a layered library model is beginning to emerge. Before introducing the model itself, it is worth reflecting briefly on the context in which it is taking shape. If it were a nation in its own right, the state of California would have the fifth- or sixth-largest gross national product in the world. The state has two public university systems: the University of California and the California State University. The University of California has 10 campuses (the tenth, Merced, will begin enrolling students soon), nearly 200,000 full-time-equivalent students, and about 5,000 faculty members. Its governance and funding are both highly decentralized.¹

UC also has 11 university libraries. Ten of these are located on the campuses (where most are themselves library systems), and one, the California Digital Library (CDL), is located at the Office of the President. Collectively, the libraries hold nearly 32 million volumes, and their combined annual budget includes some $240 million in state funding. Harvard's libraries, by comparison, hold some 14 million volumes. In maintaining the breadth and depth of their collections, UC libraries, like other great research libraries, are hard pressed to keep up with the escalating costs of scholarly publications, which have risen faster than library budgets over the past several years. Figure 1 shows the extent of the challenge. It compares a price index for scholarly journals with the consumer price index and the higher education price index, and it demonstrates that libraries, in good years as well as in bad, cannot keep up with annual increases of 6 to 12 percent in scholarly journal subscription costs.
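To make the compounding effect of such increases concrete, the following sketch computes what share of a base-year subscription list a library could still afford after a decade. The 6 and 12 percent price rates bracket the range cited above; the 3 percent annual budget growth and the function name `purchasing_power` are hypothetical assumptions for illustration, not UC figures.

```python
def purchasing_power(years: int, price_growth: float, budget_growth: float) -> float:
    """Fraction of the base-year subscription list still affordable after `years`,
    assuming prices and budgets each compound at a constant annual rate."""
    return ((1 + budget_growth) / (1 + price_growth)) ** years


# Illustrative only: 6% and 12% bracket the journal price increases cited in the
# text; the 3% annual budget growth is a hypothetical assumption.
for price_growth in (0.06, 0.12):
    share = purchasing_power(10, price_growth, 0.03)
    print(f"{price_growth:.0%} annual price growth: {share:.0%} of base-year titles after 10 years")
```

Even under the milder rate, a budget growing at half the pace of prices loses roughly a quarter of its buying power within ten years.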

Figure 2 shows the same problem in a slightly different way. It charts the annual increase in the number of volumes published worldwide against the declining purchasing power, in volumes, of the state funding that libraries receive for monograph purchases.²

Fig. 1. Periodical price increases in comparison with common inflation indexes, 1985-2000

Fig. 2. Growth in publishing and decline in library buying power, 1988-2001

This inflation in both the volume and the cost of scholarly publications has forced the UC libraries to seek new ways of maintaining their historic collecting roles. In particular, they have invested collectively in services that all require but that none can afford independently. Looking briefly at a number of these services, we will see a layered library service model beginning to emerge, one in which campus libraries build upon a range of common or utility services in order to better meet the very distinctive local needs of their own faculty, students, and civic constituencies.

Regional Libraries, a Union Catalog, and a Digital Collection

The regional library facilities (compact print storage facilities, of which UC has two, one in the north and one in the south) were an early, perhaps the first, UC experiment with a new library service model. These facilities, which are paid for centrally and managed (by Berkeley in the north and UCLA in the south) for the use of all the libraries, free up scarce shelf space in campus libraries, thereby enabling them to keep locally maintained collections current.

A second utility is a union catalog, Melvyl®, which makes information about the UC libraries’ collective holdings available to any user anywhere in the system, and indeed anywhere in the world. By combining Melvyl with an online patron-initiated interlibrary loan service (a further utility), the UC libraries give their users access to more than 32 million volumes as if they formed part of a single virtual library. Figure 3 shows the results of an online search conducted using Melvyl. A publication called Adaptive Instructional Systems is not widely held by the UC libraries, so a user at Riverside who is interested in the title clicks the Request button, and the volume is delivered within 24 to 48 hours.
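The request flow just described can be sketched as a simple routing rule over union-catalog holdings. This is a minimal illustration under stated assumptions: the `Title` and `Holding` classes, the `route_request` function, the placeholder campus names, and the first-available routing rule are all hypothetical and are not taken from Melvyl's actual implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Holding:
    campus: str
    available: bool = True  # False if the copy is checked out or missing


@dataclass
class Title:
    title: str
    holdings: list[Holding] = field(default_factory=list)


def route_request(title: Title, home_campus: str) -> str | None:
    """Return the campus that should supply the item, or None if no copy is available."""
    # A copy on the patron's own shelves needs no interlibrary loan.
    if any(h.campus == home_campus and h.available for h in title.holdings):
        return home_campus
    # Otherwise take the first other campus with an available copy
    # (a real system might weigh distance, load, or loan policy instead).
    for h in title.holdings:
        if h.campus != home_campus and h.available:
            return h.campus
    return None


# Hypothetical holdings for the title shown in figure 3.
book = Title("Adaptive Instructional Systems",
             [Holding("Campus A", available=False), Holding("Campus B")])
print(route_request(book, "Riverside"))
```

Because every campus's holdings sit in one catalog, the routing decision needs no knowledge of how each campus runs its own circulation system.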

Fig. 3. Melvyl and patron-initiated request

Another utility, of more recent origin, is a collection of digital materials that the libraries agree to license or purchase together. The collection is one of the largest made available digitally by a research library and at present includes more than 8,000 journal titles, 250 reference and other databases, all books printed in English before 1800, 200,000 digital images of works of art and architecture, and 4,500 social scientific and government statistical databases. Nothing is acquired for this collection unless every library agrees to it and pays for it.³ The rationale for developing a shared digital collection is simple. Digital information doesn’t need to live anywhere in particular and can be accessed from anywhere over the network. Rather than acquire highly redundant local digital collections, the UC libraries began in 1997 to acquire some digital materials together, not as a buying club but as a single corporate entity. By sharing in the development of digital collections, the UC libraries can share a variety of essential tasks, including identification, review, vendor negotiation, content acquisition, and acquisitions processing. They also exercise and enhance their buying power by acting collectively as the University of California libraries.

A next step, and a very new one for the UC libraries, is to consider extending the shared collection from digital to printed materials. The UC libraries are, for example, building shared collections of printed journals that are also available in digital form and exploring the development of shared collections of federal and state government documents. The rationale for print is the same as for digital:

enhancing collections and services that each UC campus library makes available to its faculty and students;
expanding the breadth and depth of collections available systemwide to support the university’s distinguished teaching and research programs;
reducing unnecessary duplication of campus holdings; and
saving substantially in cost and effort.

Planning for the shared print collection has been a revealing process and has forced us to ask hard but essential questions. Which of the materials on our libraries’ shelves do we need to continue holding redundantly? Are there economies to be had through some coordination? How can shared print holdings be governed collaboratively?⁴

We are starting with print materials where cooperative collection development makes obvious sense, notably with new journals (e.g., those published by Elsevier and the Association for Computing Machinery [ACM]) for which a single print edition is supplied “free” to the UC libraries as part of their systemwide electronic site license. In the current economic climate, when libraries are beginning to cancel print subscriptions where electronic versions exist, we also expect this kind of shared collection to ensure that print editions are not knowingly lost to the system. Looking retrospectively, we are also considering not only journals that are available online but also federal and state government publications. In an interesting hallway discussion recently, two of our university librarians found themselves wondering whether, and to what extent, libraries should share the cost of “core” materials, leaving campus libraries to enhance, maintain, and assert their distinctiveness by investing in local collections.

The shared print collection is yet another example of a utility service. It enables campus libraries to provide a higher level of collection and service support for research and teaching on their campuses and for the various public communities they serve.

The layering model is also evident in a range of technology applications that the California Digital Library supplies in close cooperation with the campus libraries. One example is a reference-linking service, demonstrated in figures 4-7. In figure 4 a user is searching OVID’s Current Contents, an abstracting and indexing database, for journal articles on strokes. Having located a promising reference to Anatomy of Stroke, Part I, she wants to see the full text of the article. She clicks on the reference, and the full text appears (figure 5). If she then sees a footnote or reference to something else she wishes to read, clicking on that reference will pull up the full text of that article as well (figures 6-7). But links from Current Contents will not always lead to the full text of an article. In fact, the links can be made only if the article text is available under license at UC. In some instances, only the print edition is available, in which case the user may end up back at Melvyl, having to issue an interlibrary loan request.

Fig. 4. Reference linking from Current Contents

Fig. 5. Link found in Stroke

Fig. 6. Reference linking from a footnote in the article in Stroke

Fig. 7. Link found in the Annals of Neurology

This linking utility is a particularly interesting model of a layered service. The CDL hosts the technology that enables this kind of linking and applies it wherever possible to the electronic content that makes up the shared digital collection. But the shared digital collection is not the sum total of electronic materials to which UC faculty and students have access. Campus libraries, acting individually and in small groups, also license or purchase electronic information over and above what is available in the shared collection. To ensure that campuses can integrate the unique electronic materials they hold, the CDL makes the linking technology it maintains available to the campus libraries, which in turn configure the linking service to include locally held online materials.
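The division of labor described here, a systemwide linking layer that each campus extends with its own licensed content, can be sketched as a two-level lookup. The `LinkResolver` class, its keying by a journal identifier, the example.org URLs, and the `catalog-request:` fallback are all hypothetical; the sketch illustrates the layering only and does not describe the CDL's actual software.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LinkResolver:
    # journal id -> full-text URL for titles in the shared (systemwide) collection
    shared: dict[str, str]
    # campus -> {journal id -> full-text URL} for titles a campus licenses itself
    local: dict[str, dict[str, str]] = field(default_factory=dict)

    def add_local(self, campus: str, journal_id: str, url: str) -> None:
        """Let a campus layer its own licensed title on top of the shared collection."""
        self.local.setdefault(campus, {})[journal_id] = url

    def resolve(self, campus: str, journal_id: str) -> str:
        """Return a full-text link if either layer licenses the title, else a print fallback."""
        campus_titles = self.local.get(campus, {})
        if journal_id in campus_titles:
            return campus_titles[journal_id]
        if journal_id in self.shared:
            return self.shared[journal_id]
        # No electronic license at either layer: send the user to the catalog
        # to request the print edition.
        return f"catalog-request:{journal_id}"


resolver = LinkResolver(shared={"journal-A": "https://fulltext.example.org/journal-A"})
resolver.add_local("Davis", "journal-B", "https://davis.example.org/journal-B")
print(resolver.resolve("Davis", "journal-B"))   # found in the Davis layer
print(resolver.resolve("Irvine", "journal-B"))  # not licensed at Irvine
```

The shared layer is maintained once for everyone, while each campus touches only its own overlay; neither needs to know how the other is configured.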

A Union Catalog of Finding Aids

A further example of a layered service is the Online Archive of California (OAC), a union catalog of some 7,000 finding aids that have been developed for library special collections and archives on UC campuses and elsewhere around the state. The OAC bundles perhaps two enabling utilities. The first is a technology infrastructure that integrates disparate finding aids. The second, perhaps more interesting, is that the OAC as a project provided the guidelines, and in some cases the motivation, for campus and other collections to produce online finding aids in a format that could be integrated. A new service that integrates access to digital image surrogates for works of art and architecture may have a similar effect, helping UC’s libraries and museums make hundreds of thousands of digital images available to the widest possible community. As with other utilities, this one is designed to enhance the local services that campus libraries can offer their users. In this vein, we are exploring tools that will enable libraries to configure the service to meet local users’ specific needs, for example, by adding local images to the collection, by integrating the image collection with other local holdings, and by building interfaces that integrate the image service as a whole with local course management systems.

What Makes the Layered Service Model a Challenge

As this brief review of UC’s layered services suggests, there is nothing at all new about the service model itself. The great public utilities (electricity, gas, even water) have been provided on a similar model since the late nineteenth century. What is new is the application of this layered model to library services. Also new are the weaknesses in the digital library that the model’s development at UC has revealed, and it is to these challenges that the paper now turns.

Figure 8 depicts, schematically and somewhat abstractly, the current digital library service model. Star shapes toward the top of the picture represent library Web sites, where users come to find a host of materials (online public access catalogs, online journals, online databases, etc.). Libraries construct these Web sites for their users, and the sites point to a wide variety of information resources, represented as ovals toward the bottom of the picture. These information resources may include

catalogs of materials that are available locally in print and other analog formats (e.g., through online public access catalogs and finding aids);
online materials that are available to local users under licenses and that may be managed by third parties (e.g., online journals and reference databases); and
freely accessible Internet-based materials that are accessible through the library Web site and may be hosted anywhere in the world.

Fig. 8. The current digital library service model

Because information resources are built in a variety of places, by a variety of people, and for a variety of purposes, the library has to work quite hard, often in very proprietary, ad hoc ways (shown by the differently depicted arrows), to include them in its Web site. The model is enormously ineffective and inefficient. Take the library’s integration of online journal content into its Web site as an example. Operating at the content layer (represented by ovals), journal publishers have produced a host of different products, each of them aggregating in one place a particular collection of journals. Although the aggregations tend to focus on particular subject areas and can be quite large, they represent only a small part of the available journal content. Rather than look exclusively at one publisher’s collection of scientific journals, for example, the library user wants to look across many publishers’ science offerings. To support this research, the library is forced to combine a wide variety of journal collections in a single Web site, linking them wherever possible with the reference-linking technology discussed above. In effect, the library spends considerable energy disaggregating the publishers’ aggregations so that the journal content they contain becomes more useful. Further, the library is charged twice for this inconvenience. It pays a premium in subscription costs for the so-called value-added services that publishers claim to provide by aggregating content. It then pays again to support the reference-linking technologies that allow it to unbundle those aggregations so that the materials become more useful.

Many journal publishers have recognized the burden that this model imposes and have organized themselves through CrossRef so that they universally support network protocols that enable cross-collection linking. Unfortunately, these hard lessons appear to have had no influence on the monograph publishers who are beginning to make some of their backlists available online. Once again, we see publishers insisting on aggregating online content in ways that make little sense to library users, who typically want unfettered access to a range of information products. Indeed, the model emerging for electronic monographs may prove even more flawed than the one now being transformed in the journal market. At least the journal publishers went out of their way to aggregate content by discipline, including in any one aggregation the journals of many different academic societies and, sometimes, of different publishers. With online monograph collections, the most common organizing principle seems to be the publisher (and perhaps, within a publisher, the subject).

Commercial electronic publishers are not the only, or even the worst, offenders. Libraries that produce their own digital collections (for example, by scanning selected special collections) do so in ways that make it extremely difficult for others to federate those collections with one another and to integrate them with the more foundational holdings of printed and electronic monographs and journals. Have we, too, developed content so distinctive and ad hoc in its local orientation that anyone who wants to use it must go through the same unbundling process that commercial journal and monograph publishers force upon us?

A more rational digital library model is depicted in figure 9. The model proposes that we (publishers, libraries, anyone who builds digital information content) develop digital content and distribute it in open repositories. The repositories are “open,” not because they are freely accessible (the model doesn’t prejudge business decisions) but because the digital objects they contain (whether encoded texts, digital images, digital sound or film, statistical databases, or geospatial information systems) can be accessed, transformed, combined, and recombined with objects drawn from other collections by bona fide users according to their needs and interests. The model does not prevent journal or book publishers, or even digital libraries, from aggregating content in their own ways or distributing it with their own look, feel, brand, and functionality. They will, and must, continue to build “higher-level” end-user services based on the content they supply. The model simply suggests that others may want to develop different higher-level services supporting needs and uses that the content owners cannot envisage. The model also forces us to think creatively about what kinds of higher-level services might materialize if digital information content is available in open repositories. At present, the only higher-level services we know of are union catalogs (which integrate access to information about the holdings of different information resources) and, more recently, linking services like the one discussed above. Although there is a great deal more to do before we can claim to have perfected these kinds of services, we might still ask whether there are others that we have not yet thought of.
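A rough sketch of the idea, assuming a hypothetical minimal contract (`OpenRepository`) under which every repository exposes its objects together with shared descriptive metadata. The `subject_view` function stands in for a higher-level service that neither the publisher nor the library repository anticipated. All class names, fields, and sample objects are illustrative, not a proposed standard.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Protocol


@dataclass(frozen=True)
class DigitalObject:
    object_id: str
    kind: str                  # e.g. "text", "image", "dataset"
    subjects: frozenset[str]   # shared descriptive metadata
    repository: str            # who supplied the object


class OpenRepository(Protocol):
    """Hypothetical minimal contract every open repository would honor."""

    def objects(self) -> Iterable[DigitalObject]: ...


@dataclass
class ListRepository:
    """Toy in-memory repository; satisfies OpenRepository structurally."""
    items: list[DigitalObject]

    def objects(self) -> Iterable[DigitalObject]:
        return iter(self.items)


def subject_view(repos: Iterable[OpenRepository], subject: str) -> list[DigitalObject]:
    """A higher-level service that relies only on the shared contract:
    it recombines objects on one subject from every repository."""
    return [obj for repo in repos for obj in repo.objects() if subject in obj.subjects]


publisher = ListRepository([DigitalObject("p1", "text", frozenset({"neurology"}), "publisher")])
library = ListRepository([
    DigitalObject("l1", "image", frozenset({"neurology", "anatomy"}), "library"),
    DigitalObject("l2", "image", frozenset({"architecture"}), "library"),
])
print([obj.object_id for obj in subject_view([publisher, library], "neurology")])
```

Each repository remains free to build its own branded interface on top of `objects()`; the point of the shared contract is that third parties can build different services on the same content.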

Fig. 9. A layered approach

What about alerting services, which notify users that something in their field of study has just become available from one source or another? Or format-based services that integrate access to, and encourage use of, online maps or spatial data? What about authoring tools that allow users to weave an interpretive web around digital objects (online journals, encoded books, manuscript images, databases) found in a variety of open repositories and to present that web as an interactive lesson in support of online learning? Can we surface online library information in a manner that allows it to be integrated selectively into online learning materials, whether the materials are developed in Blackboard, WebCT, or some proprietary system? The answer, sadly, is no. At the University of California, the financial consequence is that the $240 million UC invests annually in its libraries cannot be drawn upon by the $170 million it invests in instructional technologies. And UC is by no means unique in this.

There are other challenges. Even if we do adopt a layered model and put at its foundation a range of open digital object repositories, we are uncertain about how best to manage our digital content. What we do currently is perhaps best exemplified with reference to the many lives of a digital image surrogate for a work of art. Let’s say that a library wishes to develop an online finding aid to assist users interested in accessing its slide library. That library might include for every record in the catalog a thumbnail of the image of the slide in question. The thumbnail image is produced, included in a catalog record, bundled into a database management system that is useful for cataloging, and made accessible through a range of search-and-retrieval functions that are appropriate to a catalog. If the same library wants to include images that are available from the slide library in, say, an online collection of works by German expressionists, it will create an altogether different image (probably at higher resolution), bundle it along with some descriptive data in an altogether different content management system (e.g., as appropriate to an online image service), and make it available through a variety of search, retrieval, slide-table, and other functions as specifically appropriate to such a service. Then let’s assume that a teacher who is presenting a class on a particular German expressionist wants to create some online learning materials utilizing some of the same digital images that are now available both in the catalog and in the image service. She will in this case have to reproduce the digital image and include it along with any descriptive information in an entirely separate content management system, this one providing the functionality as appropriate to online learning materials.

The model depicted in figure 10 relies on a proliferation of parallel and independent services, each with its own data ingest, data management, and data delivery schemes. It doesn’t scale. Every time the library wants to use a single digital object (whether an image, a graph, a map, or a text) in a new way, it is all but forced to build another vertical, independent silo of infrastructure and technology around it. That’s pretty silly. In the more rational model depicted in figure 11, the library’s digital images are managed in a single consistent format as part of one or several open image repositories, constructed to support very different uses of selected digital images. This is where we think we are going at UC, as at many other research libraries, and we are going in this direction because the parallel model (figure 10) is so uneconomical.
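The contrast between figures 10 and 11 can be sketched in code: one stored master per image, from which each service requests a derivative sized to its needs instead of keeping its own copy and its own metadata. The `ImageRepository` class, the profile names, and the pixel sizes are hypothetical; the sketch computes derivative parameters only and does not process actual image data.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MasterImage:
    image_id: str
    width: int
    height: int
    metadata: dict[str, str]


# Hypothetical longest-edge sizes (in pixels) for the three uses described in the text.
PROFILES = {"catalog_thumbnail": 150, "course_materials": 800, "image_service": 1200}


@dataclass
class ImageRepository:
    masters: dict[str, MasterImage] = field(default_factory=dict)

    def ingest(self, image: MasterImage) -> None:
        """Single ingest path: one stored master per image, whatever its later uses."""
        self.masters[image.image_id] = image

    def derivative(self, image_id: str, profile: str) -> dict[str, object]:
        """Return the parameters of a derivative scaled from the master for one service.
        Descriptive metadata is passed through rather than re-entered per service."""
        master = self.masters[image_id]
        longest_edge = PROFILES[profile]
        # Never upscale beyond the master's own resolution.
        scale = min(1.0, longest_edge / max(master.width, master.height))
        return {
            "image_id": image_id,
            "profile": profile,
            "width": round(master.width * scale),
            "height": round(master.height * scale),
            "metadata": master.metadata,
        }


repo = ImageRepository()
repo.ingest(MasterImage("slide-001", 4000, 2000, {"subject": "German expressionism"}))
for profile in PROFILES:
    print(repo.derivative("slide-001", profile))
```

Adding a fourth use then means adding a profile, not building a fourth silo with its own ingest and management.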

Fig. 10. Content management: the parallel service model

Fig. 11. Content management: a layered model

Conclusion

The layered service model also forces us to think differently about organizational issues. In it, we give up any expectation that the content producer (the entity responsible for the open digital object repository) can know all the ways in which the digital objects it produces will ultimately be presented and used. Abstractly, the repository cannot predict the range, complexity, or functionality of the higher-level services that are built on top of it. The question then becomes how to build a repository so that it can support a virtually infinite array of unknown higher-level services. Any answer to this question will undoubtedly be technical, but it will necessarily include organizational and political aspects as well. In a layered model, the success of those building open digital object repositories will be tied directly to the success and visibility of those building higher-level services based upon them. The promise that the model holds for libraries is compelling.

Today, we heard talks by people from public, national, and research libraries. Many of us have talked about the wonderful independent services we have created, services that sit in parallel to one another and operate with a large measure of organizational independence. Perhaps in two or three years we will return and speak differently. Then, perhaps, public librarians will speak eloquently about the services they have built for a local community on top of the collections offered up by research and national libraries. And research librarians might in turn speak with passion about delivering their collections through services developed by civic libraries and by schools, services that are tailored to specific user communities and user needs. Is it possible that a layered service model permits an organizational division of labor in which a variety of organizational entities, each playing a different functional role, are equally empowered?

FOOTNOTES

¹The 10 campuses are Berkeley, Davis, Irvine, Los Angeles, Merced, Riverside, San Diego, San Francisco, Santa Barbara, and Santa Cruz.

²U.S. libraries somehow manage to acquire some 650,000 books annually using endowment and other funding. Even at that rate, they cannot keep pace with the rate of publication.

³Payments are made according to a prorated formula that is worked out and agreed to by the campus libraries.

⁴These themes are more fully developed in Daniel Greenstein, “Library Stewardship in a Networked Age: The Compelling Logic of Shared Collections,” in Redefining Preservation in the Twenty-first Century, edited by Abby Smith. Forthcoming.