Study Design and Data Collection • CLIR

There are significant differences in the organization and operation of periodicals activities across libraries. In designing our data-collection approach, we worked closely with a number of the participating libraries to find ways to build upon commonalities and accommodate differences. In this section, we summarize our approach to data collection.

Units of Analysis

Like Montgomery and King (2002), we were interested in serial literature, not monographs or other types of publications. Within the serial literature, we decided to focus on periodicals. To harmonize data across all libraries with relative ease, we used a widely accepted definition of periodical: “a serial publication that contains separate articles, stories, other writings, etc., and is published or distributed generally more frequently than annual.” This definition excludes annual reports and yearbooks; updates of databases, loose-leafs, and Web sites; monographic series; and newspapers.¹⁴

Libraries divide their print-periodicals operations into two categories: current issues and backfiles. Current issues are accessible individually, generally in a reading room, for the first year or two following publication. Then, at the libraries in this study, they are generally bound into volumes and stored in stacks.¹⁵ We refer to the two divisions of the print format (current issues and bound backfiles) as holdings categories, and we collected data separately on each category.

Electronic periodicals are generally stored on a server computer maintained by a publisher or an aggregator, although at some libraries, certain electronic periodicals are stored locally or on a consortial basis. The distinction between current issues and backfiles is not always as clear in the electronic format as it is for print. We therefore collected data on the electronic format as a whole and included it as a third holdings category.

Within these two formats and three holdings categories, we needed to develop units of measure that would allow us to compare costs (i.e., dollars per unit). This was complicated, because the units had to be similar across the electronic and print formats.

The electronic environment has given rise to business practices among those who sell access to electronic periodicals that make it hard to count and compare practices and holdings from one institution to another. In this regard, the phenomenon of the “serials aggregator” needs some explanation.

The simplest kind of aggregator is the publisher that bundles a package of periodicals to sell at a single price. Such publishers argue that the purchaser gets a larger collection of important journals at a better price than would be possible if titles were sold individually. There are, to be sure, some economies of scale for the publisher in not having to manage individual subscriptions. Delivery of the information is easier than in buy-by-the-title models.¹⁶

The purchaser, on the other hand, may question whether all the journals that have been added to the package are ones she would have otherwise wanted and so may wonder whether the price is as good as it is touted to be. The purchaser’s dilemma is that of the customer at a restaurant that offers an à la carte menu as well as a one-price buffet. The restaurant will insist that the buffet is a bargain, but the customer may doubt whether the vat of peanut butter and the towering stack of sliced bread add much value and may come to a different calculation of cost and benefit.

If a publisher comes to a library that currently subscribes to 100 of its titles in print and offers an electronic package of 200 titles for 120% of the original price, a reasonable library may indeed choose the new package. But is that library subscribing to 200 titles? To the 100 titles that were previously judged to be worthwhile? Or perhaps licensing one collection, since some of these packages boast interoperability, linking, and searching and are marketed under a single brand name?

For the electronic format, subscription, issue, and title may no longer be meaningful descriptors; in the example above, 200 titles is not the ideal measure. Other reliable units of measure, however, have not yet come into common use. For example, we considered using the total number of licenses to electronic collections as an alternative unit, but what is licensed, and the size of the licensed collections, vary even more widely than title counts do. Moreover, licenses are not directly comparable. Given these considerations, we chose “titles” as the unit of measure for the electronic format. We defined this as all titles to which a library provides access, regardless of whether they are cataloged at the title level. This definition was intended to include titles that are licensed or accessed individually as well as those that are part of an aggregation. A title that is licensed twice (for example, through each of two aggregators) was counted only once.
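The counting rule for electronic titles can be sketched as a set union over every source of access, so that a title reachable through two aggregators lands in the set once. This is a minimal Python illustration, not the study's procedure; the source names and titles are hypothetical, and matching on title strings is a simplifying assumption (a real reconciliation would more likely key on a standard identifier such as an ISSN).

```python
def count_unique_titles(sources: dict[str, list[str]]) -> int:
    """Count every title a library provides access to, exactly once.

    `sources` maps each hypothetical access route (an aggregator package or
    an individually licensed title) to the titles it provides. A title that
    appears under several routes contributes a single count.
    """
    unique: set[str] = set()
    for titles in sources.values():
        unique.update(titles)
    return len(unique)


if __name__ == "__main__":
    # Hypothetical access lists; "Journal Y" is licensed through two aggregators.
    sources = {
        "aggregator_a": ["Journal X", "Journal Y", "Journal Z"],
        "aggregator_b": ["Journal Y", "Journal W"],
        "individual_license": ["Journal V"],
    }
    print(count_unique_titles(sources))  # 5
```

Using a set rather than summing list lengths is what enforces the "counted only once" rule; summing would report 6 titles for the same example.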

For print current issues, we also used “titles” as the unit of measure. Another choice would have been “subscriptions,” since libraries sometimes have more than one subscription to a given title. But by dividing total costs by the number of titles, we were able to better compare print with electronic. One effect of this choice was to assume, in our eventual comparison of print with electronic, that the transition of a given title from print to electronic format will result in the elimination of all print subscriptions to that title.

For backfiles, we used the number of bound volumes that the library held as the unit of analysis. Some libraries were able to provide good estimates of this number; in other cases, we used standard conversion measures to calculate the number of volumes from the number of square or linear feet occupied by the collection.¹⁷
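The conversion from shelf extent to bound volumes amounts to a single multiplication by a conversion factor. The sketch below takes that factor as a caller-supplied parameter; the value used in the usage line is hypothetical and is not taken from Leighton and Weber (1986), whose published factors would be substituted in practice.

```python
def estimate_volumes(shelf_extent: float, volumes_per_unit: float) -> int:
    """Estimate bound volumes from the shelf space a backfile occupies.

    `shelf_extent` is in linear feet or square feet; `volumes_per_unit` must be
    the matching conversion factor (volumes per linear foot or per square foot).
    The result is rounded to a whole number of volumes.
    """
    if shelf_extent < 0 or volumes_per_unit <= 0:
        raise ValueError("extent must be non-negative and the factor positive")
    return round(shelf_extent * volumes_per_unit)


if __name__ == "__main__":
    # Hypothetical factor, for illustration only; use a published standard.
    HYPOTHETICAL_VOLUMES_PER_LINEAR_FOOT = 6.0
    print(estimate_volumes(1_200, HYPOTHETICAL_VOLUMES_PER_LINEAR_FOOT))  # 7200
```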

Participating Libraries

Our dataset included data related to the nonsubscription costs of periodicals from 11 academic libraries. Drexel University permitted its mostly pre-existing data to be used within a modified methodological approach. Coauthor King was independently organizing a somewhat similar study at the University of Pittsburgh (Pitt), which agreed to permit the use of its data in this study. In addition, we collected data directly from nine libraries: Bryn Mawr College, Cornell University, Franklin & Marshall (F&M) College, George Mason University, New York University (NYU), Suffolk University, Western Carolina University, Williams College, and Yale University.

In recruiting library participants, we sought diversity in terms of size, affiliation, and degree of commitment to electronic resources. For the purposes of comparative analysis, we have categorized these institutions, on the basis of their Carnegie Classifications, as small, medium, and large (see table 1). More information on the size of these library collections and their operations may be found in the section entitled Periodical Operations and How They Are Changing.

Table 1. Participating libraries, by size

A number of the participating institutions are relatively decentralized. Professional schools often administer their own libraries. All of the institutions whose libraries are classified as “large” have more than half a dozen library locations on their campuses (and three have more than a dozen). Consequently, several participants chose to collect data only for certain units, avoiding some of the school or departmental libraries. Table 2 shows the parts of each library system that participated in this study.

As noted in table 2, some large medical, science, and law collections were excluded from the study. Many of the periodicals in such collections are very lengthy, in terms of both the number of issues and the number of pages per year. One implication of excluding these collections is a lower average cost of binding and storage space for the print collections. Another is that we may have excluded copies of print subscriptions that are duplicated in collections outside the study. This, too, may reduce the measured cost of print at libraries with significant duplication between included and excluded print collections. For both of these reasons, the omission of certain collections led us to underestimate the print costs in the life-cycle analysis for Cornell, NYU, Pitt, and especially Yale.

Table 2. Collections under examination at each participating library

Science collections may have other unique features that would have implications for circulation and reference services in the print format and across the board for electronic. We have no reason to believe, however, that such differences would have any significant effect on the cost comparison.

All the library collections included in this study have open stacks. A library such as the General Humanities Center of the New York Public Library, which has closed stacks, would presumably have higher print-related costs. Similarly, any special collections that had closed stacks, even if the main library collection were open stack, would presumably have relatively higher costs.

Finally, with the exception of NYU (as noted in table 2), the collections under examination at each institution were identical for both print and electronic formats.

Data Collection

Data collection took place during the first half of 2003. Staff contacts at each library gathered institutional statistics and distributed activity logs to all library staff who spent any amount of time on periodicals-related activities. The activity logs required staff to report the amount of time they devoted within a specified time period to each of 15 periodicals-related categories, segmented by holdings category, for a total of 45 possible activities.¹⁸ With one category excluded (explained below), the 14 categories of data included in this report were as follows:

collections development
negotiations and licensing
subscription processing, routine renewal, and termination
receipt and check-in
routing of issues and tables of contents
cataloging
linking services
physical processing
stacks maintenance (including current issues areas)
circulation
reference and research
user instruction
preservation
other
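The 15-by-3 structure described above can be sketched as a grid of activity categories crossed with holdings categories. This is a hypothetical Python illustration, not the study's instrument (the actual staff activity log appears in Appendix B); the holdings labels are our shorthand, and we assume the fifteenth, excluded category is electronic infrastructure and support, the cost category whose exclusion the next paragraph explains.

```python
from itertools import product

# The 14 activity categories reported in this study, as listed above.
REPORTED_CATEGORIES = [
    "collections development",
    "negotiations and licensing",
    "subscription processing, routine renewal, and termination",
    "receipt and check-in",
    "routing of issues and tables of contents",
    "cataloging",
    "linking services",
    "physical processing",
    "stacks maintenance (including current issues areas)",
    "circulation",
    "reference and research",
    "user instruction",
    "preservation",
    "other",
]
# Assumed fifteenth category: logged by staff but excluded from the report.
EXCLUDED_CATEGORIES = ["electronic infrastructure and support"]

# The three holdings categories (shorthand labels are our own).
HOLDINGS_CATEGORIES = ["print current issues", "print backfiles", "electronic"]


def blank_activity_log(include_excluded: bool = True) -> dict[tuple[str, str], float]:
    """Return an empty hours log keyed by (activity category, holdings category)."""
    categories = REPORTED_CATEGORIES + (EXCLUDED_CATEGORIES if include_excluded else [])
    return {key: 0.0 for key in product(categories, HOLDINGS_CATEGORIES)}


if __name__ == "__main__":
    print(len(blank_activity_log()))                        # 45 possible activities
    print(len(blank_activity_log(include_excluded=False)))  # 42 reported activities
```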

Some cost categories are not included, but we do not believe their absence meaningfully affected our results. Most important, we excluded from our analysis the costs of electronic infrastructure and support, and we did so only after careful consideration. These costs are difficult to allocate directly to periodicals in general, and to print or electronic periodicals more specifically. Although most of the libraries in this study were unable to allocate these costs directly, it was possible to develop estimates for three schools: Drexel, George Mason, and Pittsburgh. In these cases, including the electronic infrastructure costs did not change the direction of our findings, although it changed the size of the cost differences to varying degrees. An analysis of the data from these three institutions, as well as the implications for our findings, may be found in Appendix A. Because we could not develop estimates for all the participating libraries, we chose to exclude the electronic infrastructure costs from all the data that we present. Likewise, we did not attempt to collect data on interlibrary lending and borrowing.¹⁹

We also collected, on a confidential basis, information about staff compensation, which eventually allowed us to associate dollar costs with specific activities. Appendix B shows the data-collection instruments, including the list of included activities and definitions of each, the staff activity log, and the institutional survey.

Because we needed to collect a substantial amount of data, we tried to be as flexible as possible in allowing participating libraries to provide information in ways consistent with their existing practices. This flexibility had two notable implications.

First, some libraries preferred to collect data for a recent month, while others felt it was best to provide data from the past year.²⁰ Because we wanted to allow each library to choose the method that it believed was most efficient and effective, we developed a mechanism to scale up monthly data to an annual form. For most activities, this mechanism relied on one of a variety of output-driven ratios.²¹ When it was more appropriate for a given activity, however, we assumed that each week’s work constituted 1/52nd of the year’s total work.²² All data in this report are presented in annualized form.
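The two annualization rules can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' code: the output-ratio rule treats the logged period's share of the prior year's outputs as its share of the year's work (footnote 21), and the time-based rule generalizes the 1/52-per-week assumption (footnote 22) to a log covering any number of weeks. All figures in the usage lines are hypothetical.

```python
def annualize_by_outputs(period_hours: float, period_outputs: float,
                         annual_outputs: float) -> float:
    """Scale hours logged in one period to a year using an output ratio.

    The ratio period_outputs / annual_outputs is taken as the share of the
    year's work done in the logged period, so annual hours = hours / ratio.
    """
    if period_outputs <= 0 or annual_outputs <= 0:
        raise ValueError("outputs must be positive")
    if period_outputs > annual_outputs:
        # The ratio is treated as inappropriate in this case; the caller
        # must choose another annualization method.
        raise ValueError("period outputs exceed annual outputs")
    return period_hours * annual_outputs / period_outputs


def annualize_by_time(period_hours: float, weeks_logged: float) -> float:
    """Scale hours to a year, assuming each week is 1/52 of the annual work."""
    if weeks_logged <= 0:
        raise ValueError("weeks_logged must be positive")
    return period_hours * 52 / weeks_logged


if __name__ == "__main__":
    # Hypothetical: 20 hours of check-in in a month when 300 of the
    # previous year's 3,000 issues were received.
    print(annualize_by_outputs(20, 300, 3000))  # 200.0
    # Hypothetical: 10 hours of reference work logged in one week.
    print(annualize_by_time(10, 1))             # 520.0
```

Raising an error when period outputs exceed annual outputs mirrors the footnote's observation that such ratios were inappropriate; the fallback is left to the caller because the text does not specify a single rule for that case.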

Second, we preferred that staff data be provided anonymously to avoid the possibility that managerial review might skew an individual’s willingness or ability to provide accurate time allocations. While most of the libraries felt comfortable with this approach, three felt it was not appropriate for them (NYU, Suffolk, and Yale). We do not believe that this difference had any meaningful impact on the data supplied. We put into place a system that allowed us to monitor the return of logs and to ensure that none went missing. We wanted to find an appropriate balance between collecting every staff survey, encouraging accuracy and honesty in responses, and respecting the participating libraries’ campus culture.

Once the data had been collected, we processed the staff-activity information in three steps, each performed both by library and by holdings category. First, we merged the time allocations of individual staff to determine the total time expended on each activity at each institution. Second, as necessary, we annualized these time allocations. Third, we used the salary data to determine the actual cost of each activity performed by each staff member. This entailed allocating the implicit cost of nonproductive time (vacation, breaks, lunch, and so forth) for the given staff member on a proportional basis to each activity, as well as loading in benefits. We did not include library or institutional overhead; however, the directly attributable managerial costs were included in the survey and are reported in our analysis.
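The merging and costing steps can be sketched as below. Dividing a staff member's loaded compensation by productive hours (paid hours minus vacation, breaks, and similar time) and charging each activity at that rate allocates nonproductive time to activities in proportion to their hours. This is a hypothetical Python illustration: expressing benefits as a fixed share of salary, and every salary and hour figure in the example, are assumptions, and, as in the study, no library or institutional overhead is added.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class StaffMember:
    annual_salary: float
    benefits_rate: float        # hypothetical: benefits as a share of salary
    paid_hours: float           # hours paid per year
    nonproductive_hours: float  # vacation, breaks, lunch, and so forth
    # Annualized hours keyed by (activity, holdings category).
    activity_hours: dict[tuple[str, str], float] = field(default_factory=dict)

    def cost_per_productive_hour(self) -> float:
        """Salary plus benefits spread over productive hours only.

        Using productive (not paid) hours as the divisor is what spreads the
        cost of nonproductive time proportionally across activities.
        """
        productive = self.paid_hours - self.nonproductive_hours
        if productive <= 0:
            raise ValueError("no productive hours")
        return self.annual_salary * (1 + self.benefits_rate) / productive


def activity_costs(staff: list[StaffMember]) -> dict[tuple[str, str], float]:
    """Merge individual staff costs into a total per (activity, holdings)."""
    totals: dict[tuple[str, str], float] = defaultdict(float)
    for member in staff:
        rate = member.cost_per_productive_hour()
        for key, hours in member.activity_hours.items():
            totals[key] += hours * rate
    return dict(totals)


if __name__ == "__main__":
    key = ("receipt and check-in", "print current issues")
    staff = [
        StaffMember(40_000, 0.25, 2_080, 480, {key: 400}),  # $31.25/hour
        StaffMember(60_000, 0.25, 2_080, 480, {key: 100}),  # $46.875/hour
    ]
    print(activity_costs(staff))  # {('receipt and check-in', 'print current issues'): 17187.5}
```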

Once staff costs had been calculated, we added nonlabor costs. Most of these-for example, the cost of binding vendors-were fairly straightforward. But when it came to the cost of space, we departed from our usual practice of using actual costs.

It was difficult for most libraries to calculate the cost of space occupied by periodicals in their mature library buildings, since data were unavailable or the effects of inflation were difficult to determine, or both. In some cases, renovations complicated matters significantly. Also, there were substantial differences in the location and design of participants’ library buildings, making individualized estimates difficult to compare. To resolve this problem, we determined a conservative standard for the cost of space and imposed it across the board, identifying one cost for current issues and another for backfiles.

Because several of the libraries had in recent years opened off-campus high-density shelving facilities (or begun to participate in consortial arrangements that provide such space), it seemed that for them (and eventually for many of the others) a new backfile volume accessioned would be shelved off campus or would displace an existing item to the off-campus facility. The cost of space in such a shelving facility would therefore be a reasonable proxy for the cost of space for all backfiles. In reality, backfiles today are usually shelved on campus, so, in using the off-campus space for these calculations, we derived figures that were more conservative than the actual costs of the space generally occupied by backfiles.

To determine the cost of storage space for backfiles, we gathered data from several recently constructed off-campus high-density facilities. Some of these cost data were available publicly and some were provided confidentially.²³ We estimated the average one-time construction cost in today’s dollars to be approximately $2.50 per volume.

Unlike backfiles, current issues of print versions would be expected to be shelved on campuses into the future. They are generally housed in browsable shelving areas, often in comfortable reading rooms.

For current issues, we created a cost estimate based on numbers reported by several of the participants. We believe that these figures are too low, because, among other things, they do not account for inflation. The estimate used for the construction cost of space for current issues was $100 per square foot. Estimates in the past several years for construction costs of new library space have averaged about $250 per square foot.²⁴

Fig. 1. Number of current periodical titles, by format, by library

Although we believe that these conservative estimates of space costs are appropriate for the purposes of this study, we also include, at various places, estimates of the costs assuming newly constructed on-campus space at $250 per square foot. We distinguish these estimates clearly wherever they are used. We amortized all space costs over a 25-year period.
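The amortization arithmetic can be sketched as follows, using the per-unit construction costs and the 25-year period stated above. The text does not say whether interest or discounting was applied, so this hypothetical Python sketch assumes straight-line amortization with no financing cost; the collection sizes in the usage lines are illustrative only.

```python
AMORTIZATION_YEARS = 25  # amortization period stated in the text

# One-time construction costs stated in the text.
BACKFILE_COST_PER_VOLUME = 2.50          # off-campus high-density shelving
CURRENT_ISSUES_COST_PER_SQ_FT = 100.0    # conservative on-campus estimate
NEW_CONSTRUCTION_COST_PER_SQ_FT = 250.0  # newly constructed on-campus space


def annual_space_cost(quantity: float, one_time_cost_per_unit: float,
                      years: int = AMORTIZATION_YEARS) -> float:
    """Straight-line amortization of a one-time construction cost (no interest)."""
    if years <= 0:
        raise ValueError("years must be positive")
    return quantity * one_time_cost_per_unit / years


if __name__ == "__main__":
    # Hypothetical collection: 10,000 bound volumes, 2,000 sq ft of current issues.
    print(annual_space_cost(10_000, BACKFILE_COST_PER_VOLUME))        # 1000.0
    print(annual_space_cost(2_000, CURRENT_ISSUES_COST_PER_SQ_FT))    # 8000.0
    print(annual_space_cost(2_000, NEW_CONSTRUCTION_COST_PER_SQ_FT))  # 20000.0
```

Under that assumption, $2.50 per volume works out to $0.10 per volume per year, and $100 or $250 per square foot to $4 or $10 per square foot per year.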

FOOTNOTES

¹⁴ This is the 006 code for Type of Continuing Resource, which appears in OCLC’s Bibliographic Formats and Standards, Third Edition, available online at http://www.oclc.org/bibformats/pdf/ffe.pdf, at page 73.

¹⁵ Less often, issues are discarded and replaced with microform editions. We did collect data on the microform category; however, the quality of this information was insufficient for analysis. Since so few of the periodicals are held in microform, we did not present these data in this study.

¹⁶ Third-party vendors also engage in aggregation. Typically, they go to publishers that produce one or a few journals-small learned societies with a single title, for example-and offer to help them reach their markets. Then they turn to the library market and offer a package of journals more easily acquired, tracked, and managed for being in a single package than would be the case if the library went from publisher to publisher each year renewing subscriptions. A variant on this model is the aggregator that selects a package in different ways designed to add value: offering access to research articles only in a basket of journals that publish a variety of kinds of information, for example, or identifying a small group of related subjects where a thematic bundle seems to make market and intellectual sense. Here again, publisher and purchaser may disagree over the value gained by the bundling.

¹⁷ See Leighton and Weber 1986.

¹⁸ In isolated cases, survey respondents did not distinguish adequately between the holdings categories for a given activity. In these cases, we allocated the time among the holdings categories via imputations based on other staff in the same department at the same library. This was only rarely necessary.

¹⁹ When initiating a borrowing request, a patron does not regard an item as missing specifically from the local print materials or from the locally provided electronic materials, but simply from the periodicals collection as a whole. Consequently, it is not possible to allocate interlibrary loans (ILLs) by format or holdings category. Because ILL costs cannot be attributed to either format, they do not affect the relative costs of the formats and were therefore excluded from the study.

²⁰ Bryn Mawr, Cornell, George Mason, NYU, Suffolk, and Williams provided data in the monthly format; the other libraries did so by the year.

²¹ The activities handled in this way included negotiation and licensing, receipt and check-in, routing of issues or tables of contents, cataloging, physical processing, circulation, user instruction, and preservation. For those activities, we constructed a ratio of the number of “outputs” per month to the number of outputs per year, where outputs could be, for example, the number of volumes that were circulated. We used this ratio to determine how that activity for the given holdings category scaled to the year. These ratios, however, did not necessarily apply to all formats or to activities when the monthly data were not provided or appeared inappropriate (e.g., instances when monthly outputs exceeded previous annual outputs).

²² This was the case for activities for which outputs would be inaccurate measures of work, including collection development, subscription processing, routine renewal and termination, linking services, stacks maintenance (including current issues areas), reference and research, and electronic infrastructure and support.

²³ One useful source of data on contemporary expenditures for off-site facilities is Reilly 2003, specifically Appendices 1 (capacity figures) and 4 (construction costs). For more information on these types of facilities, see Nitecki and Kendrick 2001.

²⁴ See, for example, Jay Lucker, personal communication to Sarah Levin, in Bowen 2001.