APPENDIX A:
Interviews with Librarians and Publishers
OhioLINK, Los Alamos National Labs (LANL), and the Florida Center
for Library Automation (FCLA) all host journal databases. They were
selected for inclusion in this white paper because they had to develop
the same capabilities being requested of publishers. Villanova University
was included because it has closed stacks for its bound journals,
which means that it has good measures of use. James Mullins, the
university librarian at Villanova, was on the task force that created
guidelines for the statistics that JSTOR delivers.
Academic Press, Elsevier, MCB, and the Institute of Physics (IOP)
host their own journals and have experience with collecting statistics.
The American Institute of Physics (AIP) and Association for Computing
Machinery (ACM) are in the process of developing this capability.
JSTOR and Catchword both host content from a variety of publishers.
JSTOR was part of the initial discussions about library requirements,
while Catchword is further developing its statistics capability.
Like the library hosts, these providers have a standard platform
that provides consistent data to enable comparisons.
Libraries
OhioLINK
Because OhioLINK staff developed the statistics capability when
they designed the overall system, its initial set-up costs are not
readily identifiable. Ongoing support is provided by two staff members
who perform many other duties.
In addition to issuing regular usage reports, OhioLINK has taken
advantage of the opportunity to perform further assessment of usage.
According to Executive Director Tom Sanville, this assessment shows
that, although every title in the database has been used, 40 percent
of the titles represent 80 percent of the downloaded articles, while
another 40 percent of the titles received only 10 percent of the
use (Sanville 2000). This prompted David Kohl, director of the library
at the University of Cincinnati and a member of OhioLINK, to suggest
that low usage might make the latter titles candidates for lower
pricing (Kohl 2000).
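
The concentration of use that Sanville describes can be checked against any table of downloads by title. The following is a minimal sketch in Python, not OhioLINK's method: the function name `share_of_use` and the sample counts are hypothetical, chosen only to illustrate the calculation.

```python
def share_of_use(downloads_by_title, top_fraction):
    """Return the fraction of all downloads that come from the most-used titles.

    downloads_by_title: dict mapping a journal title to its download count.
    top_fraction: fraction of titles to include, e.g. 0.4 for the top 40 percent.
    """
    counts = sorted(downloads_by_title.values(), reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0
    n_top = round(len(counts) * top_fraction)
    return sum(counts[:n_top]) / total


# Hypothetical counts for ten titles (illustrative only, not OhioLINK data).
sample = {
    "Title A": 400, "Title B": 250, "Title C": 100, "Title D": 50,
    "Title E": 50, "Title F": 50, "Title G": 40, "Title H": 30,
    "Title I": 20, "Title J": 10,
}
print(share_of_use(sample, 0.4))
```

With real data, the same function can be called with different values of `top_fraction` to see how steeply use falls off across the title list.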
A surprising discovery is that more than half (58 percent) of the
articles downloaded for all OhioLINK libraries were not held in print
by the libraries (Sanville 2000). In each institution, patrons make
use of a much wider number of journals than those held in print.
This finding speaks to limitations imposed by budgets on the selection
process and the importance of letting the user choose from a larger
file of material. In a paper presented at Oxford 2000, David Kohl
noted that presenting users with a database of journal articles allows
them to drive the selection process in a way that is similar to the
current practice under which vendors supply librarians with books
on approval.
Los Alamos National Labs
The LANL Library has gone through three stages of development, according
to Director Rick Luce. The data the library collects depend on how
far it parses its log files. The first stage, which entailed parsing
UNIX logs and "beat code," cost $20,000 and required nominal
staff support. The second stage, which involved producing static
usage data on the basis of scripted code, cost $50,000; one staff
member performed this activity. The third stage, designed to enable
the user to perform a query and export the results, may cost $250,000.
Programming staff will be involved in doing the analysis.
LANL has 3,500 electronic journals available to its users; of these,
2,000 titles are loaded locally and 1,500 are accessed remotely.
When LANL did a trial with Elsevier, all titles in the database were
used and the participating libraries did not own the most-used titles.
Key to institution abbreviations:
BGSU - Bowling Green State University
CSU - Cleveland State University
CWRU - Case Western Reserve University
KSU - Kent State University
MU - Miami University
OU - Ohio University (Athens)
OSU - Ohio State University (Columbus)
UA - University of Akron
UC - University of Cincinnati
UD - University of Dayton
UT - University of Toledo
YSU - Youngstown State University
WSU - Wright State University
Luce concludes that librarians do not know exactly what users need,
confirming the discovery process in research and the learning curve
in the electronic environment.
LANL enables its users to connect to full text from links within
secondary publications, from browsing selected titles, and from performing
subject searches. It takes six months for users to discover, remember,
and fully use a new service. Keys to success are to ensure that links
are established, to allow sufficient ramp-up time, and to promote
awareness. LANL has expanded its electronic holdings since 1995,
and user satisfaction with library services has increased dramatically.
Florida Center for Library Automation
FCLA is the central agency that supports the online catalogs of
the 10 universities in Florida. Like OhioLINK and LANL, FCLA loads
a number of full-text journal databases, for which it produces statistics
locally as well as links to publishers' remote sites.
FCLA would like to track the number of searches, the number of documents
retrieved, and the number of requests denied. The number of hits
is not a valid indicator of use because there is no consistent way
to measure them. The number of articles viewed by journal title is
counted when the PDF is viewed. Reports on usage of full-text journals
are updated nightly in a formatted report that the librarians can
download.
When users link to a publisher's database, they have effectively
left their home system. The library can tell which database they
linked to, but it cannot track actions taken on the publisher's Web
site. Consequently, libraries must rely on publishers for usage data
and then merge such information with their own local data.
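
The merging step might look like the sketch below. It assumes, hypothetically, that both the local logs and the publisher's report have already been reduced to counts of articles viewed per journal title; the function name `merge_usage` and the sample figures are illustrative and do not describe FCLA's actual process.

```python
from collections import Counter


def merge_usage(local_counts, publisher_counts):
    """Combine per-title article counts from local logs and a publisher report.

    Both arguments are dicts mapping a journal title to a count. Titles that
    appear in both sources have their counts added together.
    """
    merged = Counter(local_counts)
    merged.update(publisher_counts)  # Counter.update adds counts
    return dict(merged)


# Hypothetical figures for illustration.
local = {"Journal One": 120, "Journal Two": 45}
remote = {"Journal Two": 30, "Journal Three": 12}
print(merge_usage(local, remote))
```

In practice the harder problem is the one the text raises: the two sources must count the same thing in the same way before the sums mean anything.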
Villanova University
Villanova University Library Director James Mullins noted that students
today rely solely on electronic publications because of their ease
of access and use; consequently, they have a limited view of the
available content.
Villanova can track the usage of its bound print journals because
they are in closed stacks. Use of print journal collections was growing
until 1995, when electronic databases were made available to users,
who also began to access the Web. Since then, the library has seen
a dramatic decline in the use of print materials and a steady increase
in the use of electronic resources.
In an attempt to collect some data locally on student and faculty
use of remote databases, Villanova analyzed its log summaries, which
show the total number of times a database is accessed. These data
are put into a spreadsheet as a frame of reference along with vendor-supplied
data and are compared with the prior year's totals. Assistant Director
for Public Services Louise Green noted that training usage should
be counted separately so as not to skew the totals.
Publishers
Elsevier
Elsevier has at least two staff devoted to managing usage data from
its ScienceDirect database installations. Most libraries subscribe
to only a portion of the 1,170 titles in Elsevier's database; therefore,
data on the use of nonsubscribed titles are helpful in considering
the addition of electronic or print versions of a title.
Although Elsevier is committed to providing as much information
as the customer believes is useful, staff acknowledge that custom
reports are not economical to generate. The company can see the impact
of marketing on journal usage, and it has a staff of account-development
managers devoted to training librarians and users on the system.
As the volume of articles used rises, the cost per use drops.
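
The relationship between volume and cost per use is simple division. As a sketch with hypothetical figures (the function name and the dollar amounts are assumptions, not Elsevier data):

```python
def cost_per_use(annual_cost, articles_used):
    """Return the cost per article used for a fixed annual cost."""
    if articles_used <= 0:
        raise ValueError("articles_used must be positive")
    return annual_cost / articles_used


# A fixed hypothetical license fee spread over rising usage.
for used in (2000, 5000, 10000):
    print(used, cost_per_use(10000, used))
```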
To keep current in their field, researchers scan about a dozen journals
regularly by browsing their tables of contents. This activity is
reflected in how the database is used when researchers select a journal
title from a list and then browse the tables of contents of various
issues, rather than search by subject, author, or title.
Elsevier has paid particular attention to global requirements for
a privacy policy, which appears on a full page on the Web site for
ScienceDirect. Some customized services, such as an e-mail address
for an alerting service, cannot be provided if the user does not
provide a minimal amount of personal information. To ensure privacy,
all data on individual users are scrubbed at the organizational level
before being processed and aggregated.
Academic Press
Academic Press found that the off-the-shelf software packages that
summarize hits do not provide the data that libraries need. It is
hiring a full-time statistician and measurement analyst to help address
the issue. The company experienced a dramatic increase in usage when
it introduced its new platform in the fall of 1999.
Data gathering is complicated because Academic Press's journal database
(IDEAL) is loaded on remote sites such as OhioLINK and OCLC, and
Academic Press needs to combine data from several sources for a complete
picture of usage of its own journals. Data are used internally by
sales, accounting, and editorial staff to examine correlations and
draw conclusions about the cost per article for each institution.
This allows the publisher to understand how the library might equate
the cost per article to a relevant measure indicating value.
In the print world, subscription revenues indicate the health of
a journal. When that journal is part of a database, the equation
changes completely, since some of the articles used were in previously
nonsubscribed titles.
For every 1.5 log-ins to the database, one article is downloaded,
and for every abstract viewed, there is one article downloaded. Academic
Press summarizes the total number of log-ins by journal and of articles
downloaded by journal each month for each institution and consortium.
Chrysanne Lowe, director of online sales and marketing, noted that
the journals that have the most articles downloaded are considered
the company's most successful titles. These are large journals with
many articles. The list of journals in greatest demand changes when
the number of articles downloaded is compared with the number of
articles published in the title.
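
To illustrate how the ranking can change, the sketch below compares raw downloads with downloads per article published. The journal names and figures are hypothetical, and the function is an assumption about how such a comparison could be made, not Academic Press's own calculation.

```python
def rank_by_downloads_per_article(journals):
    """Rank titles by downloads divided by articles published, highest first.

    journals: dict mapping a title to a (downloads, articles_published) tuple.
    Titles with no published articles are skipped to avoid division by zero.
    """
    rates = {
        title: downloads / published
        for title, (downloads, published) in journals.items()
        if published > 0
    }
    return sorted(rates, key=rates.get, reverse=True)


# Hypothetical figures: a large journal and a small one.
sample_journals = {
    "Large Journal": (9000, 1500),  # 6 downloads per article published
    "Small Journal": (2000, 100),   # 20 downloads per article published
}
by_raw_downloads = sorted(sample_journals, key=lambda t: sample_journals[t][0], reverse=True)
print(by_raw_downloads)
print(rank_by_downloads_per_article(sample_journals))
```

The large journal leads on raw downloads, while the small journal leads once its size is taken into account.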
Philosophically, Academic Press is opposed to a business model in
which charges increase with use because it discourages use. Academic
Press offers marketing support with promotional items and coordinates
training with librarians and faculty members.
MCB University Press
In addition to the normal data on time-of-day activity that help
it determine the load on systems, the system at MCB University Press
tracks hits and sessions. To learn how users come to the site, MCB
also analyzes the top referring sites, top browsers, top entry pages,
and the most popular and least popular pages in the database.
MCB University Press is interested in knowing which institutions
generate the most requests and which articles and journals are most
requested. How users search is also of interest; for that reason,
data on the tables of contents, search pages, and browse pages are
collected.
Heavy use of the tables of contents through the browse functions
indicates that many users know the title they wish to see. However,
MCB discovered that the most-used titles at some institutions were
the first titles in the alphabet. This indicates that users are learning
how to use a system and suggests the need to evaluate the interface
or provide more training.
Institute of Physics
Bridget Pairaudeau, producer of electronic publications at IOP,
just completed the design of IOP's statistics form for internal use.
It allows staff to select the following variables:
- Who: user files and the subscription records from IOP's
internal systems
- What: data from log files on the type of activity and
time frame
- View: display options, such as grouping subscribed journals
Users of the IOP system also have the option of creating a graph
by selecting elements for the x and y axes. If they
choose to graph usage of Web pages on both axes, they can show navigation
to full text from the table of contents compared with navigation
from the subject keyword search. Data on the use of options that
can be customized, such as profiling, use of filing cabinets, and
activating a table of contents alerting service, show which features
are most used.
The editorial and marketing staffs are interested in knowing which
articles and journals are most requested and which institutions are
most active. The sales department is interested in the level of use
by specific customers, and system designers want information they
can use to enhance features, navigation, and usability.
IOP screens out data on internal use, guests, free use, trials,
production applications, and robot attacks, because they can greatly
skew statistics. When IOP's internal data analysis did not match
that of the commercial package, staff discovered that NetTracker
counts HTML views but not PDF downloads.
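
A screening step of this kind could be sketched as follows. The record layout, the category labels, and the function names are all hypothetical: the source lists the kinds of traffic IOP excludes but does not say how its records are classified or stored.

```python
# Hypothetical category labels for traffic that should not count as use.
EXCLUDED_CATEGORIES = {"internal", "guest", "free", "trial", "production", "robot"}


def screen_records(records):
    """Keep only records whose category is not in the excluded set."""
    return [r for r in records if r["category"] not in EXCLUDED_CATEGORIES]


def count_by_format(records):
    """Count HTML views and PDF downloads separately so neither is dropped."""
    counts = {"html": 0, "pdf": 0}
    for r in records:
        if r["format"] in counts:
            counts[r["format"]] += 1
    return counts


# Hypothetical records for illustration.
sample_records = [
    {"category": "subscriber", "format": "html"},
    {"category": "subscriber", "format": "pdf"},
    {"category": "robot", "format": "html"},
    {"category": "trial", "format": "pdf"},
]
print(count_by_format(screen_records(sample_records)))
```

Counting each format explicitly makes a gap like the one found in the commercial package easier to spot, because a format that is never counted shows up as zero.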
American Institute of Physics
Doug LaFrenier, director of marketing at AIP, noted that the market
has changed dramatically. Providing statistical data to libraries
represents a new set of responsibilities for publishers, one
that has associated costs. LaFrenier's primary concern is the lack
of standards, which makes it impossible to compare data.
AIP is concerned that it is undercounting because its system does
not count searches and requests for abstracts. It counts only requests
for the full text of an article that requires either a subscription
or pay-per-view access. At the same time, AIP has discovered that
one of the interfaces was triple counting downloads because of the
way it grabbed the content.
The American Institute of Physics, working with the American Physical
Society (APS), has devoted much of one full-time programmer's activity
to developing Web-based statistics that libraries can access for
their own use. The statistics, which will be available to other publishers
that AIP hosts, are planned for delivery early in 2001.
AIP demonstrated the system at the Special Libraries Association
2000 meeting. The demonstration showed year-to-date download statistics.
Librarians who attended this session persuaded AIP that libraries
want to be able to specify their own time periods. They also want
to be able to compare current data with information from prior years.
AIP found it difficult to identify who within the library should
have rights to view this information.
Previously, AIP had given its own publishing customers reports from
the server logs that summarize activity by journal title. The company
also has analyzed time-of-day performance data to support decisions
in running an online journal platform. It has been able to identify
the most active journals and accounts, and believes that much of
the information developed for online publishing customers will be
useful in developing usage-statistics reports for libraries.
Anyone using the AIP Web site has the option of buying an article
online. Sales grew significantly when the company simplified its
interface and reduced the number of steps required for the user to
obtain the article. This further shows how strongly the ease of use
of an interface affects usage.
Association for Computing Machinery
The Association for Computing Machinery is evaluating what statistics
need to be collected. As staff experimented internally with data,
they found that the most frequently downloaded article in any given
month was neither a current article nor one they would have expected
to be so popular. High-use article titles provide clues for editors
about the topics in demand.
Providers
JSTOR
The ICOLC guidelines are based on those developed by a task force
in conjunction with JSTOR in 1997. JSTOR data are updated nightly
and can be queried and exported to a spreadsheet. Individual site
data can be compared with average data for all sites in the same
JSTOR classification and with summary data for all JSTOR titles.
Both publishers and librarians can sign on and retrieve data.
Data presented include the number of pages viewed, PDFs printed,
searches conducted, and tables of contents browsed. Since JSTOR includes
as articles all items (e.g., reviews and letters), it lists full-length
articles separately for clarity.
In a presentation at the Conference on Economics and the Usage of
Digital Library Collections, JSTOR President Kevin Guthrie observed
that the articles that are most often downloaded are not those that
advance research or that are most often cited (Guthrie 2000). "Value
needs to be clearly defined as libraries consider acquisition and
cancellation decisions for electronic content," Guthrie stated.
(Marthyn Borghuis from Elsevier noted that citations reflect author
activity while usage reflects reader activity.)
The notion of perishability of content varies with the discipline.
The average age of the most-used articles was also surprising: 13
years in economics and 32 years in mathematics. When there are a
small number of total accesses for the discipline, the actions of
a few people can sway the results.
Guthrie cautioned that usage does not necessarily equate to value
in the research sense. "Older articles may be absolutely vital
to the continuation of high-quality scholarship and research in the
field, but that may not lead to extensive use," he said.
Catchword
Catchword delivers service that is paid for by the publishers, who
decide what information to share with libraries. Catchword is expanding
its statistics ability according to ICOLC guidelines, and it will
have data that can be used by libraries. Catchword has decided to
add turnaway statistics that reflect the number of times a user attempts
to access the full text of an article in a journal to which the library
does not subscribe. Catchword can also track pay-per-view access.
Although the company has a single source to produce these data, its
challenge is to summarize data from 11 servers around the world.
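
One way to picture the summarizing task is the sketch below, which tallies turnaways across several server logs. It assumes, hypothetically, that each server yields a list of (institution, journal) full-text requests and that subscriptions are available as a lookup table; none of these names or structures comes from Catchword.

```python
from collections import Counter


def summarize_turnaways(server_logs, subscriptions):
    """Count full-text requests for journals the institution does not subscribe to.

    server_logs: iterable of logs, one per server; each log is a list of
        (institution, journal) tuples, one per full-text request.
    subscriptions: dict mapping an institution to the set of journals it holds.
    Returns a Counter keyed by (institution, journal).
    """
    turnaways = Counter()
    for log in server_logs:
        for institution, journal in log:
            if journal not in subscriptions.get(institution, set()):
                turnaways[(institution, journal)] += 1
    return turnaways


# Hypothetical data from two servers.
subs = {"Library X": {"Journal One"}}
logs = [
    [("Library X", "Journal One"), ("Library X", "Journal Two")],
    [("Library X", "Journal Two")],
]
print(summarize_turnaways(logs, subs))
```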
HighWire Press
HighWire Press has developed extensive data analysis and reporting
capabilities for publishers and librarians, who can download their
reports from HighWire to Excel. Journal usage data include statistics
on the volume of searches, tables of contents, abstracts, articles
viewed in HTML, and PDFs downloaded. Demand for articles is measured
by the number of unique articles and total accesses by abstract,
HTML views, and PDFs downloaded. It is also possible to see the top
ten articles in each journal ranked by total accesses (HTML, PDF,
abstract) with an indication of the age of the article. As part of
a Mellon-funded grant to Stanford University Libraries, HighWire
transaction logs will be analyzed using data mining techniques to
uncover user behavior and trends.