APPENDIX 2
Profiles of the 12 E-Journal Archiving Initiatives
All data in the following summaries were current as of
July 1, 2006
Canada Institute for Scientific and Technical Information
The National Research Council of Canada (NRC), Canada's governmental
organization for research and development, hosts the Canada
Institute for Scientific and Technical Information (CISTI),
a major source for information in all areas of science, technology,
engineering, and medicine. CISTI became the National Science
Library in 1957.
CISTI has a key role as leader and catalyst in building universal,
seamless, and permanent access to information for Canadian
research and innovation. To help achieve this vision for Canada,
CISTI has established a three-year program called Canada's
scientific infostructure (Csi). This program will create a
national information infrastructure and opportunities for collaborations
with partners to support research and educational activities.
Using a leading-edge architectural approach, CISTI has built
a reliable technology platform with expandable storage capacity
that ensures long-term access to digital content loaded at
CISTI. CISTI is partnering with Library and Archives Canada
(LAC) to ensure business continuity for the infrastructure.
With the infrastructure in place, CISTI has loaded close to
5 million articles from publishers NRC Research Press, Springer,
and Elsevier. New content from the Institute of Physics, Oxford
University Press, the American Society for Microbiology, Mary
Ann Liebert, and Emerald will be added to increase the depth
and breadth of the repository.
As part of the Csi program, CISTI is negotiating with publishers
for rights to make content accessible to customers and partners.
To ensure that access is as seamless as possible, CISTI is
implementing SFX to support bibliographic linking and is investigating
best options to support authentication and authorization in
a digital environment. CISTI is also conducting research in
the areas of text and data mining and text analyses for future
implementation.
LOCKSS Alliance
The Lots of Copies Keep Stuff Safe (LOCKSS) program began
in 1999 as a research project based at Stanford University
Library. LOCKSS launched the beta version of its open-source
software to 50 libraries between 2000 and 2002. LOCKSS developed
its software to allow libraries to collect, store, preserve,
and provide access to their own, local copies of authorized
content they purchase. The LOCKSS Web site1 lists
about 100 participating institutions in more than 20 countries
that are using the LOCKSS appliance to capture content. About
25 publishers of commercial and open-access content are participating
in LOCKSS, not counting the individual publishers represented
by aggregators such as HighWire Press and Project MUSE, and
LOCKSS's own Humanities Project.2
In 2005, LOCKSS launched the LOCKSS Alliance as a membership
organization that is built on the LOCKSS software to introduce
governance for the program and to address sustainability issues.
The LOCKSS Alliance is an open membership organization. Members
have equal rights and responsibilities, though membership fees
are based on an institution's Carnegie Classification. LOCKSS
Alliance membership benefits include participation in collection-development
activities (including publisher briefings); early access to
LOCKSS documents, documentation, and prerelease software; access
to implementation, collection, and technology workshops; involvement
in community planning efforts; and access to the LOCKSS program
staff.
The LOCKSS Alliance assures its members of access to participating
publisher content, if the member has licensed or purchased
that content. Libraries manage their LOCKSS boxes to include
all the licensed content to which they wish to ensure long-term
access. Libraries can also negotiate with publishers that are
not participating in LOCKSS. Participating publishers may choose
to prevent the collection of new content, but they cannot withdraw
content that was previously ingested.
The LOCKSS appliance, an open-source software application,
is the core of the LOCKSS program and the foundation for the
LOCKSS Alliance. The appliance uses Web harvesting to capture
content from participating publisher websites. To participate
in LOCKSS, a publisher grants libraries permission to collect,
preserve, and provide access to the content, and grants the
LOCKSS software permission to crawl, collect, and preserve the
content, by adding a Web page called a LOCKSS publisher manifest.
The appliance continuously polls the copies of the content held
in other LOCKSS boxes and has rules for monitoring, mediating, and
repairing content on the basis of the results of that polling.
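The idea of repairing content on the basis of polling results can be illustrated with a minimal sketch. It is not the LOCKSS protocol itself, which is considerably more elaborate; it assumes a simple strict-majority vote over content hashes, and the function names and box labels are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Dict


def digest(content: bytes) -> str:
    """Return a SHA-256 fingerprint of one stored copy."""
    return hashlib.sha256(content).hexdigest()


def run_poll(copies: Dict[str, bytes]) -> Dict[str, bytes]:
    """Compare the copies held by several boxes and repair the dissenters.

    Assumption: a strict majority of boxes holding identical bytes is
    treated as holding the good copy. This is a simplification made
    for illustration only.
    """
    votes = Counter(digest(c) for c in copies.values())
    winning_digest, count = votes.most_common(1)[0]
    if count <= len(copies) // 2:
        # No strict majority: change nothing and ask for human review.
        raise ValueError("no majority agreement; manual review needed")
    good = next(c for c in copies.values() if digest(c) == winning_digest)
    # Boxes that agree keep their copy; the others receive the majority copy.
    return {
        box: (c if digest(c) == winning_digest else good)
        for box, c in copies.items()
    }


if __name__ == "__main__":
    boxes = {"box_a": b"article v1", "box_b": b"article v1", "box_c": b"artic?e v1"}
    repaired = run_poll(boxes)
    print(repaired["box_c"] == b"article v1")
```

In this sketch a damaged copy is overwritten only when most boxes agree on the content, so a single corrupted box cannot spread its damage to the others.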
CLOCKSS
The CLOCKSS (Controlled LOCKSS) initiative is a 2006 addition
to the LOCKSS program that brings together 6 libraries (Edinburgh
University, Indiana University, New York Public Library, Rice
University, Stanford University, and University of Virginia)
and 12 publishers and learned societies (American Chemical
Society, American Medical Association, American Physiological
Society, Blackwell Publishing, Elsevier, Institute of Physics,
Nature Publishing Group, Oxford University Press, Sage Publications,
Springer, Taylor & Francis, and John Wiley & Sons, Inc.) to establish
a large-scale, dark archive for e-journals. The libraries participating
in CLOCKSS are also participants in the LOCKSS Alliance. Each
library will host two servers, creating a network of 12 dark
repositories.
CLOCKSS is a limited-membership organization that is holding
assets on behalf of the broader community. CLOCKSS systems
will harvest content by Web crawling and ingest source files
provided by publishers. Access to CLOCKSS content will be made
available to the community following an access trigger event.
The CLOCKSS system will automatically detect the cessation
of online access from the publisher and, if the content remains
unavailable for six months, the governing board (made up of
libraries and publishers) will work collaboratively to determine
whether content will be made available to the community for
a limited or indefinite time. "It's like a barn raising," Gordon
Tibbitts, president of Blackwell Publishing's American division,
said of CLOCKSS. "We all know we have to have the barn, so
we're calling everyone together to build it" (Kiernan 2006).
During the two-year developmental phase, the CLOCKSS initiative
will also test the responsiveness of this distributed test
bed of content to various potential disasters and share the
results of these tests to contribute to the development of
global strategies for preservation.
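The six-month condition that precedes a governing board decision can be sketched as a simple date check. This is an illustration only: the 183-day approximation of six months, the function name, and the parameter names are assumptions, and the decision to release content rests with the board, not with any automated rule.

```python
from datetime import date
from typing import Optional

# Assumption: "six months" is approximated as 183 days for illustration.
REVIEW_THRESHOLD_DAYS = 183


def needs_board_review(unavailable_since: Optional[date], today: date) -> bool:
    """Return True once content has been unavailable for about six months.

    unavailable_since is None while the publisher still provides access.
    """
    if unavailable_since is None:
        return False
    return (today - unavailable_since).days >= REVIEW_THRESHOLD_DAYS


if __name__ == "__main__":
    print(needs_board_review(date(2006, 1, 1), date(2006, 8, 1)))
```

The check only flags a title for review. Whether the content is opened for a limited or an indefinite time remains a collaborative decision of the libraries and publishers on the board.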
Koninklijke Bibliotheek e-Depot
As the national deposit library for the Netherlands, the Koninklijke
Bibliotheek (KB) has the responsibility for preserving and
providing long-term access to Dutch electronic publications.
At first, the KB focused on Dutch publishers, but more recently
it has come to recognize that much academic literature is produced
by multinational publishers and that, as a consequence, there is
often no longer a national library that is the natural repository
for the content the publishers produce. The KB, therefore,
has assumed the responsibility to acquire and preserve, in
conjunction with other repositories, the published scientific
output of the world, regardless of where it was formally published.
To meet that responsibility, the KB began planning for e-journal
archiving in 1993, started experimenting with e-journal archiving
systems in 1995, and conducted research and implementation
of an e-journal archiving system as part of the NEDLIB project
from 1998 to 2000. The current e-Depot was delivered in 2002
and is now fully operational: a fully automated system, dedicated
to long-term storage and large-scale archiving. The e-Depot
system has been made part of the general budget of the KB.
In addition, since at least 2003, the KB has been receiving
earmarked funds for the operation of the e-Depot system as
well as monies for research and development in long-term preservation.
Currently, those funds amount to €2 million a year.
The growth of content in e-Depot has been dramatic. As of
March 2006, the e-Depot contained more than 6 million digital
objects in about 6 terabytes of storage space. More than 3,500
e-journal titles are represented in the repository. Among the
prominent publishers that have signed archiving agreements
with the KB are
- Elsevier (1996, 2002)
- BioMed Central (2003)
- Kluwer Academic Publishers (now part of Springer) (2003)
- Blackwell Publishing (2004)
- Taylor & Francis (2004)
- Oxford University Press (2004)
- Sage Publications (2005)
- Brill Academic Publishers (2005)
- Springer (2005)
The KB's goal is to include in the e-Depot the journals from
the 20 to 25 largest publishing companies, which produce almost
90% of the world's electronic STM literature.
Because there is no legal deposit requirement in the Netherlands,
the deposit of material into e-Depot is managed through negotiations
between the KB and individual publishers. At a minimum, the
KB stipulates that there must be on-site access to all authorized
library users. The archiving agreement with BioMed Central
allows the KB to provide free remote access to more than 100
open-access journals. For non-open-access journals, the agreement
with publishers stipulates that in the event that a publisher
cannot deliver content for a long period of time, the KB could
deliver the journals on an interim basis to subscribers. If
a publisher should decide to stop providing electronic access,
the KB could, if it so chooses, provide access to the world.
Thus, while the e-Depot system is not primarily an access system,
in an emergency the e-Depot could in theory provide access
to users around the world—assuming sufficient funds to do so
were available.
After receipt, ingest, and storage of electronic files from
the publishers, the KB follows two technical approaches to
long-term digital preservation. The first is migration: the
KB plans to transform digital objects to keep them readable.
The KB is also interested in emulation and has several projects
under way to see whether it can be used both to lower the cost
of preservation and to preserve the look and feel of the original
object. The KB continues to work with IBM, the vendor for the
e-Depot system, as well as partners from around the world,
to create the technical tools required for digital preservation.
Perhaps the most important component of the KB's approach
to digital preservation, however, has been the articulation
of the need for what it has called the "Safe Places Network."
The Safe Places Network will consist of a limited number of
places that make a substantial investment in the equipment,
skills, and expertise necessary to manage digital archiving
programs. Sharing the risks inherent in a digital archiving
system with a limited number of committed partners, it is hoped,
will reduce the cost of digital preservation.
kopal/Die Deutsche Bibliothek
The Kooperativer Aufbau eines Langzeitarchivs digitaler Informationen
(kopal) is a cooperative project funded by the German Federal
Ministry of Education and Research. It began in July 2004.
Its goal is to develop an innovative technical solution to
the problem of long-term accessibility of digital documents.
Project partners Die Deutsche Bibliothek (DDB—the National
Library of Germany) and the Lower Saxon State and University
Library (SUB Göttingen) are storing a variety of digital materials
in a repository based on DIAS, the Digital Information and
Archiving System, developed by IBM and the National Library
of the Netherlands, the Koninklijke Bibliotheek, in The Hague.
The Gesellschaft für wissenschaftliche Datenverarbeitung mbH
Göttingen (GWDG) is in charge of the archive's technical operation,
with software support provided by IBM Deutschland GmbH.
One of the driving forces behind kopal has been the need of
DDB for a system for managing the legal deposit of electronic
publications. DDB had been experimenting with electronic journals
since 2000; in 2006, legal deposit legislation for electronic
publications was enacted in Germany, making the implementation
of a system a priority. Fortunately, as part of the initiation
of electronic legal deposit, DDB is receiving a funding increase
of about €2 million to implement it.
As part of its preliminary investigations, DDB had, through
voluntary agreements with publishers, acquired a variety of
electronic content, including 455 e-journal titles from Springer
and many other e-journals from Wiley-VCH and Thieme. Under
legal deposit, DDB will start acquiring and adding to kopal
all electronic journals published in Germany.
DDB requires that publishers send to it compressed archive
files that contain the journal contents plus some rudimentary
metadata. At present, the intention is to maintain the readability
of the archived file; when necessary, the content will be migrated
into new formats. DDB has used emulation for some preservation
activities and will continue to do so.
Voluntary agreements with publishers in the past have allowed
for public access to the e-journals in the event of publisher
failure. This "access of last resort" may also be possible
with journals received via legal deposit. As yet, kopal has
not built public-access systems, and so it is likely that there
would be a significant delay between the collapse of a publisher's
delivery system and remote access to content in kopal. Nevertheless,
kopal/DDB is likely to serve as an important guarantor of the
long-term availability of e-journals published in Germany.
Los Alamos National Laboratory Research Library
Los Alamos National Laboratory (LANL) is one of three U.S.
national laboratories (the other two being Sandia and Lawrence
Livermore) operated under the National Nuclear Security Administration
of the U.S. Department of Energy. The Research Library at Los
Alamos National Laboratory (LANL-RL) has been locally loading
licensed backfiles from several commercial and society publishers
since 1995. Focusing on titles in the physical sciences, the
library maintains the content primarily for the use of LANL
staff, but it also serves a group of external cost-recovery
clients. These include five U.S. Department of Energy laboratories,
nine members of the U.S. Air Force Library Consortium, Sandia
National Laboratories, Santa Fe Institute, and five universities
located in the western United States. LANL-RL's locally loaded
e-journals are also available to members of the public who
are on-site at the library during its regular hours. The titles
come from the following publishers:
- American Chemical Society
- American Institute of Physics
- American Physical Society
- Elsevier
- Institution of Electrical Engineers
- Institute of Electrical and Electronics Engineers
- Institute of Physics
- John Wiley & Sons, Inc.
- Royal Society of Chemistry (backfiles through 2004 only)
- Springer
Through its digital library initiative, the Library Without
Walls, LANL-RL has done substantial research and development
work on repository and digital object architecture for long-term
maintenance of electronic journal contents. In November 2004,
LANL-RL received a $750,000 grant from the U.S. Library of
Congress's National Digital Information Infrastructure and
Preservation Program "to support research and development of
tools that will help address complex problems related to collecting,
storing and accessing digital materials."
A major focus of the research-and-development (R&D) work at
LANL-RL has been the aDORe repository. aDORe uses a modular
architecture, and is based on the following standards (Bekaert,
Liu, and Van de Sompel 2005):
- MPEG-21 DID (Digital Item Declaration) to represent digital
objects
- MPEG-21 DII (Digital Item Identification) to identify
digital objects
- XMLtapes and Internet Archive ARC files to store digital
objects and constituent data streams
- OAI-PMH (Open Archives Initiative Protocol for Metadata
Harvesting) to harvest resources
- The OpenURL Framework to convey context-sensitive dissemination
requests
- Info URI to facilitate the referencing of information
assets under the URI allocation
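To make the harvesting standard in the list above more concrete, the following sketch builds an OAI-PMH `ListRecords` request. The verb and the `metadataPrefix` and `from` arguments are defined by the protocol itself; the base URL is a placeholder, and nothing here describes how aDORe is actually configured or which metadata formats it exposes.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(
    base_url: str,
    metadata_prefix: str = "oai_dc",
    from_date: Optional[str] = None,
) -> str:
    """Build an OAI-PMH ListRecords request URL.

    oai_dc is the Dublin Core format that every OAI-PMH repository
    must support; a repository may offer other formats as well.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        # Selective harvesting: only records changed on or after this date.
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical repository address, used purely for illustration.
    print(list_records_url("https://repository.example.org/oai", from_date="2006-07-01"))
```

A harvester would issue this request over HTTP and parse the XML response. Responses that are split across several pages carry a resumption token, which this sketch does not handle.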
LANL-RL is moving its main e-journal repository from ScienceServer
to aDORe and expects to complete the transfer by the first
quarter of 2007. Until then, it has to live with some of the
limitations of ScienceServer, including the inability to display
certain formats and partial lack of Unicode compliance. The
new architecture will be considerably more flexible and was
built with long-term preservation of digital objects in mind.
In particular, it provides an application-neutral, XML-based
means to store a wide variety of file formats while maintaining
a record of the infrastructure and tools needed to decode the
files through evolving digital environments.
Despite the emphasis on preservation in its R&D work, LANL-RL
does not offer e-journal archiving services to its external
cost-recovery clients. The fees paid by clients cover only
the cost of current access and do not provide for subsequent
access, even to backfiles, in the event of termination. However,
even beyond its digital repository development contributions,
LANL-RL's e-journal preservation efforts have important implications,
both for the LANL community and for the scholarly community
at large.
First, LANL-RL has ensured through contractual negotiation
that all acquired e-journal content can be perpetually archived.
Second, it has extended its R&D work into the area of trustworthy
and high-integrity transfer of e-journal content from publishers.
Since 2003, LANL-RL has been working with the American Physical
Society (APS) on a multiphase project that may lead to the
establishment of a fully synchronized dark-mirror site for
all APS publications wherein LANL-RL would become the worldwide
source for APS content in the event of catastrophic failure
of APS's primary servers. LANL is in various stages of negotiation
with other publishers to offer similar mirror and fallback
services.
LANL receives appropriations from the U.S. Departments of
Energy and of Defense, among other sources. The Research Library
receives funding out of the institutional overhead in those
appropriations. Researchers receiving grants are taxed for
institutional support, and a portion of those funds goes to support
the RL. Therefore, part of the RL's funding comes indirectly
from appropriations, though there is no explicit budget line
for RL operations, let alone for e-journal archiving or other
specific tasks.
This creates a certain amount of uncertainty regarding ongoing
commitments to e-journal archiving. LANL-RL's primary concern
is that the scholarly journal literature needed by its staff
continue to be available via an affordable and trustworthy
mechanism. If another source that provided sufficient functionality
emerged, it could decide to contract for the services instead.
On the other hand, LANL-RL was one of the earliest local loaders
of e-journals, and as a result of ongoing R&D, has continued
to offer LANL staff functionality not available elsewhere.
Another potential source of uncertainty is that LANL is undergoing
a major restructuring that could affect priorities and funding.
LANL is currently managed by the University of California (UC)
under contract to the U.S. Department of Energy, but over the
next year, operation of the laboratory will shift to a limited
liability company called Los Alamos National Security that
includes UC along with Bechtel National, Inc., BWX Technologies,
Inc., and the Washington Group International, Inc. How the
shift in management will affect the RL's operation is not yet
known.
National Library of Australia PANDORA
The National Library of Australia (NLA) established PANDORA
in 1996. PANDORA is an acronym for Preserving and Accessing
Networked Documentary Resources of Australia. PANDORA serves
"all Australians, present and future, and anyone with a research
interest in Australia." In addition to the NLA, the PANDORA
program includes nine national and state collecting agencies
across Australia that partner to populate and maintain PANDORA.
The NLA covers the infrastructure and support costs for PANDORA
through appropriations.
PANDORA contains six priority categories of online publications,
including Commonwealth and Australian Capital Territory government
publications, publications of tertiary education institutions,
conference proceedings, e-journals, titles referred by indexing
and abstracting agencies, and topical Web sites. There are
1,983 journals represented in PANDORA, although not all are
scholarly or peer reviewed. The PANDORA Web site groups the
content into a broad range of subjects covering academic, cultural,
social, political, and technical topics. Apart from approximately
150 commercial titles, PANDORA contains publicly accessible
content. The commercial content of PANDORA is typically restricted
for one to three years.
The first version of the PANDORA Archiving System (PANDAS)
was released in 2001. The members of PANDORA use PANDAS to
gather content, which is stored on NLA servers using proprietary
storage software called DOSS. The NLA developed the PANDAS
software to support these workflows: identifying, selecting,
and registering candidate titles; seeking and recording permission
to archive titles; setting harvest regimes appropriate to the
content; gathering (harvesting) files; undertaking quality
assurance checking; initiating archiving processes; and organizing
access, display, and discovery routes to, and metadata for,
the archived resources. The PANDAS software manages administrative
metadata about titles that have been selected for archiving,
rejected, or are being monitored pending a decision; manages
access restrictions; schedules and initiates the harvesting
of titles; manages the quality checking and assurance process;
prepares and organizes harvested content for public display
through title entry pages and title and subject listings; and
provides operational reports. The PANDAS software that the
NLA developed to gather content will be made available as open-source
software soon.
OCLC Electronic Collections Online
OCLC launched Electronic Collections Online (ECO) in June
1997 to support the efforts of libraries and consortia to acquire,
circulate, and manage large collections of electronic academic
and professional journals. It provides Web access via the OCLC
FirstSearch interface to a growing collection of more than
5,000 titles in a wide range of subject areas, from more than
40 publishers of academic and professional journals. Libraries,
after paying an access fee to OCLC, can select the journals
to which they would like to have electronic access.
An important component of the ECO offering is its promise
of long-term accessibility to subscribed content. OCLC's agreement
with publishers ensures that it can continue to provide libraries
with access to any content to which the libraries may have
subscribed as long as the library continues to pay the access
fee. Even if a user discontinues an ECO access account, OCLC
will maintain the user's subscription profile for five years,
and if a user renews an access account before five years have
passed, the user can regain access to all the journals covered
by the previous subscription.
Although ECO has not established the "minimal set of well-defined
services" that would make it a "qualified preservation archives"
(Waters 2005), it has undertaken a number of steps that increase
the likelihood that it will be able to provide continued access
to the content it offers. For example, OCLC maintains a copy
of all journal content and the associated abstract and index
data in an off-site storage facility. It has also secured the
right to migrate journal backfiles to new data formats as current
formats such as PDF, which form the vast bulk of ECO content,
become outmoded. (OCLC has not as yet, however, had to migrate
any file formats.) ECO is not part of OCLC's Digital Archive
service and has no immediate plans to take advantage of OCLC's
"real-world solutions for the challenges of archiving and preservation
in the virtual world."
In the event of publisher failure or some other trigger event
that would prevent a publisher from delivering content to subscribers,
it is possible that subscribers might be able to shift their
subscriptions to ECO in order to secure access. This would
have to be worked out in negotiations with the publishers.
Should OCLC decide to stop offering the ECO service, it can
provide to libraries on tape or CD/DVD copies of any content
to which the library had subscribed. It would then be the library's
responsibility to mount that material and make it available.
OhioLINK Electronic Journal Center
The Ohio Library and Information Network (OhioLINK) is a consortium
of Ohio's college and university libraries, comprising 85 institutions
of higher education and the State Library of Ohio. OhioLINK's
electronic services include a multipublisher Electronic Journal
Center (EJC), launched in 1998, which contains more than 6,900
scholarly journal titles from close to 40 publishers across
a wide range of disciplines. Although several OhioLINK resources
are available to all Ohio residents (with some open to all
on the Internet), the content of EJC is available only to students,
faculty, and staff members at OhioLINK-affiliated institutions.
At this time, OhioLINK has neither the resources nor the legal
right to make the contents of EJC available outside of the
state of Ohio.
EJC is an optional service of OhioLINK, though the vast majority
of Ohio higher education institutions have chosen to participate.
The cost of joining EJC is determined by the institution's
current spending on journals from the publishers who are represented
in EJC, including print and electronic subscriptions. Most
institutions wind up getting electronic access to far more
titles than they previously were subscribing to for a similar
outlay of funds. The access mechanism is shifted from a campus-based
one through publishers and aggregators to one based on EJC.
EJC accepts most content as it is supplied by the publisher,
but is limited in the formats that can be displayed by its
main repository software, ScienceServer. The current version
of ScienceServer can display only PDF, TIFF, and some types
of XML. EJC intends shortly either to upgrade to a new version
of ScienceServer or move to different repository software.
Goals for the new software include expansion of the range of
file formats that can be displayed and resolving existing display
limitations caused by the lack of Unicode compliance in the
old ScienceServer.
OhioLINK has declared its intention to maintain the EJC content
as a permanent archive and has acquired perpetual archival
rights in its licenses from all publishers but one (the American
Chemical Society). Furthermore, in May 2006 the OhioLINK Governing
Board approved a series of recommendations that included a
commitment to seek the addition of a clause to all EJC contracts
that would extend liberal self-archiving and access rights
to all personnel of Ohio higher education institutions.
EJC relies on regular and heavy use by subscribers to help
maintain the integrity of its archive and reveal problems.
Though it anticipates having to perform file migrations in
the future, it has not done any yet. It does not normalize
incoming files. Instead, EJC relies on publishers to supply
files in one of the standard formats that ScienceServer is
capable of displaying. Content received from publishers in
other formats is retained, but will not be displayable until
the next-generation repository software is in place.
All technical infrastructure costs, as well as about 20% of
content-acquisition costs, are centrally funded through legislative
appropriations. The remaining funding for content comes from
member libraries. Fluctuations in state appropriations have
resulted in discontinuation of some titles. EJC's contracts
stipulate a nonpunitive approach to obtaining missing content
if EJC resubscribes to a canceled title.
EJC has been extremely popular and continues to experience
growth in usage. OhioLINK would like to expand EJC to include
publishers such as Sage, Taylor & Francis, Cell Press, the
Institute of Electrical and Electronics Engineers, GeoScienceWorld,
and titles from a number of scholarly societies. Some of these
acquisitions would fill gaps in disciplines such as nursing
and the biosciences that OhioLINK officials feel are currently
underserved. If funding can be found, OhioLINK also wants to
purchase backfiles for many titles as a means to increase access
and save member libraries money by reducing the need to store
print copies at multiple sites.
Plans include development of a Digital Resource Commons (DRC),3 with
which OhioLINK hopes to accomplish with a shared repository
environment what EJC and other OhioLINK components have done
with shared content. Instead of member institutions investing
the resources to create and manage their own repositories,
DRC would provide a centrally managed repository (based on
Fedora) with locally controlled infrastructure for ingest,
and a sophisticated, multilevel access rights management system.
According to OhioLINK, DRC "ingests, preserves, presents, and
mediates administration of the educational and research materials
of participating institutions." Capabilities envisioned include
an institutional repository for research portfolios such as
preprints, postprints, and working papers, electronic thesis
and dissertation management, and Web-mediated peer-reviewed
electronic journals with open access, self-archiving, and publishing.
Ontario Scholars Portal
Launched in 2001, the Ontario Scholars Portal (OSP) serves
all 20 university libraries in the Ontario Council of University
Libraries (OCUL) consortium.4 The
Portal includes more than 6,900 e-journals from 13 publishers
and metadata for the content of an additional 3 publishers.
The publishers currently represented include Elsevier, John
Wiley & Sons, Inc., Springer, Kluwer Law International, Blackwell,
Oxford University Press, Cambridge University Press, American
Psychological Association, Emerald, Berkeley Electronic Press,
Sage, Institute of Electrical and Electronics Engineers, and
the Royal Society of Chemistry.
The Portal uses a combination of "push and pull" to gather
content: publishers provide source files, and the Portal harvests
content from publisher Web sites. The Portal stores all the
content from publishers, but the current system cannot render
all the formats that have been stored, e.g., video files and
numeric data. Most of the content is in PDF or XML format.
The primary purpose of the Portal is access, but the consortium
has made an explicit commitment to the long-term preservation
of the e-journal content that it loads locally. The Portal
provides online access to the content that consortium members
have licensed or purchased. Members of the consortium are required
to pay membership fees and are represented on the executive
board of the Portal. Preservation is included in the e-journal
service to members.
Between 2001 and 2005, OSP was supported by a grant and provincial
matching funds as part of the Canadian National Site Licensing
Program.5 Ongoing
support for OSP relies upon a membership cost model that adjusts
for the varying size of consortium members and usage factors
and that includes tiered membership fees.
Portico
Portico is one of the newest of the archiving programs, having
just gone "live" in 2006 (although planning began in 2004,
and the preservation obligation was assumed in 2005). The mission
of Portico is to "preserve scholarly literature published in
electronic form and to ensure that these materials remain accessible
to future scholars, researchers, and students." Specifically
designed as a third-party electronic-preservation service,
Portico serves as a permanent dark archive. E-journal availability
(other than for verification purposes) is governed by trigger
events resulting from substantial disruption to access via
the publishers themselves.
The program's archival approach begins with the receipt, directly
from the publishers, of source files that comprise the intellectual
content of electronic scholarly journals. These diverse files are
then transformed, or normalized, to a standard archival format
that can be managed over time through the preservation strategy
of migration.
Portico boasts a strong pedigree, with startup funding provided
by The Andrew W. Mellon Foundation, Ithaka, JSTOR, and the
Library of Congress. A membership organization, it is open
to all libraries and scholarly publishers, both of which are
asked to support the effort through annual contributions. Thirteen
publishers are participating in Portico:
- American Anthropological Association
- American Mathematical Society
- Annual Reviews
- Berkeley Electronic Press
- BioOne
- Elsevier
- John Wiley & Sons
- Oxford University Press
- Sage Publications, Inc.
- Society for Industrial and Applied Mathematics (SIAM)
- Symposium Journals (Oxford UK)
- United Kingdom Serials Group
- University of Chicago Press
Recently announced library fees, ranging from $1,500 to $24,000
per year, are based on the total library materials expenditures
for an individual institution. To encourage early adopters,
libraries that subscribe to this service in 2006 and 2007 will
be designated "Portico Archive Founders" and will receive substantial
savings on their annual archive support payment for five years.
Library systems and consortia that facilitate support for the
archive among their member institutions will be offered modest
savings in their annual payments. According to Eileen Fenton,
executive director, Portico is aiming to attract additional
libraries from across the Carnegie Classification of Institutions
of Higher Education.6
PubMed Central
PubMed Central (PMC) is a free, publicly accessible digital
archive of English language biomedical and life sciences journal
literature, run by the National Center for Biotechnology Information
(NCBI) of the U.S. National Library of Medicine (NLM). Launched
in February 2000 with content from the Proceedings of the
National Academy of Sciences and Molecular Biology
of the Cell, PMC has grown to include hundreds of thousands
of articles from about 250 titles and 50 publishers.
Like the similarly named PubMed, PMC is an integral component
of NCBI's Entrez life sciences search engine. While PubMed
contains citations, abstracts, and links to full-text articles,
PMC consists of full-text research articles and other content
from peer-reviewed life sciences journals. The two services
are separate and not entirely complementary. PubMed points
to numerous articles that are not in PMC, while some content
in PMC (mostly nonarticle journal content) is not indexed in
PubMed.
PMC's mandate to preserve the journal literature of biomedicine
comes from the Congressional act that created NLM, which authorizes
it to "acquire, organize, disseminate and preserve books, periodicals,
. . . and other library materials pertinent to medicine." At
the moment, NLM cannot compel researchers to deposit their
publications in PMC, but authors of life science research sponsored
by the U.S. National Institutes of Health (NIH) are requested to
voluntarily deposit final manuscripts of articles into PMC within
a year of publication.
That situation may change, however. Legislation entitled the
Federal Research Public Access Act of 2006 (introduced in the
U.S. Senate on May 2, 2006) would require that U.S. government
agencies with annual extramural research expenditures of more
than $100 million make journal articles based on research funded
by that agency publicly available via the Internet within six
months of publication. If the bill is passed, agencies in the U.S.
Department of Health and Human Services, e.g., NIH and the Centers for
Disease Control and Prevention, would presumably use PMC, since
the bill requires that manuscripts be preserved in a digital
archive that supports free public access, interoperability,
and long-term preservation.
Other content comes into PubMed Central by a variety of mechanisms.
Some open-access journal publications (most notably the entire
set of BioMed Central journals) use PMC as their archiving
solution. Some commercial publishers that do not otherwise
have agreements with PMC allow authors to designate their articles
as open access and to deposit these articles in PMC. Finally,
a growing number of publishers have reached contractual agreements
with PMC to deposit all their journal contents with PMC.
To participate in PMC, a publication must be covered by a
major abstracting/indexing service, or have three editorial
board members with current grants from major nonprofit funding
agencies. Publishers are required to supply source files (via
FTP or on CD/DVD or tape) in either SGML or XML, conforming
to the NLM Journal Archiving XML DTD or another full-text article
DTD that is widely used in the life sciences. The original
high-resolution digital image files must be provided for all
figures. PMC prefers (but does not require) that publishers
also include a PDF version of their articles in the archive.
Publishers are encouraged to deposit the entire contents of
their journals for archiving, but must at minimum provide all
research articles. For display purposes, PMC performs an on-the-fly
conversion of stored XML to HTML.
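The participation requirements described above can be read as a simple
checklist. The following Python fragment is only an illustrative sketch
of that reading, not a description of any NLM system: the class name,
field names, and function name are hypothetical, "three editorial board
members" is interpreted as "at least three," and conformance to a
particular DTD is not checked.

```python
from dataclasses import dataclass


@dataclass
class JournalApplication:
    """Hypothetical record of what a publisher offers to PMC."""
    covered_by_major_index: bool       # major abstracting/indexing service
    board_members_with_grants: int     # editors holding current grants
    source_format: str                 # "SGML" or "XML"
    supplies_high_res_images: bool     # original image files for all figures
    supplies_all_research_articles: bool
    supplies_pdf: bool = False         # preferred by PMC, but not required


def meets_stated_requirements(app: JournalApplication) -> bool:
    """Return True if the application satisfies the requirements as read here."""
    # Either criterion is enough to qualify the publication itself.
    editorial_ok = (
        app.covered_by_major_index or app.board_members_with_grants >= 3
    )
    # Source files must be SGML or XML (DTD conformance is not modeled).
    format_ok = app.source_format.strip().upper() in {"SGML", "XML"}
    return (
        editorial_ok
        and format_ok
        and app.supplies_high_res_images
        and app.supplies_all_research_articles
    )
```

Note that the PDF field does not affect the result, which reflects the
statement that PMC prefers but does not require a PDF version.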
PMC has a flexible deposit policy designed to accommodate
the desire of many publishers to delay appearance of journal
content in PMC for a period of time following publication.
Although publishers are encouraged to make content available
via PMC as soon as possible after publication, they may request
a delay of up to one year for research articles, and up to
three years for other content, such as letters and reviews.
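The delay limits above amount to a small lookup. As a minimal sketch,
assuming delays are counted in whole months and that any content other
than research articles falls under the three-year limit, the rule could
be expressed as follows. The names used here are hypothetical and are
not drawn from PMC documentation.

```python
# Maximum delay a publisher may request, in months, by content type.
# Values follow the limits stated in the text: one year for research
# articles and three years for other content (e.g., letters, reviews).
MAX_DELAY_MONTHS = {
    "research_article": 12,
    "other": 36,
}


def max_delay_months(content_type: str) -> int:
    """Return the longest permitted delay for a given content type.

    Assumption: anything that is not a research article is treated
    as "other" content.
    """
    return MAX_DELAY_MONTHS.get(content_type, MAX_DELAY_MONTHS["other"])


def is_delay_allowed(content_type: str, requested_months: int) -> bool:
    """Check whether a requested delay is within the stated limit."""
    if requested_months < 0:
        raise ValueError("requested_months must not be negative")
    return requested_months <= max_delay_months(content_type)
```

For example, a six-month delay for research articles falls within the
limit, whereas an 18-month delay for the same content would not.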
NLM is committed to long-term stewardship of the content in
PMC. All contracts must include a clause granting PMC perpetual
archiving rights for any deposited material. Two operational
policies dominate PMC's approach to content longevity. One
is an emphasis on standardized XML, which is portable, maintains
document structure, and lends itself to intelligent processing
without sacrificing human readability. NLM is continuing its
work on the Journal Archiving and Interchange DTD from which
the Journal Publishing DTD was derived and for which the Library
of Congress and the British Library recently announced support.
The other is free, open access to all content, which, in concert
with automated processes, helps ensure the integrity of archived
content through direct, active, and continuous use.
NLM is also committed to expanding PMC. New publishers and
titles are being added regularly, and NLM has embarked on a
program of back-issue digitization for the titles that are
routinely depositing current content in PMC.
PMC is not identified specifically as a line item in the NIH
or NLM budgets. In October 2004, a review of personnel, contract,
and system (hardware/software) costs noted an annual cost of
$2.3 million. This included most operating costs for staff,
contract work, equipment, and software other than the cost
of digitization of journal back issues.
FOOTNOTES
1 http://lockss.stanford.edu/about/users.htm.
2 http://lockss.stanford.edu/about/titles.htm.
3 About the Digital Resource Commons, http://drc-dev.ohiolink.edu/.
4 http://www.ocul.on.ca/.
5 http://library.queensu.ca/libdocs/news/2001apr09.htm.
6 http://www.carnegiefoundation.org/classifications/.