Utility of the Archival Paradigm in the
Digital Environment
Information is not a natural category whose history we
can extrapolate. Instead, information is an element of certain professional
ideologies . . . and cannot be understood except through the practices
within which it is constructed by members of those professions in
their work.
Agre (1995)
The principles and practices discussed in the preceding section
demonstrate how the archival community constructs information and
why this construction needs to be understood and addressed in the
digital environment. These principles and practices, independent
of the archival construction of information, can also contribute
to the management of digital information. Implementing the archival
paradigm in the digital environment encompasses the following:
- working with information creators to identify requirements for
the long-term management of information;
- identifying the roles and responsibilities of those who create,
manage, provide access to, and preserve information;
- ensuring the creation and preservation of reliable and authentic
materials;
- understanding that information can be dynamic in terms of form,
accumulation, value attribution, and primary and secondary use;
- recognizing and exploiting the organic nature of the creation
and development of recorded knowledge;
- identifying evidence in materials and addressing the evidential
needs of materials and their users through archival appraisal,
description, and preservation activities; and
- using collective and hierarchical description to manage high
volumes of nonbibliographic materials, often in multiple media.
The archival community is making significant contributions to research
and development in the digital information environment in five areas:
integrity of information, metadata, knowledge management, risk management,
and knowledge preservation.
Each area is discussed below with reference to recent and ongoing
projects in which the archival community has played a leading role
in setting the agenda or integrating the archival perspective. Many
of the projects discussed have in common a concern for evidence in
information creation, storage, retrieval, and preservation; cross-community
collaboration; strategies that use both technological processes and
management procedures; development of best practices and standards;
and evaluation.
Integrity of Information
Integrity requires a degree of openness and auditability
as well as accessibility of information and records for public inspection,
at least within the context of specific review processes. Integrity
in an information distribution system facilitates and insures the
ability to construct and maintain a history of intellectual dialog
and to refer to that history over long periods of time.
Lynch (1994)
Ensuring the integrity of information over time is a prominent concern
in the digital environment because physical and intellectual integrity
can easily be consciously or unconsciously compromised and variant
versions can easily be created and distributed. This concern has
two aspects: checking and certifying data integrity (associated
with technical processes such as integrity checking, certification,
digital watermarking, steganography, and user and authentication
protocols) and identifying the intellectual qualities of information
that make it authentic (associated with legal, cultural, and philosophical
concepts such as trustworthiness and completeness).
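The first of these two aspects, checking data integrity, can be
illustrated with a small fixity check. The following is a minimal
sketch, not a method described in this report: it assumes SHA-256 as
the digest algorithm, and the record text and function names are
hypothetical. It shows only that a variant version can be detected
against a digest recorded at deposit; it says nothing about whether
the information is intellectually authentic.

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    """Return the SHA-256 digest of a byte string as hexadecimal text."""
    return hashlib.sha256(data).hexdigest()


def verify_fixity(data: bytes, recorded_digest: str) -> bool:
    """Check that the data still matches the digest recorded at deposit."""
    return sha256_digest(data) == recorded_digest


# Hypothetical record content and the digest recorded when it was deposited.
original = b"Minutes of the board meeting, 12 March 1999."
recorded = sha256_digest(original)

# A variant version (one character changed) no longer matches the digest.
altered = b"Minutes of the board meeting, 13 March 1999."

print(verify_fixity(original, recorded))
print(verify_fixity(altered, recorded))
```

A check of this kind establishes only that the bits are unchanged.
Trustworthiness and completeness still depend on the procedural and
contextual evidence discussed in the rest of this section.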
Functional requirements are particularly well articulated in highly
regulated communities such as the pharmaceutical and bioengineering
industries. Less well explored is how to identify and preserve the
intellectual integrity of information. The intellectual mechanisms
by which we come to trust traditional forms of published information
include a consideration of provenance, citation practices, peer review,
editorial practices, and an assessment of the intellectual form of
the information. In the digital environment, information may not
conform to predictable forms or may not have been through traditional
publication processes; a more complex understanding of information
characteristics and management procedures is required for the intellectual
integrity of information to be understood. Attempts are often made
to implement digital versions of procedures traditionally used in
record keeping and archival administration. Such attempts include
establishing trusted servers or repositories that can serve as a
witness or notary public; distributing information to multiple servers,
thus making it harder to damage or eliminate all copies; developing
certified digital archives as trusted third-party repositories; and
identifying canonical versions of information resources (Commission
on Preservation and Access and Research Libraries Group 1996, Lynch
1994).
Project Prism
Project Prism at Cornell University is concerned with issues of
information integrity within digital libraries. It is a four-year
collaborative project involving librarians, archivists, computer
scientists, evaluation experts, and international testbed participants.
The project was recently funded through the National Science Foundation's
Digital Library Initiative to investigate and develop policies and
mechanisms for information integrity in digital libraries. The project
will focus on five areas (Project Prism 1999):
- preservation: long-term survivability of information in
digital form;
- reliability: predictable availability of information resources
and services;
- interoperability: open standards that allow the widest
sharing of information among providers and users;
- security: attention to the privacy rights of information
users and the intellectual property rights of content creators;
and
- metadata: structured information that ensures information
integrity in digital libraries.
International Project on Permanent Records in Electronic
Systems (InterPARES)
The International Project on Permanent Records in Electronic Systems
(InterPARES) is a three-year project using archival and diplomatics
principles to examine the characteristics inherent in digital information
objects created by electronic record-keeping technologies in order
to establish their authenticity and how that authenticity might be
maintained over time. The project is funded by several agencies,
including the U.S. National Historical Publications and Records Commission
and Canada's Social Sciences and Humanities Research Council.
An interdisciplinary team of researchers drawn from archival science,
preservation management, library and information science, computer
science, and electrical engineering is working with an industry group
(primarily the pharmaceutical and biocomputing industries) and major
archival repositories, including the national archives of several
countries.
The project builds on previous research conducted at the University
of British Columbia that examined the preservation of the integrity
of electronic records and theoretically defined the concepts of reliability
and authenticity in relation to electronic records. It also identified
the procedural requirements and responsibilities for ensuring the
reliability of active records and the authenticity of preserved records.
The philosophy underlying InterPARES is that the theories and methodologies
necessary to ensure the long-term preservation of authentic electronic
records must be centered on the nature and meaning of the records
themselves. Despite the new media and formats of electronic records,
from the perspective of archival science the integral components
that identify and authenticate a record have not changed. By combining
principles of diplomatics and archival principles, the project is
developing a template that can be used to identify requirements for
authenticity for different kinds of electronic records and systems
that generate records. To use this template and to understand the
extent to which electronic records resemble traditional records,
the project is analyzing a variety of electronic information and
record-keeping systems, including large-scale object-oriented databases,
geographic information systems, dynamic Web resources, and digital
music systems in many national legal and organizational contexts.
These analyses will be translated into recommended systems-design
requirements and authentication processes, record-keeping policies
and procedures, and preservation strategies for different types of
records (InterPARES Project 1999). Different preservation processes
will also be evaluated to ascertain their ability to maintain the
elements of different types of records identified as essential to
preserving the records' authenticity. Although this project is focused
on the authenticity requirement of records rather than on more generic
forms of information, its findings will likely be relevant to digital
information or information systems that need to retain the integrity
of physical and intellectual characteristics over time.
Metadata
I would contend that most objects of culture are . . .
embedded within context and those contexts are embedded within other
ones as well. So a characteristic of cultural objects is they're
increasingly context-dependent. And they're increasingly embedded
in meta-languages.
Brian Eno (1999)
The term metadata has different meanings depending on the community
using it. The library community frequently uses metadata to refer
to cataloging and other forms of descriptive information, but it
is also used to refer to information about the administration, preservation,
use, and technical functionality of digital information resources
(Gilliland-Swetland 1998).
With the increasing diversity of distributed and interactive digital
information systems comes a need for a metadata infrastructure that
can implement the functional requirements of each information community
and promote interoperability. The challenge is not just to identify
the areas where it is possible to map between different types of
metadata. It is also necessary to identify the tensions between the
rich and complex metadata sets that individual communities have developed
and the need for simpler metadata sets that are easier for nonspecialists
to use and systems designers to maintain. For information communities
that work with cultural information there are several important elements
in ensuring authenticity and facilitating the use of an information
object. They include metadata such as contextual description, indications
of relationships between collections of materials, annotations that
have accrued around information objects, documentation of intellectual
property rights, and documentation of processes that the information
objects have undergone, such as reformatting and migration. Rich
metadata sets that incorporate aspects such as these are essential
if the object is to be used to its fullest potential. However, considerable
demand exists for leaner metadata that will enable users to move
between information systems that might contain different types of
materials on the same subject. Some of the most interesting questions
that arise from such considerations include the following:
- How much of the metadata needs to exist in time and over time
to support the evidential qualities of the information?
- Where should the necessary metadata reside (within the digital
information system, in paper form, or both)?
- To what extent are metadata integral components of the information
object? (Where does the information object end and the metadata
begin?)
- To what extent should information professionals be engaged in
the design and creation of metadata for the systems that create
information objects to ensure that those objects can be managed
and preserved later in life?
- How can metadata help to ensure that information objects are
used optimally by diverse users?
Two examples that illustrate the contributions that archivists have
made in the area of metadata are EAD and a suite of metadata projects
that were recently conducted in Australia.
Encoded Archival Description (EAD)
Described earlier in this report, EAD is a new archival descriptive
standard adopted in the United States and being developed as a potential
international standard. A hierarchical, object-oriented way of describing
the context and content of archival collections, EAD can be a flexible
metadata infrastructure for integrating descriptions with actual
digital and digitized archival materials within an archival information
system. It can also be mapped into other metadata structures such
as MARC. Perhaps EAD's greatest potential lies in its ability to
be manipulated for information retrieval and display without compromising
how it documents the provenance, original order, and organic nature
of archival collections. As a result, it moves beyond the static
concept of the paper finding aid and can facilitate appropriate access
for diverse users at the collection and item levels (Gilliland-Swetland
2000b, Pitti 1999).
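The hierarchical structure described above can be sketched in code.
The fragment below is a simplified illustration, not a complete or
valid EAD instance: it uses a handful of EAD element names (archdesc,
did, unittitle, dsc, c01, c02), and the collection and series titles
are hypothetical. It shows how one encoded description can be walked
to produce a display outline without altering the recorded
arrangement.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical finding aid fragment using EAD element names.
SAMPLE = """
<ead>
  <archdesc level="collection">
    <did><unittitle>Example Family Papers</unittitle></did>
    <dsc>
      <c01 level="series">
        <did><unittitle>Correspondence</unittitle></did>
        <c02 level="file">
          <did><unittitle>Letters, 1920-1925</unittitle></did>
        </c02>
      </c01>
      <c01 level="series">
        <did><unittitle>Photographs</unittitle></did>
      </c01>
    </dsc>
  </archdesc>
</ead>
"""


def outline(node, depth=0):
    """Return (depth, level, title) tuples for a node and its components."""
    rows = [(depth, node.get("level"), node.findtext("did/unittitle"))]
    for child in node:
        if child.tag == "dsc":
            # The dsc element wraps the top-level components.
            for component in child:
                rows.extend(outline(component, depth + 1))
        elif child.tag.startswith("c0"):
            # Numbered components (c01, c02, ...) nest inside one another.
            rows.extend(outline(child, depth + 1))
    return rows


root = ET.fromstring(SAMPLE)
rows = outline(root.find("archdesc"))
for depth, level, title in rows:
    print("  " * depth + f"{title} [{level}]")
```

Because the outline is derived from the encoding rather than stored
separately, the same description could be displayed at the collection
level or at the file level while the documented order of the
materials stays intact.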
A measure of the utility and sophistication of EAD is the interest
it has created in other professional communities. The Online Archive
of California (OAC), now part of the California Digital Library,
is an example of a multi-institutional database containing encoded
finding aids and digitized content drawn from archives and special
collections of the University of California, California State University,
and numerous other universities and repositories throughout the state.
The size and scope of OAC have enabled it to develop best practices
for encoding and model evaluation processes and to examine its own
usability not only as a scholarly resource but also as a resource
for K-12 education (Gilliland-Swetland 2000a, Online Archive of
California 1999). A constituent OAC project, Museums in the Online
Archive of California (MOAC), which is being conducted by several
museums across California, is applying EAD to the description of
museum collections. This development has the potential not only to
map between the descriptive practices of two professional communities
but to integrate access to intellectually related two- and three-dimensional
historical and cultural resources that have often been located in
different institutions.
SPIRT Recordkeeping Metadata Standards Project
Over the past five years, several metadata projects conducted in
Australia have built on the records continuum model by specifying,
standardizing, and integrating into active electronic record-keeping
systems the kinds of metadata necessary for effective record keeping
and for ensuring the long-term management and archival use of essential
evidence. These projects include the Victorian Electronic Records
Strategy metadata set and the Australian Government Locator Service.
The most recent of these projects is the SPIRT (Strategic Partnership
with Industry - Research and Training) Recordkeeping Metadata
Standards Project for Managing and Accessing Information Resources
in Networked Environments Over Time for Government, Commerce, Social
and Cultural Purposes, directed by Monash University in association
with the National Archives of Australia. This project builds on the
work of previous projects and provides a framework for standardizing
sets of interoperable record-keeping metadata that can be associated
with records from creation through processes such as embedding, encapsulation,
or linking to metadata stores. Metadata elements are classified by
purpose and are being mapped against related generic and sector-specific
metadata sets such as Dublin Core (Records Continuum Research Group
1999). In this way, archivists build a business case for including
archival considerations in the workflow because of the need to manage
risk and the role of records in supporting organizational decision
making.
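Mapping a rich record-keeping metadata set against a leaner generic
set such as Dublin Core can be sketched as a simple crosswalk. This
is an illustration only: the record-keeping element names on the left
are hypothetical and are not taken from the SPIRT project, while the
targets (title, creator, date, identifier) are Dublin Core element
names. The sketch also shows which elements have no counterpart in
the leaner set.

```python
# Hypothetical record-keeping element names mapped to Dublin Core elements.
CROSSWALK = {
    "record_title": "title",
    "creating_agent": "creator",
    "date_created": "date",
    "record_identifier": "identifier",
}


def to_dublin_core(record):
    """Split a record into mapped Dublin Core fields and unmapped fields."""
    mapped = {}
    unmapped = {}
    for name, value in record.items():
        if name in CROSSWALK:
            mapped[CROSSWALK[name]] = value
        else:
            unmapped[name] = value
    return mapped, unmapped


# Hypothetical record with two elements specific to record keeping.
record = {
    "record_title": "Contract for building works",
    "creating_agent": "Department of Public Works",
    "date_created": "1999-04-06",
    "record_identifier": "REC-0001",
    "business_function": "Procurement",
    "disposal_authority": "Retain permanently",
}

mapped, unmapped = to_dublin_core(record)
print(mapped)
print(sorted(unmapped))
```

The unmapped elements are the ones that carry record-keeping context.
A crosswalk supports discovery across systems, but the richer set
still has to be retained with the record if its evidential qualities
are to be preserved.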
Knowledge Management
Like the term metadata, the term knowledge management is being widely
used, although its meaning and how it differs from information management
are less than clear. Knowledge management refers to the practices,
skills, and technologies associated with creating, organizing, storing,
presenting, retrieving, using, preserving, disposing of, and re-using
information resources to help identify, capture, and produce knowledge.
Knowledge management is often used to create entrepreneurial opportunities
by identifying and exploiting an organization's knowledge capital.
Knowledge management activities can include data and metadata mining
as well as digital asset management. In many respects, such activities
are a logical extension of records management and archival activities
such as those under way in Australia. The rationales for building
and sustaining electronic records and other digital information resources
are derived not only from abstract concepts of information and research
needs but from administrative and legal necessity, the corporate
bottom line, and institutional or repository enterprise.
Knowledge management systems are often hybrids of born-digital,
digitized, and traditional media in the form of organizational records,
nonrecord information, and digital products (such as publications
or movies). Such systems include digital images and texts as well
as sound, moving images, graphics, and animation. They also contain
procedural and administrative information such as rights management
for digital assets. Whereas digital libraries are built around assumptions
about current and potential uses but with few hard data, digital
asset management systems are created organically out of organizational
activities and the need for agility sufficient to respond to emerging
institutional priorities. This way of looking at information resources, regarding
their content and metadata as assets with dynamic values and market
demand, is a different mindset for many information professionals.
It involves adopting a holistic rather than a piecemeal approach
to information systems and shifting from a linear to an organic perspective.
The digital asset management approach has been extensively developed
by the media industries, particularly publishing and entertainment,
where both the product and the information and records associated
with its production are primarily digital. In the entertainment industry,
studios are hiring archivists with experience in electronic records
management to build digital asset management or metadata management
systems for the assets created during production. In some cases,
a two-phase approach is adopted whereby digital production is handled
in a production management system and its contents are created, described,
and organized by the primary users. After production is completed,
all associated materials are transferred to the asset management
system, where the digital asset manager or digital archivist organizes
and describes them for secondary use. Metadata are developed to track
levels and types of use and allow maximum flexibility in retrieving
and interrelating assets.
This approach has tremendous potential for supporting the vision,
relevance, utility, and sustainability of digital library and archives
resources. It incorporates the interests of the information creator
and makes preservation management integral to creation and retention.
It offers a new economic and use-based framework to help institutions
prioritize selection of information content and decide what and how
much metadata to create; which resources to keep online; and which
assets to preserve, purge, or allow to decay gradually.
Risk Management
If archivists are to take their rightful place as regulators
of an organization's documentary requirements, they will have to
reach beyond their own professional literature and understand the
requirements for recordkeeping imposed by other professions and society
in general. Furthermore, they will have to study methods of increasing
the acceptance of their message and the impact and power of warrant.
Duff (1998)
Evaluation practices of library and information retrieval systems
have traditionally been based on four factors: effectiveness,
benefits, cost-effectiveness, and cost benefits (Lancaster 1979).
Research on electronic archival records has postulated another form
of evaluation, risk management, borrowed from professions
such as auditing, quality control, insurance, and law. Although this
concept has not been applied directly to other information environments,
it has implications for assessing risk in terms of ensuring the reliability
and authenticity, appropriate elimination, and preservation of digital
information.
Archivists seeking to develop blueprints for the management of electronic
records have undertaken several important projects in recent years.
This research showed that electronic records are likely to endure
with their evidential value intact beyond their active life only
if functional requirements for record-keeping systems design and
policies and procedures for record keeping are addressed during the
design and implementation of the system. This increases the likelihood
that appropriate software and hardware standards will be used, making
the records easier to preserve. Records will also be created in such
a way that they can be identified, audited, rendered immutable on
completion, physically or intellectually removed, and brought under
archival control.
Missing from this approach is the motivation for organizations to
invest the resources required to implement expensive archival requirements
in their active record-keeping systems. With the digital asset management
approach discussed previously, the motivation to preserve usable
digital information comes from the organization itself and is intimately
tied to enterprise management. The Australian metadata projects apply
two other strategies. The first is demonstrating that well-designed
record-keeping systems and metadata will enhance organizational decision
making. The second is risk management: persuading the organization
that the resources invested in electronic record keeping will reduce
the organizational risk incurred by not complying with archival and
record-keeping requirements. Organizations such as public bodies
and regulated industries are generally aware of the penalties for
noncompliance. Noncompliance by a public body could result in a costly
lawsuit. Noncompliance by a regulated industry could result in not
getting regulatory approval to market a new product. The cost of
noncompliance with record-keeping requirements may be significantly
higher than that of compliance. In other environments the risk analysis
may be less straightforward because the risks may be less evident
or the costs of noncompliance less tangible.
The risk management approach developed by the Recordkeeping Functional
Requirements Project at the University of Pittsburgh between 1993
and 1996 greatly influenced subsequent electronic record-keeping
research and development projects, including the Australian metadata
projects. The Pittsburgh project was an inductive project based on
case studies, expert advice, precedents, and professional standards
(Cox 1994). There were four main products of the research:
- functional requirements: a list of conditions that must be
met to ensure that evidence of business activities is produced
when needed;
- a methodology for devising a warrant for record keeping derived
from external authorities such as statutes, regulations, standards,
and professional guidelines;
- unambiguous production rules formally defining the conditions
necessary to produce evidence so that software can be developed
and the conditions tested; and
- a metadata set for uniquely identifying and explaining terms
for future access and for using and tracking records.
The contribution of the Pittsburgh project, beyond the development
of the functional requirements and metadata set, was the development
of the concept of warrant and a methodology for creating a warrant
relevant to the individual circumstances of an organization. Warrant
relates to the requirements imposed on an organization by external
authorities for creating and keeping reliable records. If organizations
understand warrant regarding how they manage their electronic record-keeping
systems, they can assess the degree of risk they might incur by not
managing their systems appropriately (Duff 1998).
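The idea of production rules that software can test, each tied to a
warrant, can be sketched as follows. This is a hypothetical
illustration and does not reproduce the Pittsburgh project's actual
rules or metadata set: the rule names, the record fields, and the
warrant labels are all assumptions made for the example.

```python
# Each rule pairs a testable condition with the external authority
# (warrant) that imposes it. All names and warrants are hypothetical.
RULES = [
    {
        "name": "identifiable",
        "warrant": "Hypothetical statute A",
        "test": lambda r: bool(r.get("identifier")),
    },
    {
        "name": "attributable",
        "warrant": "Hypothetical regulation B",
        "test": lambda r: bool(r.get("creator")),
    },
    {
        "name": "fixed_on_completion",
        "warrant": "Hypothetical standard C",
        # A completed record must be locked; an open record is exempt.
        "test": lambda r: (not r.get("complete")) or r.get("locked") is True,
    },
]


def unmet_requirements(record):
    """Return (rule name, warrant) pairs for every rule the record fails."""
    return [
        (rule["name"], rule["warrant"])
        for rule in RULES
        if not rule["test"](record)
    ]


compliant = {"identifier": "REC-0001", "creator": "Finance Office",
             "complete": True, "locked": True}
at_risk = {"identifier": "REC-0002", "creator": "",
           "complete": True, "locked": False}

print(unmet_requirements(compliant))
print(unmet_requirements(at_risk))
```

Listing the warrant beside each failed rule reflects the risk
management argument: an organization can see not only which
requirement is unmet but also which external authority it would be
failing to satisfy.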
Knowledge Preservation
The digital world transforms traditional preservation
concepts from protecting the physical integrity of the object to
specifying the creation and maintenance of the object whose intellectual
integrity is its primary characteristic.
Conway (1996)
Preservation is arguably the single biggest challenge facing everyone
who creates, maintains, or relies on digital information. Awareness
of the immense scope of the potential preservation crisis has brought
many groups together to experiment with new preservation strategies
and technologies. Preserving knowledge is more complex than preserving
only media or content. It is about preserving the intellectual integrity
of information objects, including capturing information about the
various contexts within which information is created, organized,
and used; organic relationships with other information objects; and
characteristics that provide meaning and evidential value. Preservation
of knowledge also requires appreciating the continuing relationships
between digital and nondigital information.
The archival mission of preserving evidence over time has resulted
in demanding criteria for measuring the efficacy of the range of
strategies now being discussed for digital preservation, including
migration, emulation, bundling, and persistent object preservation.
Projects using archival testbeds are under way in several countries
with the aim of understanding the extent to which different strategies
work with a range of materials and what limitations need to be addressed
procedurally, through the development of new technological approaches,
or both.
The Cedars Project
The Cedars Project is a United Kingdom collaboration of librarians,
archivists, publishers, authors, and institutions (libraries, records
offices, and universities). Working with digitized and born-digital
materials, Cedars is using a two-track approach to evaluate different
preservation strategies through demonstration projects at U.K. test
sites; develop recommendations and guidelines; and develop practical,
robust, and scalable models for establishing distributed digital
archives (Cedars Project 1999). Cedars is also examining other issues
related to the management of digital information, including rights
management and metadata.
The Digital Repository Project
The Digital Repository Project of the National Archives of the Netherlands
is concerned with the authenticity, accessibility, and longevity
of archival records created by Dutch government agencies. The project
brings together two important concepts: the emulation technique
devised by Jeff Rothenberg and the reference model for an open archival
information system (OAIS) developed by the U.S. National Aeronautics
and Space Administration, which is being adopted as an ISO standard.
The emulation technique involves creating emulators for future computers
to enable them to run the software on which archived material was
created and maintained, thus recreating the functionality, look,
and feel of the material (Rothenberg 1995 and 1999). The OAIS reference
model is a high-level record-keeping model developed to assist in
the archiving of high-volume information. It delineates the processes
involved in the ingestion, storage, administrative and logistical
maintenance, intellectual metadata management, and access and delivery
of electronic records (Sawyer and Reich 1999).
The Digital Repository Project is most concerned with determining
the functionality of the repository, scope of the metadata, standards
to be applied, and differentiation of the intellectual and the physical
and technical form of the records. As with the Cedars Project, a
two-track approach is being taken. One track will build a small repository
to preserve simple records in a stand-alone environment implemented
by the National Archives. The other track will develop a testbed
and experimental framework for examining preservation strategies
such as migration, emulation, and XML on electronic records acquired
by applying the OAIS reference model (Hofman 1999).
Persistent Object Preservation
Persistent object preservation is a highly generic technological
approach that has been developed jointly by the U.S. National Archives
and Records Administration and the San Diego Supercomputer Center.
This project is addressing the need of the National Archives to find
efficient and fast methods for acquiring and preserving, in context,
millions of files that can be applied to many types of records and
that comply with archival principles. The approach focuses on storing
the information objects that make up a collection and identifying
their metadata attributes and behaviors that can be used to recreate
the collection.
Like the Digital Repository Project, persistent object preservation
is built around the OAIS reference model. It supports archival processes
from accessioning through preservation and use, and it recognizes
the importance of collection-based management. Persistent object
preservation also exploits inherent hierarchical structures within
records, predictable record forms, and dependencies between them.
It is designed to be consistent, comprehensive, and independent of
infrastructure (Rajasekar et al. 1999, Thibodeau 1999).