The quest for knowledge rather than mere information is the crux of the study of archives and of the daily work of the archivist. All the key words applied to archival records (provenance, respect des fonds, context, evolution, inter-relationships, order) imply a sense of understanding, of “knowledge,” rather than the merely efficient retrieval of names, dates, subjects, or whatever, all devoid of context, that is “information” (undeniably useful as this might be for many purposes). Quite simply, archivists must transcend mere information, and mere information management, if they wish to search for, and lead others to seek, “knowledge” and meaning among the records in their care.
Archival theory, methodology, and practice together constitute archival science. Because archival science is scholarly as well as practical and uses a distinct methodology to gain knowledge, it can be considered both a discipline and a profession (Livelton 1996). The disciplinary and professional aspects of archival science together compose the archival paradigm: a set of assumptions, principles, and practices that are common to the archival community and are a model for its activities and outlook.
Although archives have existed for thousands of years, much of the archival paradigm, not unlike that of library science, coalesced between the mid-nineteenth and twentieth centuries. Several key treatises and manuals codifying archival theory and practice were published between 1830 (when François Guizot, French Minister of Public Instruction, issued regulations requiring the application of respect pour les fonds to the records of the départements in the Archives Nationales) and 1956 (when T. R. Schellenberg, an archivist at the U.S. National Archives and Records Administration, published Modern Archives: Principles and Techniques, containing an American delineation of the archival paradigm). The most influential of these was the Manual on the Arrangement and Description of Archives, written in 1898 by Dutch archivists Muller, Feith, and Fruin, which brought together the French and Prussian ideas of respect des fonds and provenance. The translated manual was widely disseminated and was a major topic of discussion when librarians and archivists met for the first time for an international congress at the 1910 World’s Fair in Brussels. As a result, the concept of provenance was adopted by the congress as the basic rule of the archival profession (Van den Broek 1997).
The archival paradigm has been extensively influenced by the so-called auxiliary and ancillary disciplines: diplomatics, history, law, textual criticism, management and organizational theory, and library science. Perhaps most influential have been the research methods of modern scientific history and legal theories of evidence that developed during the nineteenth century largely from diplomatics. Diplomatics was developed to help establish the authenticity of medieval ecclesiastical records. It is the study of the genesis, forms, and transmission of archival documents; their relation to the facts represented in them; and their relation to their creator, in order to identify, evaluate, and communicate their true nature (Duranti 1998a). As a result of these influences, most of the archival community working with public records focused on developing principles for archival arrangement and description that emphasized the organic nature of records and the circumstances of their creation. The manuscript community and some national archives, however, adopted bibliographic practices of subject control (Duranti 1998b). In the United States, where the archival profession was only just beginning to coalesce, historian and later archivist Waldo Gifford Leland presented a paper at the First Conference of Archivists in 1909 calling for the reorganization of archives according to the principle of provenance rather than library methods. In a report on the Illinois State Archives, Leland wrote that an administrative history must be prepared for each office and that the archives should be classified to reflect the organization and functions that produced them (Brichford 1982).
The bifurcation of public archives and historical manuscript descriptive practices in the United States can most easily be explained in terms of prospective use and archival setting. For archivists administering records programs within their own institutions, the primary uses of records were legal proof and administrative research, often conducted by the records creators. For those engaged in manuscript administration, the focus was on secondary use by historical scholars, often in a research library, where there was more pressure to apply bibliographic models of description (Gilliland-Swetland 1991). Arguably, therefore, library science has influenced archival science less through the contribution of specific practices than through the encouragement of greater emphasis on access and user orientation.
Archivists and the bibliographic community worked together to increase use and facilitate access to archival and manuscript holdings. In 1983, they developed the machine-readable cataloging (MARC) archival and manuscripts control (AMC) format to describe their holdings. Their goal was to integrate standardized information about archival holdings into bibliographic utilities and online public access catalogs and encourage wider use of the holdings. Although MARC AMC was widely adopted by university archivists as well as many state and local historical repositories, many archivists were not comfortable with what they perceived to be the forcing of archival descriptive practices into a data structure that was still essentially bibliographic. In 1993, work began on encoded archival description (EAD), which took the core archival descriptive tool, the finding aid, and used it to develop a standard generalized mark-up language (SGML) document type definition. This definition could be used to disseminate archival descriptive information on the World Wide Web and could be mapped onto other kinds of descriptive metadata in digital information resources.
In the United States, where archival practice developed later than in Europe, a whole new focus on the management of current records emerged between the 1930s and 1960s. Faced with vast quantities of modern records generated by two world wars and a huge federal bureaucracy and with early adoption of new record-keeping and reproduction technologies, archivists at the National Archives realized that they could not possibly keep everything. Thus, they developed revolutionary approaches that engaged archivists at the point of record creation in identifying active records of long-term value and arranging for the orderly retiring of inactive records. This development had two important consequences: the addition to the archival paradigm of a new set of theories relating to life cycle management of records and appraisal and the establishment of the records management profession with the founding in 1956 of the American Records Management Association (now the Association of Records Managers and Administrators International).
From the 1970s until the early 1990s, the archival community in the United States hotly debated the extent to which archival principles and practices were based in theory versus expediency (Burke 1981, Roberts 1987 and 1990, Stielow 1991). In 1981, F. Gerald Ham said that technology and a changing social role for archives would lead to more active management of archival records and a reexamination of many basic assumptions about archival theory and practice. The debate gave way to the reexamination, as Ham predicted. Archivists needed to cope with emerging electronic record-keeping technologies, new modes of scholarly research (in particular the rise of social history and postmodern approaches to research), and increasing user expectations that archivists should provide automated information access.
The debate first centered on appraisal, the process by which archivists identify materials of long-term value. Issues discussed were what and how much to keep and how, in new electronic formats, to identify records in the often undifferentiated mass of digital information. Extensive discussion ensued about the need for descriptive standards developed from the archival perspective and how to reconcile the different descriptive traditions of the various information professions as well as within the archival community (Duff and Haworth 1993).
This debate has led to a reformulation and extension of core archival principles and practices. The archival community has argued that archival needs exist in wider information systems design and in the processes of document creation and preservation. It has also considered what its approaches have to offer in the wider realm of information management (Taylor 1993b). This is evidenced in a host of recent developments, discussed later in this report, such as EAD, the SPIRT Record-keeping Metadata Research Project in Australia, the Functional Requirements Project at the University of Pittsburgh, the International Research on Permanent Authentic Records in Electronic Systems (InterPARES) Project, and the Consortium of University Research Libraries (CURL) Exemplars in Digital Archives (Cedars) Project in the United Kingdom.
The essential principles supporting the archival perspective are as follows:
- the sanctity of evidence;
- respect des fonds, provenance, and original order;
- the life cycle of records;
- the organic nature of records; and
- hierarchy in records and their descriptions.
How these principles have evolved with regard to knowledge management in the digital information environment is discussed below. These principles reflect the concerns of a profession that is interested in information as evidence and in the ways in which the context, form, and interrelationships among materials help users to identify, trust, interpret, and make relevant decisions about those materials.
The Sanctity of Evidence
History in the true sense depends on the unvarnished evidence, considering not only what happened, but why it happened, what succeeded, what went wrong.
Many of the information professions interact closely with other disciplines and derive much of their outlook from those relationships. For example, the practices and perspectives of information scientists have been strongly influenced by science and computer science. Archivists are closely aligned with professions such as law, history, journalism, anthropology, and archaeology. Evidence in the archival sense can be defined as the passive ability of documents and objects and their associated contexts to provide insight into the processes, activities, and events that led to their creation for legal, historical, archaeological, and other purposes. The concern for evidence permeates all archival activities and demands complex approaches to the management of information; it also sets high benchmarks for information systems and services, particularly with respect to archival description and preservation. Recently, the paramount importance of identifying and maintaining the evidential value of archival materials has been reemphasized, partly as a result of the challenges posed by electronic records but partly also to differentiate the information and preservation practices of the archival community from those of the library community.
The integrity of the evidential value of materials is ensured by demonstrating an unbroken chain of custody, precisely documenting the aggregation of archival materials as received from their creator and integrated with the rest of the archives’ holdings of the same provenance, and tracking all preservation activities associated with the materials. Jenkinson (1937) described this process as the physical and moral defense of the record. Schellenberg (1956) expanded archival notions about evidence when he discussed the values that archivists should use to help them decide which materials to retain. The primary values of archival records are related to the legal, fiscal, and administrative purposes of the records creators; the secondary values are related to subsequent researchers. Schellenberg (1956) argued that the secondary values of public records can be ascertained most easily if they are considered in relation to “(1) the evidence they contain of the organization and functioning of the Government body that produced them, and (2) the information they contain on persons, corporate bodies, things [e.g., places, buildings, physical objects], problems, conditions, and the like, with which the Government dealt.” His argument acknowledges both the strict legal requirements of records that must be satisfied by archival processes and the wider concept of historical and cultural evidence that is contained in the materials and can be interpreted by secondary users.
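The idea of an unbroken chain of custody, mentioned at the start of the preceding paragraph, can be made concrete with a small sketch. The following is only an illustration under stated assumptions: the `CustodyEvent` structure, the `custody_gaps` function, and the sample custodians are all hypothetical and are not drawn from any archival standard or system. It treats a custody history as an ordered list of documented transfers and reports a break wherever the party releasing the records is not the party that last received them.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CustodyEvent:
    """One documented transfer of a body of records (hypothetical model)."""
    when: date
    released_by: str   # custodian handing the records over
    received_by: str   # custodian taking the records in


def custody_gaps(events):
    """Return a description of every break found in a custody history.

    A break is reported when consecutive events are out of date order, or
    when the custodian releasing the records is not the custodian who
    last received them.
    """
    gaps = []
    for earlier, later in zip(events, events[1:]):
        if later.when < earlier.when:
            gaps.append(f"{later.when}: dated before the preceding transfer")
        if later.released_by != earlier.received_by:
            gaps.append(
                f"{later.when}: released by {later.released_by!r}, "
                f"but last received by {earlier.received_by!r}"
            )
    return gaps


# Invented sample data: a documented history with no breaks ...
history = [
    CustodyEvent(date(1950, 3, 1), "Land Office", "Records Center"),
    CustodyEvent(date(1962, 7, 15), "Records Center", "State Archives"),
]
# ... and the same history followed by a transfer from an undocumented holder.
broken = history + [
    CustodyEvent(date(1970, 1, 5), "Private Dealer", "State Archives")
]

print(custody_gaps(history))
print(custody_gaps(broken))
```

A check of this kind covers only the documentary side of custody. It says nothing about the physical condition of the materials or about preservation actions, which the text treats as separate things to be tracked.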
The archival concern for the description and preservation of evidence involves a rich understanding of the implicit and explicit values of materials at creation and over time. It also involves an acute awareness of how such values can be diminished or lost when the integrity of materials is compromised. Evidential value in the widest sense is reflected to some extent in any information artifact, but only a subset of all information is subject to legal or regulatory requirements concerning creation and maintenance. Publications, for example, can be analyzed for evidence of the motivations and processes associated with their creation by studying their physical and intellectual form, examining different editions of the same work, and learning about the history of the publishing house or printer that produced them. Primary sources (unpublished or unsynthesized materials) particularly lend themselves to such kinds of analysis and interpretation, and such materials are increasingly being incorporated into digital information resources.
Maintaining the evidential value of information is important not only to creators of materials that are subject to legal or regulatory requirements but also to many researchers. In particular, reformatting, description, and preservation need to be considered. Reformatting has been discussed extensively in the professional literature in relation to the digitization of library and archival collections. Information professionals involved in digitally reformatting their collections must understand when a user may need to work with the original information object to appreciate some intrinsic characteristic, such as the weight of the paper; when a digital copy will do; and whether a copy needs to be high or low resolution, color or black and white. Information professionals must also decide how much of a collection needs to be digitized and what kind of metadata will enable a user to place information objects in context.
Archival practice places a premium on both collective and contextual description. The key is to explain the physical aspects and intellectual structure of the collection that may not be apparent and to provide enough contextual information for the user to understand the historical circumstances and organizational processes of the object’s creation. Description should also demonstrate that the physical and the intellectual form of the materials have not been altered in any undocumented way.
Counterintuitively, perhaps, it is during the preservation of digital materials that evidential value is often most at risk of being compromised. Digital preservation techniques have moved beyond a concern for the longevity of digital media to a concern for the preservation of the information stored in those media during recurrent migration to new software and hardware. In the process, many of the intrinsic characteristics of information objects can disappear: data structures can be modified, and the presentation of the object on a computer screen can be altered.
Respect des Fonds, Provenance, and Original Order
The perfect Archive is ex hypothesi an evidence which cannot lie to us: we may through laziness or other imperfection of our own misinterpret its statements or implications, but itself it makes no attempt to convince us of fact or error, to persuade or dissuade: it just tells us. That is, it does so always provided that it has come to us in exactly the state in which its original creators left it. Here then, is the supreme and most difficult task of the Archivist: to hand on the documents as nearly as possible in the state in which he received them, without adding or taking away, physically or morally, anything: to preserve unviolated, without the possibility of a suspicion of violation, every element in them, every quality they possessed when they came to him, while at the same time permitting and facilitating handling and use.
This cluster of principles represents the core tenets of archival theory and practice. Although the tenets are interpreted differently by different archival traditions, they nevertheless represent the essence of the archival perspective and its blend of intellectual and pragmatic rationales.
The principle of respect des fonds was first codified in 1839 in regulations issued by the French minister of public instruction. The principle stated that records should be grouped according to the nature of the institution that accumulated them. In 1881, the Prussian State Archives issued more precise regulations on arrangement that defined Provenienzprinzip, or the principle of provenance. The principle of provenance has two components: records of the same provenance should not be mixed with those of a different provenance, and the archivist should maintain the original order in which the records were created and kept. The latter is referred to as the principle of original order in English and Registraturprinzip in German. The French conception of respect des fonds did not include the same stricture to maintain original order (referred to in French as respect de l’ordre intérieure), largely because French archivists had been applying what was known as the principle of pertinence and rearranging records according to their subject content.
The benefits of respect des fonds are self-evident. Originally conceived of in physical terms, this principle facilitates physical and intellectual access to records generated and received by the same institution or person by gathering and describing them as an intellectual whole, regardless of their form, medium, or volume (Duchein 1983). The principle of provenance enhanced this approach by ensuring that the records remained as much as possible as they were originally created. From a practical viewpoint, the principle of original order obviated the need for resource-intensive and contentious rearrangement according to subject. From an intellectual viewpoint, it preserved the objectivity of the records and provided insight into the functions, processes, and personal relationships of the records creator as reflected in the arrangement of the records (Gränström 1994, Schellenberg 1961).
In recent years, the conceptualization of these basic tenets has become more complex as bureaucratic structures have evolved and digital systems have been increasingly used for record keeping. Archivists have had difficulty establishing the provenance of records of multi-institutional collaborations or those contained in multifunctional databases and distributed information systems. In archival appraisal, more sophisticated conceptions of provenance, such as functional provenance and multiprovenance, have been developed for electronic records that apply business process analysis and functional decomposition. Functional provenance views the business function through which a record came into being as that record’s provenance rather than the office or individual creating the record. This view is based on the rationale that record-keeping functions are likely to remain more or less constant whereas bureaucratic hierarchies and technologies shift over time. Multiprovenance recognizes that a record may be simultaneously created through the interaction of multiple offices or jurisdictions. In archival description, developments such as EAD and the Australian series system recognize that a one-to-many relationship may exist for groups of records created by changing bureaucratic structures. In the words of Australian archivists Frank Upward and Sue McKemmish (1994):
The new [post-custodial] discourse has a new language, and is grounded in a new provenance theory. Structure no longer means only organisational structure; it can now mean the structures in which transactions are captured as records, including documentary forms and record-keeping systems. Context no longer means only record creators; it can now mean the agents of transactions operating in the context of their functions and activities. Functions and activities are no longer defined simply in terms of organisational charts; jurisdictions, competencies, and operational realities must be considered.
Taken together, respect des fonds, provenance, and original order ensure that the intellectual integrity of aggregations of records is maintained and that individual records are always contextualized. Adhering to these principles is a less resource-intensive way of providing access to high-volume collections than are classifying by subject and cataloging of individual documents. Considerable cataloging expertise and the availability of specialized standardized vocabularies are required for correct and consistent assignment of subject access points to heterogeneous unsynthesized and unpublished materials (Michelson 1987). Because the language used in archival materials is often archaic or technical, assigning a modern subject term that accurately reflects the concepts being expressed in the records can be difficult. On the basis of their insight about how users working with historical and organizational materials might wish to search, archivists have broadened the notion of subject access, suggesting access points such as temporal and geographic coverage and form of material (Bearman and Lytle 1985, Bearman and Sigmond 1987, Roe 1990). Today we can see the application of such approaches in the resource type and coverage elements that have been integrated into the Dublin Core for use in resource discovery of networked electronic resources (Dublin Core Metadata Initiative 1999).
A huge volume of digital information has not gone through editorial and publication processes. Subject access and item control practices are not sufficient for effective and efficient organization of such information. The archival approach offers the concepts of collective arrangement and description according to the provenance of the materials; these provide benefits even when information managers or users are not interested in the evidential value of the materials. Applying these concepts makes it possible to unite related digital, nondigital, and predigital materials according to their intellectual rather than their physical characteristics. These concepts build context, which is a powerful and underused tool for facilitating understanding and ultimately creating knowledge. They prompt the user to consider the degree to which the material’s source is authoritative. The archival approach focuses on the context, organic development, and content of the collection, allowing the user to ask the “how,” “why,” and “so what” questions so integral to research.
The Life Cycle of Records
If we can become overarching information generalists with an archival emphasis, we will be able to bring to bear what should be a deep and thorough knowledge of the documentary life-cycle theory . . . it may be our most important asset in relation to (I do not say in competition with) our colleagues, the librarians and other information specialists.
The U.S. National Archives and Records Administration developed the concept of the records life cycle to model how the functions of, use of, and responsibility for records change as records age and move from the control of their creator to the physical custody of the archives. In the first phase of this model, administrators create and use records (in archival terms, primary use). Records creators must develop logical systems for classifying or registering records and implement procedures to ensure the integrity of the records. Records managers and archivists also ensure that active records are scheduled for systematic elimination or permanent retention. As records age, they gradually become less heavily referenced and finally become inactive. During the second phase, the archives is a neutral third party responsible for ensuring the long-term integrity of the records. When the records enter the archives, they are physically and intellectually integrated with other archival materials of the same provenance, thus establishing the archival bond (Duranti 1996). Their physical integrity is ensured through preservation management; their intellectual integrity, through archival description. Archival records are then available for secondary use.
Changes in methods of record creation and in perceptions of their continuing value have recently led archivists to consider how to apply the life cycle model in a digital environment. The principles underlying the life cycle have been refined through projects such as Preservation of the Integrity of Electronic Records, conducted from 1994 to 1996 by archival researchers at the University of British Columbia (known as the UBC Project). An alternate model, the records continuum, has been proposed. This model now undergirds the conceptualization of the role and activities of the record-keeping professions in Australia and is gaining in acceptance in the United States and Europe.
The UBC Project sought to develop a generic model to identify and define by-products of electronic information systems and methods for protecting the integrity of the by-products, which constitute evidence of action (Duranti and MacNeil 1997). Using a deductive method drawing on the principles of diplomatics and archival science, the project identified the procedures necessary to ensure control over reliable records creation during the first phase of the records life cycle and to maintain the integrity of archival records during the second phase. The project reiterates the need in the digital environment for completed records placed under the jurisdiction of the archives.
The records continuum model takes a different approach. Records managers and archivists are involved with records beginning when a record-keeping system is designed. Physical transfer to the archives is not required; archivists establish requirements for appropriate maintenance of the records and monitor compliance by records creators. The intellectual interrelationships of active and archival records are established by integrating metadata from active records into the archival authority’s information system (Upward and McKemmish 1994). This postcustodial model expands the role of the archivist to include active participation in the production and use of records.
The benefits of modeling the life cycle of information materials extend to information management in general by
- providing for the management of information resources from birth to death and identifying the points at which responsibilities for managing those resources change or certain actions must occur;
- integrating the communities responsible for creating, disposing of, and preserving information resources with those focusing on the organization and use of information;
- recognizing the motivations of different parties to ensure the integrity of information materials and points in the life cycle at which those motivations become less compelling, thus putting the materials at risk;
- clearly elucidating the process of creating and consuming knowledge and using it to create new knowledge;
- making it possible to meet different user needs; and
- enabling prediction of levels of use and management of information storage requirements.
An example of the application of the life cycle model in a nonarchival digital information framework is the Information Life Cycle model, developed at the 1996 National Science Foundation Workshop on the Social Aspects of Digital Libraries at the University of California, Los Angeles. This model (see figure 2) represents the flow of information in a given social system. It emphasizes the technologically based information storage and retrieval aspects of a digital library as well as the belief that digital libraries should be constructed to accommodate the actual tasks and activities involved in creating, seeking, and using information resources (Borgman et al. 1996).
The Organic Nature of Records
Records that are the product of organic activity have a value that derives from the way they were produced. Since they were created in consequence of the actions to which they relate, they often contain an unconscious and therefore impartial record of the action. Thus the evidence they contain of the actions they record has a peculiar value. It is the quality of this evidence that is our concern here. Records, however, also have a value for the evidence they contain of the actions that resulted in their production. It is the content of the evidence that is our concern here.
The practices of many information communities focus on the best and most cost-effective ways to organize and retrieve discrete information objects. Archival practice assumes that materials within a fonds can be most effectively organized and retrieved collectively. Although collective management and description are pragmatic ways to gain basic levels of control over large quantities of heterogeneous information, for archivists the rationale behind these practices lies in the inherent characteristics of records and other materials that are the by-product of human activities. When materials are generated by the activity of an individual or organization, an interdependent relationship exists between the materials and their creator. A complex web of relationships also exists between the materials and the historical, legal, and procedural contexts of their development as well as among all materials created by the same activity. The organic nature of records refers to all these interrelationships, and archival practices are designed to collectively document, capture, and exploit them. These practices recognize that the value of an individual record is derived in part from the sequence of records within which it is located. They also recognize that it can be difficult to understand an individual record without understanding its historical, legal, procedural, and documentary context.
The perspective gained from working with information collectively can also be applied to the description, preservation, and use of Web resources. Resources created on the Web are not unlike archival fonds in that they include a complex of hyperlinks to pages related by provenance, topic, or some other feature. An advantage in the Web environment is that hyperlinks are explicit rather than largely implicit, as is the case with paper records. As a result, those who manage and use these resources can more easily identify and exploit organic relationships. A Web page without its hyperlinks may be less valuable to users because of its diminished evidential content.
Fig. 2. Model of the life cycle of information in digital libraries (UCLA-NSF Workshop 1995)
Hierarchy in Records and Their Descriptions
Recent developments in information organization have exploited the structure of information content and its metadata to provide smarter access to materials, especially those that are hard to locate by subject or keyword. This is particularly evident in efforts to apply extensible mark-up language (XML) to develop structures that are more predictable for Web resources and in the application of the Text Encoding Initiative (TEI) guidelines for the SGML encoding of literary and historical texts.
Structure can be both intellectual and physical; it can exist within an information object, collections of information objects, and descriptions of those information objects. Archival practices explicitly recognize the existence of such structures and exploit those that are hierarchical. Developing and using hierarchies are intuitive ways for humans to model information; as a result, much information and many information systems have hierarchical characteristics.
To ascertain authenticity, archivists use principles derived from diplomatics to analyze how the intellectual form of records reflects the functions by which they were created. Diplomatics maintains that the intellectual form of records usually has three components: protocol, text, and eschatocol. Each of these components contains groups of additional elements of form; for example, the protocol contains elements such as the name of the author, the date the record was created, the name of the person to whom the record is directed, and the subject of the record. The eschatocol contains elements that validate the document, such as the official title of the author and signatures of witnesses and countersigners. When elements are absent or irregular, the records’ authenticity may be questioned (Duranti 1998a).
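As a rough illustration of this kind of analysis, the following Python sketch represents the three components of a record's intellectual form and lists the expected elements that are absent. It is a simplification under stated assumptions, not a diplomatic method: the element names are paraphrased from the examples given above, the elements of the text component are not enumerated in this discussion (so the sketch only checks that the text is present), and the sample letter is invented.

```python
from dataclasses import dataclass, field

# Elements of form named in the discussion above. The text component's
# elements are not enumerated there, so only its presence is checked.
PROTOCOL_ELEMENTS = ("author_name", "date_created", "addressee", "subject")
ESCHATOCOL_ELEMENTS = ("author_title", "witness_signatures", "countersignatures")


@dataclass
class RecordForm:
    """Intellectual form of one record, split into its three components."""
    protocol: dict = field(default_factory=dict)
    text: str = ""
    eschatocol: dict = field(default_factory=dict)


def missing_elements(form):
    """List expected elements that are absent or empty in the record's form."""
    missing = []
    for name in PROTOCOL_ELEMENTS:
        if not form.protocol.get(name):
            missing.append(f"protocol.{name}")
    if not form.text.strip():
        missing.append("text")
    for name in ESCHATOCOL_ELEMENTS:
        if not form.eschatocol.get(name):
            missing.append(f"eschatocol.{name}")
    return missing


# Hypothetical record: it lacks a date and any countersignature.
letter = RecordForm(
    protocol={"author_name": "J. Smith", "addressee": "Board of Trustees",
              "subject": "Annual budget"},
    text="The budget for the coming year is enclosed.",
    eschatocol={"author_title": "Treasurer", "witness_signatures": ["A. Jones"]},
)

gaps = missing_elements(letter)
print(gaps)
```

A non-empty result does not show that a record is inauthentic. It only flags the record for closer examination, consistent with the statement that authenticity "may be questioned" when elements are absent or irregular.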
Records have an innate hierarchy imposed by the creating agency’s filing practices and position in a bureaucratic hierarchy and by the processes through which the records were created. A fonds may contain sous-fonds or a record group may contain subgroups, which may in turn contain many series of records, each relating to a different activity. Individual record series may be divided into subseries and even subsubseries, which may be further divided into filing units that contain individual documents.
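The nesting described here can be modeled as a simple tree. The Python sketch below is illustrative only: the `Unit` class, the `paths` helper, the level labels, and the sample agency are all assumptions introduced for the example. It shows that each document is reached through, and takes its context from, the chain of units above it.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    """One unit of arrangement, such as a fonds, series, file, or document."""
    level: str
    title: str
    children: list = field(default_factory=list)

    def add(self, child):
        """Attach a subordinate unit and return it so calls can be chained."""
        self.children.append(child)
        return child


def paths(unit, trail=()):
    """Yield (level, title) paths from the top unit down to every descendant."""
    here = trail + ((unit.level, unit.title),)
    yield here
    for child in unit.children:
        yield from paths(child, here)


# Hypothetical records of an invented agency.
fonds = Unit("fonds", "Department of Public Works")
series = fonds.add(Unit("series", "Correspondence"))
subseries = series.add(Unit("subseries", "Outgoing letters"))
filing_unit = subseries.add(Unit("file", "1923"))
filing_unit.add(Unit("item", "Letter to the city engineer"))

for path in paths(fonds):
    print(" > ".join(f"{level}: {title}" for level, title in path))
```

The last line printed traces the single letter back through its filing unit, subseries, and series to the fonds, which is the documentary context needed to interpret it.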
Archival description, through inventories and registers collectively referred to as finding aids, has traditionally reflected these hierarchies. A high-level summary description provides basic intellectual control and collection management information for a set of records; progressively more granular descriptions are prepared for subordinate levels in the hierarchy. There are four advantages to this approach:
- It documents all the records of the same provenance, their arrangement, and the chain of custody that brought them into archival control.
- It permits economies in description. Collective description is less expensive than item-level description; this approach enables archivists to decide how far down in the hierarchy detailed description is needed on the basis of the values exhibited by the materials and the anticipated level and nature of use.
- For many kinds of historical and bureaucratic uses, this description mirrors the arrangement of the records and provides a logical way to search for materials.
- This approach can be applied regardless of the nature of a collection and does not require specialized description for special forms of materials.
In the digital environment, hierarchical and collective description lend themselves to hierarchical and object-oriented metadata structures such as SGML. The development since 1995 of the SGML document type definition for Encoded Archival Description (EAD) has turned descriptive practices that may have seemed cumbersome into a powerful infrastructure for online information systems. A data structure standard for preparing encoded digital finding aids, EAD permits a collection to be searched at different levels of description and links to be built to descriptions of organically related materials or digitized versions of the materials. Figure 3 indicates the high-level model of the EAD document type definition and shows how the encoded finding aid has been broken into three major intellectual components:
- eadheader, which provides bibliographic and descriptive information about the encoded finding aid;
- frontmatter, which contains prefatory information about the creation, publication, or use of the finding aid; and
- archdesc, which describes the content, context, and extent of the archival materials being described.
Each component contains a hierarchy of nested elements, the most complex of which is archdesc. As indicated in the high-level model, archdesc contains many elements, each of which is also available for use at lower levels in the hierarchy. The LEVEL attribute indicates the level at which the element is occurring within the descriptive hierarchy. The tag for description of subordinate components (<dsc>) indicates how components at each level are further subdivided. Up to 12 numbered or unnumbered components can be nested within each <dsc> (Society of American Archivists Encoded Archival Description Working Group 1998 and 1999).
Fig. 3. High-level model for the encoded archival description document type definition (Society of American Archivists Encoded Archival Description Working Group 1999)
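To make the nesting of components concrete, the following Python sketch parses a small EAD-style fragment with the standard library's `xml.etree.ElementTree` and prints an outline of its levels. The fragment is hypothetical and heavily simplified: it omits frontmatter and most header elements, it has not been validated against the EAD document type definition, and the helper names `is_component` and `walk` are assumptions introduced for the example.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical fragment; not validated against the EAD DTD.
SAMPLE_EAD = """
<ead>
  <eadheader>
    <filedesc><titlestmt><titleproper>Finding aid (sample)</titleproper></titlestmt></filedesc>
  </eadheader>
  <archdesc level="fonds">
    <did><unittitle>Department of Public Works records</unittitle></did>
    <dsc type="combined">
      <c01 level="series">
        <did><unittitle>Correspondence</unittitle></did>
        <c02 level="file">
          <did><unittitle>Outgoing letters, 1923</unittitle></did>
        </c02>
      </c01>
      <c01 level="series">
        <did><unittitle>Contracts</unittitle></did>
      </c01>
    </dsc>
  </archdesc>
</ead>
"""


def is_component(element):
    """True for unnumbered <c> and numbered <c01>..<c12> component tags."""
    tag = element.tag
    return tag == "c" or (len(tag) == 3 and tag[0] == "c" and tag[1:].isdigit())


def walk(element, depth=0):
    """Yield (depth, level, title) for a description and its components."""
    title = element.findtext("did/unittitle", default="")
    yield depth, element.get("level", ""), title
    # Components sit directly under a unit or inside its <dsc> wrapper.
    containers = [element] + element.findall("dsc")
    for container in containers:
        for child in container:
            if is_component(child):
                yield from walk(child, depth + 1)


root = ET.fromstring(SAMPLE_EAD)
outline = list(walk(root.find("archdesc")))
for depth, level, title in outline:
    print("  " * depth + f"[{level}] {title}")
```

Because the same descriptive elements recur at every level, one short traversal can read the top-level description and every subordinate component alike. The same property lets a finding aid be searched at different levels of description.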