The information reference subsystem, which provides access to descriptive information and thereby to the content of the Archivo’s original documents, is the heart of the entire system. It was designed to meet the objectives outlined on p. 15.
Description at the AGI
The AGI is a historical archive organized according to traditional archival principles, which differentiate the archive from the library or the documentation center. Its organization is further specified in the “principle of provenance” noted on pp. 5-6. Most of the AGI is organized according to this principle of provenance. The first division consists of sections that usually contain the papers of a single producing agency or unit thereof. The sections may be divided into subsections (especially Section V, Government, which is organized into subsections according to the territorial organization of the several Audiencias and Viceroyalties). The sections in turn are broken down into series (sometimes redivided into subseries) containing those documents produced by the corresponding agency in the exercise of each of its functions. The series are made up of documents or files, grouped into bundles.
The resulting hierarchical and multilevel model of organization is shown below:
Figure 2. Organizational Model of Holdings in the Archivo General de Indias
The 1790 Ordenanzas also mandated the formation of various finding aids that would provide physical and intellectual control of the AGI’s holdings and the necessary lines of access to the information. The ultimate objective was to compile a complete and systematic “general inventory” of all the organized holdings of the Archivo.14 This work program prefigured to some extent what we now call an “information system.”
In practice, the archival description consists of individual finding aids, such as guides, inventories, catalogs, and indices. Although the systematic “general inventory” mentioned in the AGI Ordenanzas is not complete, these independent instruments aim to project a complete and systematic vision consistent with the hierarchical structure of the holdings.
The characteristics of archival description are as follows:
- It is multilevel, including descriptions or entries of various archival units or groupings considered on their several levels: collection, series, subseries, file, and document.
- It reflects the hierarchical family-tree structure of archive organization, in which archival units or groupings are included in others at higher levels and may in turn include others at a lower level.
- For the description of each document or piece to be wholly meaningful and fully intelligible, it must relate to the higher-level units of which it is a part and, ultimately, to the agency producing the documents.
- The principle of provenance thus becomes not only the basic norm for document organization but also the essential element for attaining intellectual control of the holdings and for guiding information access.15
- This indirect access to information is supplemented by indices leading directly to the subject, person, and place. But the indices are generally auxiliary or complementary to the principal finding aids. This means that not even access through indices is totally direct; the inventory or catalog must be located first to use its auxiliary index.
Accordingly, over the two centuries of AGI history, numerous finding aids have been developed (and occasionally published): a general guide to the Archivo,16 inventories for each section,17 catalogs of certain subjects or series,18 and indices.
Many changes in personnel, goals, and working criteria have taken place since the AGI was founded. Consequently, the various finding aids differ greatly in format, depth of description, level of detail, and terminology.
There are manuscript instruments, such as inventories for the Sections of Patronato (relations of the Kings with the Church), Contaduría (General Accounting), Contratación (House of Trade or House of the Indies), and Justicia (Justice); typewritten texts, such as the inventories for the Sections of Gobierno (Government, the papers of the Council of the Indies and the Secretaries of State) and Ultramar (the nineteenth-century central agency for the colonies); printouts, such as the inventories of Consulados (Boards of Trade from Seville and Cádiz) or Correos (the Post) and some catalogs of maps and plans; loose file cards (catalogs of Registers); and bound volumes.
In some cases, individual documents are described (e.g., Patronato inventory), while in others only information at the series level is covered (e.g., Government Section inventories). The elements of description used also vary considerably, from the title, reference number, and end dates used exclusively in many inventories to the detailed description appearing in catalogs such as the Catálogo de Consultas del Consejo de Indias or the Catálogo de Pasajeros a Indias.
Some catalogs and inventories include supplemental indices, while others do not. The indices themselves are diverse and do not conform to any established norm. Some contain long headings that virtually summarize the referenced document (for example, supplemental indices for the Contaduría or Contratación sections); other indices are simpler, including only the indispensable data. Some cite the document reference number, others the corresponding page of the principal instrument, and still others (for publications) a consecutive number identifying each entry or individual description.
In short, there is a wide range of finding aids with different objectives, formats, and criteria. A study conducted before the retrospective conversion tasks were begun estimated that all of the finding aids totaled 25,000 pages. Many of them, however, were not available for use by researchers in the Reading Room.
Objectives for the Information and Reference System
- To construct a database containing valid information for locating digitized documents and to evolve a unified, global system of archival information that could handle all descriptive data of the AGI in an integrated manner. All the traditional finding aids would have to fit within the new system, so that the entire process of description and search for information could be integrated and automated. This unified system is to some extent consistent with the aim of a “general inventory” set forth in the Ordenanzas.
- To include in this unified system the information needed for better retrieval of the images of digitized documents. Such data would not be independent but an integral part of the system as a whole. Yet a considerable effort would have to be made to supplement and enhance the description of that documentation with a view to better results and more rapid retrieval.
- To undertake complete retrospective conversion of the large volume of guides, catalogs, and indices that had been developed, despite their errors and lack of standardization.
- To ensure that the new system respects traditional archival principles and the “principle of provenance” in particular. This meant organizing the descriptive information according to the organic-functional structure of the fonds, which calls for a hierarchical and multilevel model of data access.
- In addition, to allow direct access to the information through the use of keywords, as an alternative to access through the hierarchical path.
- To offer vocabulary control options for the future, although the AGI traditionally had no standardization or control for retrieval vocabulary (subject headings, thesaurus, control of authorities).
- To simplify integration with the user-management and image modules by incorporating all necessary elements (such as data access controls and references to new media) into the information and reference system.
Outline of Information and Reference System
Given these objectives, the information and reference module was designed to construct a unified data system that would make it possible to access information through the “principle of provenance path,” while also providing new possibilities for direct data access through new technologies.
Data Access by the “Principle of Provenance Path”
The system manages relational model databases to set up a hierarchical and family-tree model of all descriptive information that is fully consistent with the AGI’s organic-functional structure. Following this framework, one can navigate from the “root” or holding institution (the Archivo) through its several branches to its “leaves” (documentary units or pieces).
Holdings are organized by level, which may or may not be fully reflected in the finding aids. The information system should allow management within this structure of existing or described “real” levels. This is not a rigid information structure with preestablished levels. The system allows the inclusion and management of as many levels as may be considered necessary, depending on the requirements and holdings of the archive. The International Standard Archival Description, ISAD(G), reflects this approach, stipulating in rule 2.3 that the system of organization and access to information should “link each description to its next higher unit of description, if applicable, and identify the level of description.” But this standard, which was adopted some time after the AGI computerized system was developed, does not specify a means for its implementation.19 The AGI information and reference system employs a simple procedure: if each descriptive entry, at whatever level, has an identification code, only the code of the next higher unit in the hierarchy needs to be included in the database table to link both units.
Since the user ordinarily does not know the internal identification code, and it is also difficult to remember, the Reference Number or code of each description unit is used for the purpose. Thus, this operation in the data entry process introduces a new “element of description” that is always necessary: the Reference Number or code of the higher hierarchical unit. During consultation, the user has the option of going to a higher or lower hierarchical level. The descriptive entry corresponding to a section will indicate its component series, or a document will show all the levels above it.
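The linking procedure described above can be sketched as a single self-referencing table. This is only an illustration under stated assumptions, not the AGI's actual schema: the table and column names (`unit`, `ref_number`, `parent_id`), the use of SQLite, and the sample reference numbers in the usage note are all hypothetical. It shows the two ideas from the text: each entry stores only the code of the next higher unit, and data entry supplies that parent by its Reference Number rather than by the internal code.

```python
import sqlite3


def create_schema():
    """Create an in-memory database with one table for every level of description."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE unit (
            id         INTEGER PRIMARY KEY,           -- internal identification code
            ref_number TEXT UNIQUE NOT NULL,          -- Reference Number known to users
            level      TEXT NOT NULL,                 -- free text: no fixed set of levels
            title      TEXT NOT NULL,
            parent_id  INTEGER REFERENCES unit(id)    -- code of the next higher unit
        )
        """
    )
    return conn


def add_unit(conn, ref_number, level, title, parent_ref=None):
    """Insert a unit; the parent is given by Reference Number, as in data entry."""
    parent_id = None
    if parent_ref is not None:
        row = conn.execute(
            "SELECT id FROM unit WHERE ref_number = ?", (parent_ref,)
        ).fetchone()
        if row is None:
            raise ValueError(f"unknown parent reference number: {parent_ref}")
        parent_id = row[0]
    cur = conn.execute(
        "INSERT INTO unit (ref_number, level, title, parent_id) VALUES (?, ?, ?, ?)",
        (ref_number, level, title, parent_id),
    )
    return cur.lastrowid


def ancestors(conn, ref_number):
    """Return (level, ref_number) pairs from the unit itself up to the root."""
    path = []
    row = conn.execute(
        "SELECT ref_number, level, parent_id FROM unit WHERE ref_number = ?",
        (ref_number,),
    ).fetchone()
    while row is not None:
        path.append((row[1], row[0]))
        if row[2] is None:
            break
        row = conn.execute(
            "SELECT ref_number, level, parent_id FROM unit WHERE id = ?", (row[2],)
        ).fetchone()
    return path


def children(conn, ref_number):
    """Return the Reference Numbers of the units directly below the given unit."""
    rows = conn.execute(
        """
        SELECT c.ref_number
        FROM unit AS c JOIN unit AS p ON c.parent_id = p.id
        WHERE p.ref_number = ?
        ORDER BY c.ref_number
        """,
        (ref_number,),
    ).fetchall()
    return [r[0] for r in rows]
```

With a root such as `add_unit(conn, "AGI", "archive", "Archivo General de Indias")` and a section added with `parent_ref="AGI"`, `children` moves down one level and `ancestors` shows every level above a given entry. Because `level` is an ordinary column and not part of the table structure, further levels can be added without changing the schema.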
Direct Information Access
The principle of provenance, which provided the main means for access to information at the AGI, is an indirect method of obtaining materials. Document access is indicated by the path (search subject/producing institution/inventories and catalogs/supplemental indices), rather than by the more direct subject/document route.
To provide points of more direct access, the AGI developed indices to supplement the principal finding aids. The indices represent document content through different “notions” or “concepts” organized in the form of index “headings.” They may be direct (index heading/document) or indirect (index heading/corresponding heading in its inventory or catalog/document).
In recent years, efforts have been made in the archival field, both nationally and internationally, to create standards for index production. For example, the International Council on Archives (ICA), through its Ad Hoc Commission on Descriptive Standards, has drawn up the International Standard Archival Authority Record for Corporate Bodies, Persons, and Families, ISAAR (CPF), which was adopted in Paris, November 15-20, 1995.20 Also, the Bureau of Canadian Archivists, through its Planning Committee on Descriptive Standards, published several related texts (such as Subject Indexing for Archives, Ottawa, 1992) that contain bibliographies and references to existing standards.21
The AGI had no vocabulary control. In the planning phase, there was discussion of whether to use existing indices or to revise them. Revision would require the preparation of strict rules of control, or a thesaurus, or list of subject headings. Project staff agreed that it would be impossible to revise the indices properly within a reasonable time. Within the framework of the digitization project, it was impossible to draw up a complete list of acceptable headings valid for all sections and all levels of the AGI. The differences and incongruities among existing indices were too great. Thus, existing indices were used, with no vocabulary control in the strict sense. In the process of retrospective conversion, they were converted to the actual keywords that provide direct access to the information.
An important area of work for the future could be to establish some kind of controlled vocabulary, employing for the first time lists of all index headings used in the AGI, now available as keywords in electronic format.
Figure 3. Means of Access to Information
Keywords are the principal means for direct retrieval of information from the system. They may consist of one or more words, providing they do not exceed 120 characters. They include indices by name, toponym, institution, and subject.
These keywords can be interrelated by means of precoordinated indexing. When a unit of description is indexed, two different keywords may be linked by means of a specific relationship, such as nature, affiliation, title, or activity. For example,
| First Keyword | Relationship | Second Keyword |
|---|---|---|
| Pizarro, Francisco | Born in | Trujillo (Cáceres) |
| Cortés, Hernán | Title | Marqués del Valle |
| Colón, Diego | Son of | Colón, Cristóbal |
| Gálvez, Bernardo de | Activity | Gobernador de Luisiana |
They may also be “qualified” by the function they perform within the corresponding document. For example, one person is the “otorgante” (giver) of the last will, the “witness” in a judgment, or the “receiver” of a letter. The name of the person is the keyword, which can be qualified with the function of “otorgante”, “witness”, “sender”, or “receiver.”
In the process of consultation, if one or more search words are introduced, the system will show all of the keywords in the database that contain those words, making it possible to locate the descriptive information about a particular document or series of documents. For each descriptive entry, all keywords considered useful may be included, with or without specific relationships or functions.
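The keyword mechanism described above (keywords of up to 120 characters, optional relationships between two keywords, optional function qualifiers, and a search that lists every keyword containing the search words) can be sketched in a few lines. The class and method names below are hypothetical, as are the unit references in the usage note, and the matching rule (case-insensitive substring match on every search word) is an assumption; the text does not say how the AGI system compares words.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_KEYWORD_LENGTH = 120  # limit stated in the text


@dataclass(frozen=True)
class IndexEntry:
    unit_ref: str                          # unit of description being indexed
    keyword: str                           # first keyword
    relationship: Optional[str] = None     # e.g. "Son of", "Title"
    second_keyword: Optional[str] = None   # keyword linked by the relationship
    function: Optional[str] = None         # role of the first keyword in the document


class KeywordIndex:
    def __init__(self) -> None:
        self.entries: List[IndexEntry] = []

    def add(self, unit_ref, keyword, relationship=None,
            second_keyword=None, function=None) -> IndexEntry:
        """Index a unit under a keyword, optionally related or qualified."""
        for kw in (keyword, second_keyword):
            if kw is not None and len(kw) > MAX_KEYWORD_LENGTH:
                raise ValueError("keyword exceeds 120 characters")
        if (relationship is None) != (second_keyword is None):
            raise ValueError("a relationship needs a second keyword, and vice versa")
        entry = IndexEntry(unit_ref, keyword, relationship, second_keyword, function)
        self.entries.append(entry)
        return entry

    def matching_keywords(self, *search_words) -> List[str]:
        """List the distinct keywords that contain every search word."""
        words = [w.casefold() for w in search_words]
        found = set()
        for e in self.entries:
            for kw in (e.keyword, e.second_keyword):
                if kw is not None and all(w in kw.casefold() for w in words):
                    found.add(kw)
        return sorted(found)

    def units_for(self, keyword, function=None) -> List[str]:
        """Return the units indexed under a keyword, optionally by function."""
        refs = set()
        for e in self.entries:
            if function is None:
                if keyword in (e.keyword, e.second_keyword):
                    refs.add(e.unit_ref)
            elif e.keyword == keyword and e.function == function:
                refs.add(e.unit_ref)
        return sorted(refs)
```

For instance, after `index.add("DOC-1", "Colón, Diego", "Son of", "Colón, Cristóbal")`, a search with `index.matching_keywords("col")` lists both names, and `index.units_for("Colón, Cristóbal")` leads to the description of `DOC-1`. The reference `DOC-1` is a placeholder, not an AGI Reference Number.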
Although there is no compulsory vocabulary control, the system has the capacity to use a thesaurus as an element of support and consultation. It also provides a specific user profile, that of the person “responsible” for vocabulary control, who can carry out revision functions such as the elimination and creation of keywords, the elimination and creation of specific relationships, and the transfer of references between keywords and relationships.
Elements of Description
Once a decision had been made on the structure for the unified system of information and the major means of access to documents, it was necessary to select the various elements of description (title, content summary, volume, date spans, etc.) to ensure adequate gathering of all information concerning AGI documents. This was some years before the EAD (Encoded Archival Description) was developed. Certain existing possibilities were studied, including the MARC AMC format, but it was decided to choose something simpler, adapted to the AGI’s specific needs. At the time there was no internationally recognized standard for archival description, although Canada, Britain, and the United States had developed some national standards.22 In 1992, International Standard Archival Description, ISAD(G), was proposed to the professional community at the International Congress on Archives, held in Montreal. It was subsequently adopted by the Ad Hoc Commission at its Stockholm meeting, January 21-23, 1993.23
Consequently, a data structure was developed that consists of 30 elements of description divided into three areas (basic information, descriptive information, and elements of retrieval). The main data are summarized below. A numeric label identifies each of the several elements.
Basic Information Area
All of the elements contained in this area are required, although during data entry most can be included as values by default. These elements include the reference number, dates, title, and reference number of the preceding element in the hierarchy. Almost all of this corresponds to the Identity Statement Area of the ISAD(G) standard, although additional elements needed for service are also included.
Descriptive Information Area
This covers a series of elements of description that are optional, although some are used in almost every case, while others are used for specific documentary types. These descriptions are usually in a free-text format. Examples of the elements are the content summary, old and new reference numbers, site of issue, and internal and external characteristics.
Elements of Location, Retrieval, or Access
These are the elements facilitating access to the data, including the keywords and their several supplements, such as specific relationships and functions. Also included are such elements of retrieval as the old card numbers or those in other media, additional dates to focus searches, and so forth.
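The division into three areas can be pictured as a record type. The sketch below is illustrative only: it includes just the elements named in the text, not the full set of 30, it does not reproduce the numeric labels, and every field name is hypothetical. It also assumes that "required" means a non-empty value, which the text does not spell out.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class BasicInformation:
    """Required elements; data entry may prefill them with default values."""
    reference_number: str = ""
    title: str = ""
    dates: str = ""
    parent_reference_number: str = ""  # reference number of the next higher unit


@dataclass
class DescriptiveInformation:
    """Optional elements, usually free text."""
    content_summary: Optional[str] = None
    old_reference_numbers: Optional[str] = None
    site_of_issue: Optional[str] = None
    characteristics: Optional[str] = None  # internal and external characteristics


@dataclass
class RetrievalElements:
    """Elements of location, retrieval, or access."""
    keywords: List[str] = field(default_factory=list)
    old_card_numbers: List[str] = field(default_factory=list)
    additional_dates: List[str] = field(default_factory=list)


@dataclass
class DescriptionRecord:
    basic: BasicInformation
    descriptive: DescriptiveInformation = field(default_factory=DescriptiveInformation)
    retrieval: RetrievalElements = field(default_factory=RetrievalElements)


def missing_basic_elements(record: DescriptionRecord) -> List[str]:
    """Name the required basic elements that are still empty."""
    return [
        f.name
        for f in fields(record.basic)
        if not getattr(record.basic, f.name).strip()
    ]
```

A data entry screen built on such a record could refuse to save an entry until `missing_basic_elements` returns an empty list, while leaving the descriptive and retrieval areas free to vary by document type.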
Initial Content: Retrospective Conversion of Finding Aids
At the beginning of the project, two distinct possibilities were studied regarding the initial content of the information and reference system:
- To incorporate only the descriptive information necessary for retrieval of those documents included in the digital image storage system; or
- To incorporate all information contained in the AGI’s finding aids; that is, inventories, catalogs, and indices, whether printed, typed, or in manuscript.
In the first case, a database structure could have been designed that was appropriate for the documents to be digitized. In addition, efforts could have been made to produce a more precise description of only a subset of AGI documents, determining the fields necessary for each document type.
The second option, which was the one ultimately chosen, was a new venture entailing considerable risk: there was no previous experience to serve as reference, since no archive had yet undertaken a similar operation. The uneven quality of the existing descriptive information, the different levels of description, the lack of uniform criteria and standards over time, and the variety of formats all posed difficulties.
A complete list was made of all the finding aids to be converted (estimated then at 25,000 pages). The content and format of each was analyzed, and priorities were established for the subsequent work, to be undertaken in phases:
- Finding aids essential in forming the complete hierarchical structure of the database were assigned priority.
- Within that group, special emphasis was placed on finding aids favoring preparation of the material to be digitized (for example, that included in the Patronato Section, all of which would be computerized).
- Priority was then attached to all catalogs and indices describing certain resources or series in detail and in depth.
IBM’s Personal Editor was adapted for the work to provide plain data entry screens consistent with the needs of each of the finding aids to be recorded. The data were stored in simple flat ASCII files, with identification codes or labels for each of the fields employed and with control words adding information needed at the time of data entry. Two work groups were organized:
- Finding aids published or properly prepared and structured in typescript were keyed in by personnel with no specialized archival background but with data entry experience.
- The finding aids most difficult to interpret (usually manuscript) and requiring more careful review were recorded directly in the AGI by a group with more extensive archival and historical training, overseen directly by the regular staff.
The results of the process can be outlined in summary as follows:
- The idea of the unified information system took shape as a real operating system following an extended, ongoing effort.
- Continuity of the project with the same guidelines over a long period allowed successful completion of the original aims.
- The goal was extremely ambitious, but it was always focused on doing whatever could be done at the time.
- The decision to go public with a system filled with as much content as possible, as opposed to a database structure without content, led to acceptance by researchers.
- Based on this introductory information, the initial errors and inconsistencies could provide incentive to continue the effort, with the goal of gradually supplementing and improving all of the AGI’s descriptive information.
The Information and Reference System Today
This retrospective conversion operation has been complemented by a special effort to improve and expand descriptions, especially for the digitized documents. The AGI’s information and reference system currently includes the following breakdown of entries for each of its “units of description”:
- 37 entries at the section level
- 440 at the series level
- 187 at the subseries level
- 45,398 at the bundle level
- 113,936 at the file level
- 193,849 at the document level
- 61,519 entries for passenger lists to the Americas
These hierarchical entries have been supplemented by about 400,000 keywords to facilitate direct access to specific documents.
Today, after five years of service to users and in light of recent developments in both technology and standards, we can make some observations on the information and reference system, looking more to the future than to the past.
Direct and Indirect Access to Archival Information
For many years, the AGI Reading Room has been an excellent field of observation for analyzing the means of intellectual access to archival information because researchers could use both the traditional means of access, by way of the principle of provenance, and the direct means, through indexes.
Although it is risky to reach conclusions because there have been no statistical studies in this area, certain statements can be made based on daily observation. For example, it can be said that a researcher, especially one with little experience in using archives, makes primary use of the direct access system (keywords), usually ignoring the indirect (principle of provenance, hierarchical path), which is always slower and more tedious. Such researchers are often impressed by the easy access to initial data, which makes them believe they have retrieved all available information, even though only a small percentage of the total AGI has been concisely indexed.
Richard H. Lytle undertook an experiment in this area in 1978 and published the results in 1980.24 One of his conclusions, that the indexing method produced a greater variation in results (both better and worse), is borne out here. If indexing is complete, it is undoubtedly a rapid, useful method for the user. If not, search results will be poor. Clearly, this may have “silenced” a large volume of documents that have barely been covered by a general description and are not indexed.
But in addition, part of the document context information may be lost through exclusive use of indexes. It is well known that an archival document is not an individual, independent piece but acquires its meaning from the environment in which it has been produced. Thus, the traditional method of searching continues to be not only valid but essential. The complementary use of both methods is, today, still the best means of access to information. The staff responsible for information must therefore guide the less expert researcher. Nevertheless, one veteran researcher said that he had located in a few days more information than he had found in weeks of work before the new system was installed.
Information Retrieval Tools
One achievement of the project has been to develop a unified system of descriptive information for the AGI, but the tools used to build it now require updating. Adopting a relational database management system with SQL was a significant advance at a time when the relational model was still used mainly for fully structured information and management applications. Today, however, the system suffers from excessive rigidity, and it does not exploit all of the advantages offered by unstructured information search systems, such as free-text search, full-text search, and truncation. In this regard, the information system has lagged far behind and urgently needs updating.
Standardization of all aspects of the information and reference system is another crucial issue for the future.
- Standardized description is the first aspect to be considered. In the near future, it seems advisable to incorporate at least the ISAD(G) standard, although it is still a general and imperfect effort and remains far removed from other developments in the field of documentation, such as the ISBD or AACR2. It will also be necessary to monitor such recent developments as EAD, implemented in the United States through the Berkeley Finding Aid Project and based on the SGML standard.
- It is also necessary to standardize the language of indexing. This will be a long-term project for the AGI, since no list of acceptable subject headings or indices has been developed. The advantage is that the AGI can do this a posteriori, using not only international standards in the field but all keywords generated over the course of two centuries, which are currently available in a unified, automated listing.
- Standardization of other aspects, such as the system for querying databases (SQL, ODBC) or Internet access, must also be addressed.
Revision and Updating of Information System Content
Archival work consists, ultimately, of furthering the organization, description, and understanding of documents. Similarly, the AGI’s information system will require continuous revision, updating, and improvement regarding both description and access. In the process, it will be possible gradually to eliminate errors originating in the retrospective conversion.
15 See Eric Ketelaar, “Exploitation of New Archival Materials,” in Proceedings of the 11th International Congress on Archives (Munich, New York, London, Paris: K.G. Saur, 1989), 189-99; Richard H. Lytle, “Intellectual Access to Archives,” American Archivist 43 (1980): 64-76 and 191-208; David A. Bearman and Richard H. Lytle, “The Power of the Principle of Provenance,” Archivaria 21 (1985-86): 14-27; and Michel Duchein, “Theoretical Principles and Practical Problems of Respect des Fonds,” Archivaria 16 (Summer 1983): 64-82.
16 The most significant Archivo guides are those by José Torre Revello, El Archivo General de Indias de Sevilla. Historia y Clasificación de sus Fondos [The Archivo General de Indias of Seville: History and Classification of its Holdings] (Buenos Aires: Instituto de Investigaciones Históricas, 1929), and José María de la Peña y Cámara, Archivo General de Indias. Guía del Visitante [Visitor’s Guide] (Valencia: Dirección General de Archivos y Bibliotecas, 1958).
18 For example, the Catálogos de Pasajeros a Indias [Lists of Passengers to the Indies] (7 vols. published), or the many catalogs of maps and plans by Pedro Torres Lanzas, Julio González, or María Antonia Colomar.
20 ISAAR (CPF): International Standard Archival Authority Record for Corporate Bodies, Persons, and Families. Final ICA approved version. Prepared by the Ad Hoc Commission on Descriptive Standards, Paris, November 15-20, 1995.
21 Bureau of Canadian Archivists. Planning Committee on Descriptive Standards. Subject Indexing for Archives: The Report of the Subject Indexing Working Group. (Ottawa: Bureau of Canadian Archivists, 1992).
22 Steven Hensen, Archives, Personal Papers and Manuscripts: A Cataloging Manual for Archival Repositories, Historical Societies and Manuscripts Libraries (2nd ed.) (Chicago: Society of American Archivists, 1989); Michael Cook and Margaret Procter, Manual of Archival Description (MAD) (London: Society of Archivists, 1989); Bureau of Canadian Archivists, Rules of Archival Description (Ottawa: Bureau of Canadian Archivists, 1990).
23 International Council on Archives, General International Standard Archival Description, ISAD(G), adopted by the Ad Hoc Commission on Descriptive Standards at its Stockholm meeting, January 21-23, 1993.