Whether called “the elephant in the closet” (Mandel 2004, 106) or a “dirty little secret” (Tabb 2004, 123), hidden collections are becoming recognized as a major problem for archives and special collections. As the Council on Library and Information Resources (CLIR) stated in launching its Cataloging Hidden Special Collections and Archives Program, “Libraries, archives, and cultural institutions hold millions of items that have never been adequately described. These items are all but unknown to, and unused by, the scholars those organizations aim to serve” (2008). Reducing archival backlogs and exposing once-hidden collections will likely require that archives revamp their workflows, but software can play a role in making archives more efficient and their collections more visible.
What technologies can help archives and special collections tackle their “hidden collections” and make them available to researchers? This report explores archival management systems such as Archon, Archivists’ Toolkit (AT), Cuadra STAR, and Minisis M2A. It also considers tools for creating and publishing encoded archival description (EAD) finding aids. Archival management systems are a kind of software that typically provide integrated support for the archival workflow, including appraisal, accessioning, description, arrangement, publication of finding aids, collection management, and preservation. (Tools, on the other hand, are software applications that typically focus on specific tasks and can be components of systems.) Rather than explicitly recommending particular software, this report takes archivists through the main decision points, including types of licenses, cost, support for collection management, and flexibility versus standardization. The report draws upon interviews with users as well as on previous studies of archival software and information provided by the developers and vendors. It offers features matrices for selected archival management systems so that archivists can make quick comparisons of different software. Instead of evaluating the performance of the software, this report compares features and reports on the experiences of archivists in implementing them. This report is intended to be a resource for the archival community to build upon; hence it is available as a wiki at http://archivalsoftware.pbwiki.com/, and archivists, information technology (IT) staff, and developers are invited to add new information to it.
2. The Problem of Hidden Collections
According to a 1998 Association of Research Libraries (ARL) survey of special collections libraries, about 28 percent of manuscript collections are unprocessed, while 36 percent of graphic materials and 37 percent of audio materials have not been processed (Pantich 2001). Furthermore, the survey found that “the most frequent type of available access is through card catalog records or manual finding aids,” which suggests that researchers often must be physically present at special collections and archives to know what they hold (Pantich 2001, 8). As the ARL Task Force on Special Collections argues, the failure to process collections holds back research, leads to duplicates being purchased, and makes them more vulnerable to being stolen or lost because libraries and archives don’t know what they have. Studies have shown that between 25 percent and 30 percent of researchers have not been able to use collections because they have not been processed (Greene and Meissner 2005, 211). As a result, stakeholders such as researchers and donors become frustrated. Indeed, in a much-discussed article, Greene and Meissner report that “at 51% of repositories, researchers, donors, and/or resource allocators had become upset because of backlogs” (2005, 212).
To confront the problem of unprocessed collections, Greene and Meissner promote “a new set of arrangement, preservation, and description guidelines that (1) expedites getting collection materials into the hands of users; (2) assures arrangement of materials adequate to user needs; (3) takes the minimal steps necessary to physically preserve collection materials; and (4) describes materials sufficient to promote use” (2005, 212-213). Meeting researchers’ needs for access to materials trumps achieving perfection in archival description and arrangement. Likewise, the ARL Task Force proposes minimal processing, suggesting that “it is better to provide some level of access to all materials, than to provide comprehensive access to some materials and no access at all to others” (Jones 2003, 5). This access can be provided through the Online Public Access Catalog (OPAC), EAD finding aids, digital collections, or databases. Indeed, providing electronic access is crucial to making hidden collections more visible, since “increasingly, materials that are electronically inaccessible are simply not used” (Jones 2003, 5). Thus, the Library of Congress Working Group on the Future of Bibliographic Control recommends that archives “make finding aids accessible via online catalogs and available on the Internet,” streamline cataloging, and “encourage inter-institutional collaboration for sharing metadata records and authority records for rare and unique materials” (2008, 23-24).
Among the criteria that archives and special collections should consider in determining how to process each collection are size, condition, significance, and, perhaps most important, the needs of researchers. Archives should keep in mind that archival descriptions may be part of distributed, federated catalogs, so they should adhere to best practices to ensure consistency of data. The ARL Task Force recognizes that some collections may require more detailed description than others and that any decision will involve trade-offs. As one drafter of the ARL Task Force Report observed, “Collection-level cataloging is potentially dangerous because if not done right, it will merely convert materials from ‘unprocessed’ to ‘hidden’” (Jones 2003, 9-10).
Institutions have devised different approaches to hidden collections based on the nature of their collections and the resources available. Through the University of Chicago’s Andrew W. Mellon Foundation-funded “Uncovering New Chicago Archives Project” (UNCAP), graduate students are working with scholars and cultural heritage professionals to catalog hidden collections housed at a local library and museum (Shreyer 2007). For the museum collection, they are using item-level cataloging, whereas they are using more standard archival practices with the library collection. In addition, a professional archivist is using minimal processing techniques to process a jazz collection and a contemporary poetry collection housed at the university. Whereas the students are producing detailed descriptions, the archivist is taking a more stripped-down approach, allowing Chicago to test the effectiveness of each model. Similarly, to reduce archival backlogs and provide research experiences for graduate students, the University of California, Los Angeles (UCLA) launched the Center for Primary Research and Training (CFPRT), which “pairs graduate students with unprocessed or underprocessed collections in their areas of interest and trains them in archival methods, resulting in processed collections for us and dissertation, thesis, or research topics for them” (Steele 2008). UCLA develops a plan for processing each collection and uses an online calculator to estimate costs.
3. The Role of Software in Addressing Hidden Collections
Reducing archival backlogs fundamentally requires adopting more-efficient means of processing collections, but software can contribute to that efficiency and make it easier for archives to provide online access to archival descriptions. At many archives, information is scattered across several different digital and physical systems, resulting in duplication of effort and difficulty in locating needed information. For instance, one archive uses a hodgepodge of methods to manage its collections, including paper accession records; an Access database for collection-level status information; lists and databases for tracking statistics; hundreds of EAD finding aids; hundreds of paper control folders providing collection-level information, some of which is duplicated in Word files or in XML finding aids; and item-level descriptions of objects to be digitized in Excel spreadsheets. This miscellany means that there are problems with versioning, redundancy, finding information, and making that information publicly accessible. Likewise, Chris Prom found that many archives are using a variety of tools at various steps in their workflows, so much so that “their descriptive workflows would make good subjects for a Rube Goldberg cartoon.” Examples include the Integrated Library System (ILS) for the creation of MARC records, NoteTab and XMetaL for authoring finding aids, Access for managing accessions, Word for creating container lists, and DynaWeb for serving up finding aids (Prom 2008, 27). (See Appendix 1 of this paper for a more detailed description of the archival workflow.)
In addition to the inefficiencies of using multiple systems to manage common data, Prom et al. (2007, 158-159) note that the use of EAD and other descriptive standards correlates with larger backlogs and slower processing speeds. (EAD is an XML-based standard for representing archival finding aids, which describe archival collections.) Some institutions simply lack the ability to produce EAD finding aids or MARC catalog records. As Prom et al. suggest, “Until creating an on-line finding aid and sharing it with appropriate content aggregators is as easy as using a word processor, the archival profession is unlikely to significantly improve access to the totality of records and papers stored in a repository” (2007, 159). One of the ARL Task Force on Special Collections’ recommendations thus focuses on developing usable tools to describe and catalog archival collections: “Since not all institutions are currently employing applicable national standards, the development of easy-to-use tools for file encoding and cataloging emerges as a priority. These tools should be simple enough to be used by students or paraprofessionals working under the supervision of librarians or archivists” (Jones 2003, 11). Greene and Meissner (2005, 242) suggest that software can play a vital role in streamlining archival workflows by enabling archivists to describe the intellectual arrangement of a collection without investing the time to organize it physically. Yet in 2003, Carol Mandel observed, “I also have been told again and again that we really don’t have software for managing special collections. We don’t have the equivalent of your core bibliographic system that helps you bring things in and move them around efficiently and know what you are doing with them” (Mandel 2004, 112).
Fortunately, powerful software for managing special collections and archives is emerging. This report is more a sampling of leading archival management systems that offer English-language user interfaces than a comprehensive examination of every potentially relevant application.1 Of course, software itself cannot solve the problem of hidden collections; what matters is how software is used and incorporated into streamlined, effective workflows. Although archival management systems such as Archon and Archivists’ Toolkit can play an important role in facilitating the production of EAD and MARC records and streamlining archival workflows, Prom, a developer of Archon, cautions that “archivists should not treat them as magic bullets. They will only prove to be effective in encouraging processing and descriptive efficiency if they are implemented as part of a strategic management effort to reformulate processing policies, processes, procedures” (Prom 2008, 32).2
In conversations with archivists, I asked what their dream software would be as a way of identifying what features would be most important to them and envisioning what may be possible. They often responded that they liked the software applications they were currently using, but would add a few features. The responses point out some of the strengths of existing software and future directions for software developers. Through conversations with archivists and a review of existing research, I’ve identified the following desired features for archival management systems.3
- Integrated: Rather than having to enter data in multiple databases, an archivist could enter the data once and generate multiple outputs, such as an accession list, EAD finding aids, a MARC record, a shelf list, and an online exhibit. As one archivist remarked, “The ideal approach to minimal processing is that you touch everything only once. Every time you touch it is more staff time.”
- Supports importing legacy data: Many archives have already invested a great deal of time in creating EAD finding aids. Likewise, they want an easy way to import other data, such as accessions information. They want software that will seamlessly import existing data—which can be a challenge, given the variability of EAD documents and other forms of archival data.
- Enables easy exporting of data: Given how quickly software becomes obsolete, archivists recognize the need for being able to export data cleanly and easily. One archivist commented, “Archival material is so specific that you don’t want to get locked into anything… Ideally, I would want something that would also preserve that information in a format that is able to migrate if needed.”
- Provides Web-publishing capabilities: Many archives lack the ability to make their finding aids available on-line. By providing a Web-publishing component, an archival management system would enable archives to provide wider access to their collections. Through on-line access, archives have found that they become more visible. As Victoria Steele (2008) writes, “As new finding aids become viewable online, we have seen, over and over, that researchers are at our door to consult the collections they describe. But it must be said that a consequence of our success has been that staff whose primary focus was the processing of collections are now almost wholly engaged in handling reader requests, reference inquiries, and licensing agreements—leaving them almost no time for processing.”
- Simple yet powerful: Archivists want software that is “as easy to use as Word but transforms to the Web and generates EAD at the click of a button.” Students and paraprofessionals without strong archival training need software that provides simple templates for entering data, so that they know what information goes where. (Clear user guides can also help ensure the quality and consistency of data.) If software is too complex or cumbersome to use, much time will be lost. The software should be flexible enough to adapt to the archive’s existing workflow.
- Rigorous, standards-based: The archival community has embraced standards such as EAD, Describing Archives: A Content Standard (DACS), and Encoded Archival Context (EAC), and archivists want software that ensures conformity to these standards. The potential for inconsistent, incorrect data increases as more people participate in describing archival collections. Archival management systems can reduce the likelihood of error by ensuring that data are entered according to standard archival practices (for instance, making sure that dates are in the proper format).
- Provides collection management features: Archivists want software that helps them manage and track their operations more efficiently. Several interviewees wanted to be able to track reference statistics, while others would like to generate temporary records and track locations.
- Portable: Archivists often work in environments where they do not have access to a desktop computer or even to the network, such as the home of a donor or a room in a small museum. As a result, they may begin collecting data using offline software such as spreadsheets. Once they return to their offices, they have to redo much of the work to make it fit into their existing systems. According to one archivist, “It would be useful if we could begin processing on-site, where we first encounter the material. We have to begin again each time we start a new stage.” Archival software could thus support offline data entry, allowing archivists to enter data into a laptop and then upload it into an archival management system once they have network connectivity. Perhaps archival management systems could also support data entry through mobile, wireless devices such as iPhones.4
- Aids in setting priorities for processing: Some archival management systems enable archives to record which collections are higher priorities, thus allowing archivists to plan processing more effectively. In defining approaches to hidden collections, the ARL Special Collections Task Force put forward several recommendations that involve using tools and measures to assess processing priorities. Two of these recommendations are “Develop qualitative and quantitative measures for the evaluation of special collections” and “Support collection mapping to reveal the existence of special collections strengths and gaps, as well as to identify hidden collections” (ARL 2006). Such tools are outside the scope of this report, but it is important to acknowledge the role of related technologies. Examples of tools and protocols that can be used to assess collections and prioritize processing include the Philadelphia Area Consortium of Special Collections Libraries (PACSCL) Consortial Survey Initiative,5 OCLC’s WorldCat collection analysis tool,6 the University of California, Berkeley’s survey tool,7 and Columbia University’s Mellon Survey database.8 In some cases, such as with the PACSCL FileMaker database, the information collected through these survey tools can be used as the basis for accessions databases and for DACS-compliant EAD or MARC records (Di Bella 2007).
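The "enter once, generate multiple outputs" idea, together with the kind of date check mentioned under standards conformity, can be illustrated with a minimal Python sketch. It is not drawn from any of the systems discussed in this report: the `CollectionRecord` fields, the "YYYY" or "YYYY-YYYY" date convention, and the sample values are all assumptions made for illustration, and the XML it produces is only a small EAD-style fragment, not a complete or validated finding aid.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass


# Hypothetical, simplified record; real systems track many more fields.
@dataclass
class CollectionRecord:
    identifier: str
    title: str
    date_range: str  # assumed convention: "YYYY" or "YYYY-YYYY"
    extent: str
    location: str


DATE_PATTERN = re.compile(r"^\d{4}(-\d{4})?$")


def validate(record):
    """Reject a record whose date does not follow the assumed convention."""
    if not DATE_PATTERN.match(record.date_range):
        raise ValueError(f"unexpected date format: {record.date_range!r}")


def accession_line(record):
    """One tab-separated line for an accession list."""
    return f"{record.identifier}\t{record.title}\t{record.extent}"


def shelf_list_line(record):
    """One line for a shelf list, keyed by location."""
    return f"{record.location}: {record.identifier} ({record.title})"


def finding_aid_fragment(record):
    """Build a small EAD-style <did> fragment (not a full EAD document)."""
    did = ET.Element("did")
    ET.SubElement(did, "unitid").text = record.identifier
    ET.SubElement(did, "unittitle").text = record.title
    ET.SubElement(did, "unitdate").text = record.date_range
    ET.SubElement(did, "physdesc").text = record.extent
    return ET.tostring(did, encoding="unicode")


# The data are entered once (sample values are invented)...
record = CollectionRecord(
    "MS-001", "Example Family Papers", "1901-1950",
    "12 linear feet", "Stack 3, Shelf B",
)
validate(record)

# ...and reused for several outputs.
print(accession_line(record))
print(shelf_list_line(record))
print(finding_aid_fragment(record))
```

Because every output is derived from the same record, a correction made in one place carries through to the accession list, the shelf list, and the finding aid alike.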
FOOTNOTES FOR SECTION 3
1 Archival/collection management and description software that goes beyond the scope of this report includes Andornot Archives Online, ARGUS/Questor, Collections MOSAiC Plus, CollectionSpace, Embark, Filemaker Pro, HERA2, IDEA, KE EMu, Microsoft Access, Mimsy xg, Minaret, Re:discovery, and VernonSystems Collection. Integrated Digital Special Collections (INDI), currently under development at Brigham Young University, is geared toward large archives or consortia and aims to support a distributed workflow for archival description and management. The accessions and appraisal modules have already been released, but as of August 2008 the future direction of the project was still being determined.
2 How to efficiently manage archives is beyond the scope of this report, but Greene and Meissner 2005 and Prom 2007 take up the issue in detail.
3 Many of these desired features jibe with the results of Archivists’ Toolkit’s (AT) recent survey of 171 users, which investigated what new features they most desire. The most popular options included “Search improvements” (average of 4.04 out of 5, with 5 being “very important”); “Enable batch editing/global updating” (4.31); “Web publishing of AT data” (4.2); “Digital objects record revision,” which would include support for technical metadata, visual metadata, and independent digital objects (3.97); and a “Use tracking module,” which would provide “Support for tracking and reporting the use of a repository’s collection” (3.86). See AT User Group Survey Results: Proposed New Features and Functionality at http://www.archiviststoolkit.org/AT%20User%20Group%20SurveyResultsFD.pdf.
4 Some tools already provide support for offline editing or data creation through a handheld device. For example, PastPerfect’s Scatter/Gather module allows archives to enter information offline through a desktop client, then create a transfer file that is merged with the main data. MINISIS also supports data entry through mobile devices.
4. Research Method
In compiling this report, I relied on the following sources:
- Archival management system reviews produced by other groups, including Fondren Library’s Woodson Research Center (2008), Archivists’ Toolkit (2008), the International Council on Archives (Lake, Loiselle, and Wall 2003), the International Council on Archives-Access to Memory (ICA-AtoM) (Mugie 2008), and the Canadian Heritage Information Network (CHIN 2003).9 These reviews tend to focus on available features rather than performance.
- Information provided by software developers and vendors on their Web sites and through other documentation.
- Phone interviews with users and developers of archival management systems.10 By talking to users of different archival management systems, I was able to get a detailed view of their strengths and weaknesses. Unfortunately, I was able to arrange interviews only with users of AT, Archon, Cuadra/STAR Archives, CollectiveAccess, and Eloquent, so the analysis of the other software is based on what the developers say about it rather than on user experience. I also spoke and/or corresponded with representatives from AT, Archon, Cuadra/STAR, CollectiveAccess, ICA-AtoM, Minisis, Adlib, CALM, PastPerfect, and Eloquent. I briefly experimented with demo versions of CollectiveAccess, Archon, and AT, and I saw demos of Cuadra/STAR and Eloquent.
To ensure accuracy and fairness, developers and vendors were given the opportunity to respond to user comments and to the features matrices that I developed (see Appendixes 2-4).
FOOTNOTES FOR SECTION 4
9 See also Collections Trust 2008 and Stevens 2008.
10 Interviews were conducted between May and July 2008. The names of interviewees are kept anonymous. I tried to represent what interviewees said as accurately as possible, but occasionally quotations contain paraphrases or supplied words.
5. How to Select Archival Management Software
With an increasing number of options for archival management software, archivists may feel overwhelmed. Fortunately, they can adopt sound, rational processes for selecting software. The Canadian Heritage Information Network (CHIN) offers both a detailed review and an online course focused on selecting collection management software, which is closely related to archival management software (CHIN 2003). (While collection management systems typically support cataloging, managing, and making available archival, museum, and private collections, archival management systems include many of these features but focus on the particular needs of archives, such as archival description and conformance to archival standards.) Rather than replicate that work, I will provide a few general recommendations for selecting software based on the CHIN guide and other sources.11
Selecting software should be a collaborative process so that all the stakeholders (archivists, technical staff, administration, researchers, etc.) can describe how they would use it and provide input into what is selected. To ensure that the selection process stays on course, the team should establish a project plan with clear milestones and areas of responsibility. As a first step, archives should conduct a needs assessment to evaluate current gaps and workflows. Do they really need new software, and is now the best time (given available resources, current projects, etc.) to pursue it? What are the weaknesses of their current software? How does information flow through the system? What kind of information is captured, by whom, when, and for what purposes? What workflows do archives want to change—and retain? What is the desired outcome of adopting new software? Answering these questions will help organizations define their requirements.
Working collaboratively, team members should then prioritize requirements, generating a weighted “features checklist.”12 In addition to features such as “support for EAD” or “support for managing locations,” archivists should weigh factors such as the quality of user support, the reputation of the vendor, cost, technical requirements, and the robustness and appropriateness of the technology platform. Often the best way to evaluate the quality of the software and support is to speak with a variety of users (both those recommended by the vendors and those who are independently identified). Through a site visit, evaluators can see the software in action and understand it in the context of archival workflows. Most vendors, and all open source projects, make available a demo version or can arrange an online demonstration of the software. Archivists should take the software through a variety of tasks to determine whether it is easy to use, does what it needs to do, and has any bugs. If a commercial application is selected, organizations should carefully spell out the terms of the contract, including support and training. They should also develop a maintenance plan for regular updates, training, and so forth. If an open source application is selected, archives should likewise determine how staff will be trained and how the technology will be kept up-to-date.
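A weighted features checklist of this kind reduces to simple arithmetic, which the following minimal Python sketch illustrates. The requirement names, the weights, the ratings, and the two candidate systems ("System A" and "System B") are all hypothetical placeholders, not assessments of any real product; a selection team would substitute its own requirements and scores.

```python
# Hypothetical weights (1 = nice to have, 5 = essential) agreed on by the team.
weights = {
    "EAD support": 5,
    "location management": 3,
    "quality of user support": 4,
    "cost": 4,
}

# Hypothetical ratings (0 = absent/poor, 5 = excellent) for two made-up systems.
ratings = {
    "System A": {"EAD support": 5, "location management": 2,
                 "quality of user support": 3, "cost": 4},
    "System B": {"EAD support": 3, "location management": 5,
                 "quality of user support": 4, "cost": 3},
}


def weighted_score(system_ratings, weights):
    """Weighted average rating; a requirement with no rating counts as 0."""
    total_weight = sum(weights.values())
    weighted_sum = sum(
        weight * system_ratings.get(feature, 0)
        for feature, weight in weights.items()
    )
    return weighted_sum / total_weight


# Rank the candidates from highest to lowest weighted score.
ranked = sorted(
    ratings,
    key=lambda name: weighted_score(ratings[name], weights),
    reverse=True,
)
for name in ranked:
    print(f"{name}: {weighted_score(ratings[name], weights):.2f}")
```

A score like this is only a starting point for discussion; site visits, conversations with users, and hands-on testing should still inform the final decision.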
FOOTNOTES FOR SECTION 5
11 For more guidance on selecting software, see Dewhurst 2001; TASI 2007; and Baron 1991.
12 For detailed, if slightly out-of-date, requirements for archival and collection management software from an international perspective, see Groot, Horsman, and Mildren 2003.
6. Criteria for Choosing Archival Software
No single archival management system will be appropriate for every archive, given the variation in technical support available at the institution and the need for particular features. Comparing archival management systems yields several key factors that distinguish them from each other. Here are some of the criteria that archives should consider in selecting an archival management system:
- Automating the processing and description of collections through the archival management system versus generating EAD by hand and managing collections through other software
Archival management systems offer a number of advantages, particularly to archives that do not already have large quantities of EAD finding aids or are dissatisfied with current workflows. A primary advantage of archival management systems is the ability to enter data once and generate multiple outputs. Rather than being isolated in separate systems, data can be brought together through a single interface, reducing redundancy and making it easier to find and manage information. Instead of having to understand the intricacies of EAD and XML markup, archivists, paraprofessionals, and student workers can create a valid EAD finding aid by entering information through a series of Web- or desktop-based forms, saving time and producing more consistent finding aids. Some archival management systems also enable organizations to publish their finding aids on the Web, thus making archival information more widely available.
However, archival management systems can be difficult to implement in some organizations and may not provide the flexibility that archivists require. Several archivists reported difficulty importing existing EAD data into systems such as Archon and AT, a problem due in part to the flexibility of EAD and the resulting variability of finding aids. Although archival management systems typically can be customized and feature user-defined fields, they do enforce a certain consistency and workflow, which frustrates archivists who have an established way of working. As one archivist stated, “Archon and Archivists’ Toolkit are great, but it means that someone else has done the thinking for you about the workflow.” Homegrown approaches may be more flexible and may better reflect the archive’s own workflow. Furthermore, some archivists argue that putting archival description into a database structure is reductive and oversimplifies the process of producing a finding aid. In the process of encoding a finding aid, archivists better understand the texture, structure, and contents of the document. Also, XML and word processing editors provide greater flexibility than databases. As an archivist noted, “If we are doing rearranging while we’re going along, we can’t shift things around very easily if we’re using a database. We have parts of finding aids that we can shift around in Word. … The tool has to combine flexibility with rigor.”
Other archivists emphasize the importance of adhering to standards to facilitate exchange of information and consistency. As one user of an archival management system noted, “We could have customized things to meet past practice, but we also decided to move away from old practices. We don’t want to be too flexible any more.” Katherine Stefko (2007) acknowledges the trade-offs in sacrificing flexibility for consistency: “To use the AT effectively implies a commitment to using current professional standards, and while it would be hard to argue anything other than this being a good thing, it undeniably raises the bar in terms of the time, training, and expertise an archivist needs in order to use it. … Accordingly, we’ve redirected staff time and modified our workflow so that more time is now spent accessioning material, with the understanding that retrieval and reporting will [be] easier and reference and administrative work less later on.” Indeed, one interviewee argued that the rigor and inflexibility of archival management systems are actually strengths, since by using such software, archives will ultimately produce more consistent data and facilitate the exchange and federation of archival information. If each archive, or even each collection, took its own approach to archival description, creating a federated finding aids repository would be difficult. In that sense, the development of archival management systems such as Archon and AT is an important step toward realizing the ARL Task Force on Special Collections’ recommendation: “Since not all institutions are currently employing applicable national standards, the development of easy-to-use tools for file encoding and cataloging emerges as a priority. These tools should be simple enough to be used by students or paraprofessionals working under the supervision of librarians or archivists” (Jones 2003, 11).
- Open source versus commercial
Perhaps the most fundamental choice that archives will make is whether to select an open source or a commercial system. Increasingly, governmental and educational organizations are embracing open source software. For instance, the European Commission has endorsed open source software because it offers a greater diversity of solutions, improves the development process through community input, offers faster deployment through customizability, and leads to enhanced technical skills of IT staff (OSOR.EU 2008). According to OSS Watch, a service funded by JISC, open source offers many advantages: it facilitates rapid bug fixing, is typically more secure, enables customization, supports internationalization, and protects against vendor lock-in or the collapse of the vendor (Wilson 2007). In addition, open source software is typically free, flexible, and continually evolving, assuming an active development community (Lakhan and Jhunjhunwala 2008). Open source software is often supported on or portable to a number of platforms (Office of Government Commerce 2002, 3). Although some worry about the sustainability of open source projects, other developers can maintain and enhance the code should the original developer abandon the project; indeed, as Stuart Yeates from the JISC’s OSS Watch argues, “Sustainability is an issue for proprietary software as much as for open source software” (Smart 2005). Many believe that open source software is actually more secure than proprietary software, since open source applications can be scrutinized and verified by “many eyes” and security issues can be resolved quickly (Whitlock 2001).
Some institutions, however, lack the technical staff to implement open source software. Others may oppose it because they fear security risks or high maintenance costs. Implementing open source software can be challenging, particularly if no support is available or if support structures vanish. With commercial software, customers can contact the vendor for training, assistance in importing data, or other services; with open source software, archives often rely on the community for help. Sometimes open source projects are abandoned before reaching fruition (Lakhan and Jhunjhunwala 2008). Documentation of open source applications can be weak (Office of Government Commerce 2002, 4). Although open source software typically is available without licensing fees, significant costs can result from implementing and customizing it at a local institution. Studies comparing the total cost of ownership of open source versus proprietary software have produced conflicting findings. Each organization should consider what it costs to switch software and what the total cost of adopting the software, including staffing and hardware, will be (Ven, Verelst, and Mannaert 2008, 55-56). Organizations should also consider the maturity of the software, including its functionality as well as support, training, and documentation (Wilson 2006).
- Hosted by company or local institution
Some institutions lack the technical infrastructure to install and maintain an archival management system themselves. Many companies will host software for organizations, enabling archives to focus on their core work. In addition to hosting, many companies will assist customers in importing legacy data into the software. Generally, customers who pay a company to host their data reported that there were few technical problems and that the company’s servers rarely went down. One archivist felt relieved that a company in another part of the country was hosting and backing up her data, since her institution is in an area vulnerable to hurricanes.
Although hosted solutions offer noteworthy conveniences and efficiencies, one archivist voiced frustration that she had less control over her data and the way they were presented. If the data were hosted locally, she could play around with the user interface rather than having to rely on the company to make requested changes. Indeed, some institutions feel uncomfortable relying on anyone but themselves to curate their data. What will happen to an archive’s data if the company fails? How will the archive retrieve that data, and in what format? Archives should also consider the annual costs of a hosted solution, although hosting data locally also entails costs in hardware, technical support, licensing fees, etc. Commercial vendors typically provide hosting services, although some service bureaus will also host open source software (for instance, hosting is being planned for ICA-AToM). If organizations are considering a hosted solution because they fear the complexity of installing and maintaining software, they should note that most archival management systems are designed to be easy to install and maintain.
- Cost
For many institutions, cost is a key factor in determining what software to select. The purchase cost for archival management software can range from free (for open source) to hundreds of thousands of dollars (for commercial products with all the bells and whistles and licenses for many clients). Even open source software entails significant costs, including hardware, technical support, and customization—costs that also apply to commercial projects. Along with the cost of the license, archivists should factor in recurring costs, such as maintenance fees, user support, training, hardware, technical support, and customization. Several interviewees noted that companies were willing to “work with us” to find an appropriate cost and that smaller institutions often benefited from a price break. As one might expect, more-expensive products often come with more features. Archives must decide which features are essential.
- Sustainability
Software comes and goes, and archivists are rightly concerned about their data being locked into a closed system. If a company collapses or ends support for a product, how will that affect archives who rely on it? Open source projects seem to offer some advantages for sustainability, since other programmers can continue to maintain and develop open source software should the original developer abandon it. However, some open source projects fade away after an initial burst of development activity, and archives, already stretched thin, may not have the technical resources to pick up development work. Nevertheless, open source projects such as AT and ICA-AToM are developing detailed business plans to ensure sustainability, looking at ways to charge fees for training and other services, offer membership, and affiliate with stable organizations that can offer support for the software. Adapting the open source model, some companies allow customers to buy in to escrow plans that will provide them with the code should the company end its support of a product. In any case, to make sure that their data can be used for the long term, archives should make sure that they can easily batch export the data in standard formats.
- Quality of customer support
Inevitably, archivists will run into problems using archival management software, whether because of bugs, difficulty importing data, the need to customize certain features, confusion over how to use the software, or technical problems. Thus, they rely on good customer support from vendors or, in the case of open source software, the developers and user community. Many interviewees mentioned user support as a key factor in their satisfaction with a particular software package. Vendors typically provide assistance via phone or e-mail, user forums, frequently asked questions, and user training. In some cases, help is included in annual maintenance fees, but in others it entails additional costs. Open source projects may seem to be weaker than commercial projects with regard to user support. As one archivist using an open source system commented, “There’s no help desk.” However, lively communities often form around open source projects and provide support to new users or those experiencing problems. With Archon, CollectiveAccess, and Archivists’ Toolkit, archivists noted how responsive the developers are to questions. In addition, support for open source software may be available from consultancies or even the developers themselves. For example, the business plan for ICA-AToM includes a provision for “charging a commission for brokering ICA-AtoM technical services between recommended third-party contractors and institutions seeking assistance with ICA-AtoM installation, hosting, customization, new feature development, etc.” To evaluate user support, talk to users of different software packages.
- Support for archival standards
To facilitate interoperability and adherence to best practices, archives will want to select software that meets archival standards such as EAD, DACS, and MARC, as well as emerging standards such as EAC. Some archival systems, such as ICA-AToM, focus more on international (ICA) standards than on U.S. standards. In the case of archival software developed in Europe, Prom et al. warn that “such tools use a much more rigorous system of classification and provenance than do US repositories” (Prom et al. 2007, 159). However, even many non-U.S. applications support crosswalking between standards and include EAD support.
- Web-based versus desktop client
Some archival management software (such as Archon, CollectiveAccess, and ICA-AToM) is entirely Web based, while other such software requires a desktop client (typically a PC) and connects to a database backend. Web-based software can be more intuitive for some users and enables distributed cataloging, since anyone with Web access can contribute records. With systems such as Archon, information can be published to the Web as soon as it is entered. However, some archives worry about the security and reliability of an entirely Web-based system; one archivist noted her colleagues’ reluctance to “put all of our eggs in one basket.” If the Internet connection goes down, work stops (which is also true of networked client/server software). A client-based interface may offer greater control over data, but institutions may need to pay a fee for each computer on which the software is installed. Licensing models vary, however, so this is not always the case.
- Support for publishing finding aids online versus generating EAD for export
Many archives face difficulty not only in creating EAD files but also in publishing them online. As one archivist remarked, “There’s been a big hole—people have been producing EAD for 10 years, but it’s still kind of difficult.” Some archival management systems address this problem by enabling archives to make available their finding aids on the Web. Indeed, a primary reason that Archon was developed was to facilitate publication of archival information online. Once an archivist enters information into Archon, it is automatically searchable and discoverable by Google (although archives can choose to defer publication of records until they have been approved). Likewise, many commercial systems offer support for online access to their collections, sometimes through the purchase of an additional module. However, some archives already have a mechanism for publishing their finding aids on the Web, so they may prefer software that enables them to easily export finding aids that they can then import into their existing Web-publication system. Since most browsers now provide support for XML, archives could simply upload their EAD files to a Web server, include a call-out to an XSLT stylesheet at the top of each file for the purposes of presentation, and display their finding aids without too much effort. Projects such as the EAD Cookbook have made stylesheets freely available. Although this simple approach does not offer sophisticated searching and other features, it enables archives to publish their finding aids online at minimal cost.
If archival management software does enable publishing archival collections online, archives should consider the quality and customizability of the end-user interface. Does it provide search and browse functions? Can users run advanced searches? Does it offer additional features, such as stored searches? Is the design clean and simple to navigate? Can it be easily customized to reflect the unique identity of the archive? Does the interface meet accessibility standards? Can it be translated into other languages?
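The simple publishing approach described above, in which each EAD file carries a call-out to an XSLT stylesheet, relies on an xml-stylesheet processing instruction placed before the root element. The following is a minimal sketch in Python of how an archive might add that line to its files. The function name and the stylesheet file name ead2html.xsl are hypothetical and are not drawn from any of the tools discussed in this report.

```python
import re


def add_stylesheet_callout(ead_xml: str, href: str = "ead2html.xsl") -> str:
    """Return the EAD text with an xml-stylesheet processing instruction.

    The instruction is placed after the XML declaration (if there is one)
    and before everything else. `href` is a hypothetical stylesheet
    location; substitute the path of the stylesheet actually in use.
    """
    if "<?xml-stylesheet" in ead_xml:
        return ead_xml  # a call-out is already present; leave the file alone
    pi = f'<?xml-stylesheet type="text/xsl" href="{href}"?>\n'
    # Match an optional XML declaration at the very start of the file.
    decl = re.match(r"\s*<\?xml\s[^>]*\?>\s*", ead_xml)
    if decl:
        return ead_xml[:decl.end()] + pi + ead_xml[decl.end():]
    return pi + ead_xml
```

Whether a given browser applies the stylesheet depends on the browser and on how the server delivers the files, so archives would still want to test the result with the browsers their researchers use.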
- Support for linking to digital objects
In addition to providing access to archival collections, archives may wish to make available digital surrogates of items, such as images, texts, audio files, or video. Many archival management systems offer a “digital library” or “online exhibit” function to provide Web-based access to items in their collections. In evaluating these features, archives should consider what kind of media and metadata formats they support as well as how media are presented. For instance, CollectiveAccess has rich features for media support, including the automatic generation of MP3s upon loading an audio file to the server, an image viewer with pan and zoom, and the ability to mark time codes within video files. However, some archives may want to use a separate digital asset management system (DAM), such as ContentDM, DSpace, or Fedora, to provide online access to their collections, since they are using these robust systems for other digital collections. These institutions will want an easy way to batch export relevant metadata from their archival management system or, even better, a way to plug in their archival management system to their DAM. (ICA-AToM plans to use a plug-in architecture for exposing the application to Web services or allowing it to interface with other Web services, such as DSpace or Fedora.)
- Support for collection management
Some systems offer robust support for managing archival collections, including appraisals, locations, condition and conservation, and rights and restrictions. Some even allow users to create deeds of gift and location labels, track usage statistics, and manage requests for materials and reference help. Others focus more on archival description than on collection management. Many do both. Archives should determine what features are most essential to them, while noting that new versions of software often add features that they may desire.
- Reports, statistics, and project management
Some software can enable institutions to run reports to, for example, track unprocessed collections or determine what is stored in a particular location. How easy is it to create and print out such reports? Through archival management software, organizations may also be able to track statistics such as the size of various collections, how many linear feet have been processed or deaccessioned over a year, and the most frequently requested collections.13 Such statistics can help archives determine how to set processing priorities and can be valuable in reporting to organizations such as ARL. Indeed, some software even allows institutions to mark accessions that are high priority for processing, helping them manage hidden collections.
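To make the kind of report described above concrete, here is a minimal sketch in Python that totals processed and unprocessed linear feet and lists high-priority accessions still awaiting processing. The Accession record and its field names are hypothetical. They do not reflect the data model of any particular archival management system.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Accession:
    # Hypothetical record; the field names are illustrative only.
    collection: str
    linear_feet: float
    processed: bool
    high_priority: bool = False


def backlog_report(accessions):
    """Summarize processed and unprocessed linear feet and list priorities."""
    totals = defaultdict(float)
    for acc in accessions:
        key = "processed" if acc.processed else "unprocessed"
        totals[key] += acc.linear_feet
    # Unprocessed, high-priority accessions, largest first.
    queue = sorted(
        (a for a in accessions if a.high_priority and not a.processed),
        key=lambda a: a.linear_feet,
        reverse=True,
    )
    return {
        "processed_feet": totals["processed"],
        "unprocessed_feet": totals["unprocessed"],
        "priority_queue": [a.collection for a in queue],
    }
```

Sorting the queue by size is only one possible choice; an archive might instead order it by research demand or by the condition of the materials.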
- Reliability and maturity
Some archives are shying away from software that is still in development such as Archivists’ Toolkit and Archon because “there are still bug reports.” Users did report that there were some bugs or missing features for both tools, as well as for commercial systems. However, they also said that their error reports were taken seriously and that the development teams are responsive to user questions and suggestions. In the contemporary computing environment, software is continually evolving; witness the “permanent beta” status of Web 2.0 tools such as Google Documents. It is possible for software to be too mature, built using out-of-date technologies or approaches. On the other hand, some software never makes it out of beta or may not go in the direction anticipated, so institutions may lose time and resources if they adopt untested software.
FOOTNOTES FOR SECTION 6
13 The University of Michigan is developing archival metrics: http://www.si.umich.edu/ArchivalMetrics/
7. Types of Software
In 2005, Katherine Wisser reported on an EAD Tools Survey that revealed the diversity of ways in which archives created finding aids and the difficulty that smaller institutions in particular had in authoring and publishing EAD. Wisser divided EAD tools into four categories: authoring, publishing, discovery (search tools), and knowledge (best practice guides). One of the most used tools at the time was the EAD Cookbook, which provides a set of templates, stylesheets, and guidelines for creating finding aids. Wisser found a disparity in the kinds of tools institutions used: archivists at smaller archives tended to rely upon the EAD Cookbook, while those at larger institutions often developed their own solutions. Some institutions were willing to share those solutions, with the caveat that they reflected local practices.
More recently, open source archival management systems such as Archon and AT and commercial solutions such as Cuadra STAR and MINISIS have offered other methods for creating archival description. The promise of such systems is that archivists no longer have to hand-code EAD, but can create it through entering information into database fields. Rather than keeping archival data in multiple systems, archivists can manage, search, and manipulate data through a single interface. However, such systems can also enforce a rigor that may challenge existing workflows, and importing legacy data into them can be difficult.
Below I briefly describe a range of archival software packages that support exporting or publishing EAD and MARC or are likely to do so soon. Since the focus of this report is archival management systems, only brief descriptions of more specialized EAD authoring and publishing tools are provided, and no information is offered about digital asset management systems, institutional repository software, integrated library systems, or digital collections software.14 Appendix 2 summarizes the features of archival management systems in brief, while Appendix 3 offers a detailed summary of these features. Appendix 4 presents summaries of my interviews with current users of several leading archival management systems.
According to a 2006 study by Chris Prom, archivists use a variety of tools to create descriptive records, favoring “simple” tools: “Eighty-two percent use word processors; 55%, library catalog software; 34% custom databases; 31% text or HTML editors; 22% XML editors, and 14% digital library software” (Prom 2008, 21). Archives using XML editors typically have a larger backlog (58% of the collection) than those using word processors (37%), leading Prom to suggest that “[a]t least some of our backlog problems seem attributable to the adoption of complex tools and methodologies” (2008, 22). However, these institutions may have had larger backlogs to begin with. Prom found a low adoption rate of MARC and EAD—access to only an average of 37 percent of collections is provided through MARC, 13 percent through EAD (2008, 23-24).
Often archives use a mix of methods to create finding aids. For instance, UC Berkeley converted legacy finding aids to EAD through a multifaceted approach, entering basic descriptive information into Web templates (http://www.cdlib.org/inside/projects/oac/toolkit/templates/) and employing WordPerfect to create the initial hierarchy for the collection. It then converted the WordPerfect files to EAD using macros and Perl scripts (http://www.cdlib.org/inside/projects/oac/toolkit/). XML editors were primarily used as “reference tool[s],” since “[i]t is far faster to programmatically convert text to EAD in broad strokes than to apply the copy and paste method required when using these editors” (Digital Publishing Group, UC Berkeley Library, n.d.). Likewise, the University of Chicago uses Web forms to create the front matter for finding aids; archivists write inventories using Word, and then a script is run to generate EAD. Post-processing is done using an XML editor such as Oxygen. According to archivists at the University of Chicago, such an approach “provides the archivist with a lot of flexibility.”
Among the particular technologies used to create EAD are the following:
A. XML/text editors
XML editors enable archivists to see the entire hierarchy of a finding aid and engage in the intellectual activity of marking up an archival collection.15 As one archivist noted, “The act of writing a finding aid is something where you need to be able to view contents as you write series description. Creating finding aids is not data entry, but an intelligent process. I think that encoding EAD helps you to write finding aids, to understand the texture of a document.” However, relying solely on XML editors to generate finding aids can be inefficient. According to “informal studies” at the University of Illinois at Urbana-Champaign, “a skilled worker took 20 hours to encode a 100-page finding aid, using standard XML markup tools, on top of the time needed to actually write the collection description and develop a general box listing of its content” (Prom et al. 2007, 159).
XML and customizable text editors include:
- XMetaL:16 Extensible, collaborative commercial software for authoring XML. To provide a more user-friendly interface for creating and editing finding aids, Yale University has developed a finding aids authoring tool layered over XMetaL. Yale’s FACT tool customizes XMetaL to provide a “word processing” view of finding aids for staff who didn’t want to work with the XML elements. Archives such as the University of Minnesota have developed tips for using XMetaL to author EAD.17
- Oxygen:18 Easy-to-use, commercial “cross platform XML editor providing the tools for XML authoring, XML conversion, XML Schema, DTD, Relax NG and Schematron development, XPath, XSLT,” etc. Several archives and consortia, including Northwest Digital Archives, provide documentation for using Oxygen to create EAD.19
- NoteTab: A free or inexpensive text editor. Several projects, including NC Echo,20 Virginia Heritage,21 and the EAD Cookbook,22 have created clipbook libraries for NoteTab that facilitate the creation of EAD. According to a recent report by the Florida Center for Library Automation (FCLA), “the existing, customizable NoteTab templates maintained by FCLA have been very helpful for many organizations wishing to create EAD-encoded finding aids” (Florida Center for Library Automation 2008).
- EAD Cookbook: The EAD Cookbook aims to make it easier for archives to create finding aids by providing authoring tools for Oxygen, XMetaL, and NoteTab. In addition, it offers a set of stylesheets for transforming XML finding aids into HTML and detailed guidance on creating and publishing EAD finding aids.
- MEX (Midosa-Editor in XML-Standards): Describes itself as “a set of tools for everyday description work in archival institutions including the production of online finding aids with digitized images from the archival records.”23 An open source application developed by the Federal Archives of Germany with support from The Andrew W. Mellon Foundation, MEX enables archivists to create, import, and edit EAD finding aids; attach digital objects; examine an entire XML file or a single element; create online presentations of finding aids; and provide both search and structured browsing. It is a plug-in to Eclipse, an open source Java development platform.
B. Word processing templates
A number of archives use or have used word processing software such as Microsoft Word, WordPerfect, or Open Office to create preliminary finding aids. In some cases, organizations have created templates that make it easy to enter standard archival information. Often they also use macros or scripts to aid in the conversion to EAD. For example, Yale has experimented with Open Office as a tool for EAD creation (Yale University Library 2003), the Bentley Library at the University of Michigan has developed macros to convert Word files to EAD XML (Bentley Historical Library, n.d.), and the Utah State Archives used WordPerfect to create container lists (Utah State Archives 2002). Similarly, the Utah State Archives produces container lists using Excel and MailMerge (Perkes 2008).
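The script-based conversions mentioned above generally turn rows of a container list into EAD component markup. The following Python sketch illustrates the idea under simplifying assumptions: the input is comma-separated text with the hypothetical column names box, folder, title, and date, and every row becomes a flat file-level component. It is not the macro or script used by any of the institutions named here, and a real conversion would also need to handle series hierarchy and local encoding practices.

```python
import csv
import io
import xml.etree.ElementTree as ET


def container_list_to_dsc(csv_text: str) -> str:
    """Convert rows of box, folder, title, date into an EAD <dsc> fragment.

    The column names are assumptions made for this sketch.
    """
    dsc = ET.Element("dsc")
    for row in csv.DictReader(io.StringIO(csv_text)):
        # One file-level component per row of the container list.
        c01 = ET.SubElement(dsc, "c01", level="file")
        did = ET.SubElement(c01, "did")
        ET.SubElement(did, "container", type="box").text = row["box"].strip()
        ET.SubElement(did, "container", type="folder").text = row["folder"].strip()
        ET.SubElement(did, "unittitle").text = row["title"].strip()
        if row["date"].strip():
            # Undated folders simply get no <unitdate>.
            ET.SubElement(did, "unitdate").text = row["date"].strip()
    return ET.tostring(dsc, encoding="unicode")
```

Building the elements with an XML library rather than by pasting strings together means that characters such as ampersands in folder titles are escaped automatically.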
By using forms to produce finding aids, archives can speed their creation and ensure greater consistency. Forms can be Web based or desktop based:
- Berkeley Web Template: A customizable CGI-driven Web application “that generates a user-defined HTML form template and then generates markup using the values filled in by users. … Output may be in the form of METS, TEI, EAD, XML or SGML, even HTML or PDF” (University of California, Berkeley 2005).
- Online Archive of California: Makes available Web forms “for generating collection- through series-/subseries-level finding aids that are compliant with the OAC BPG EAD and EAD Version 2002. Encoders cut and paste segments of their non-EAD finding aids into the form. The form is then converted to a text file and saved as a XML EAD file.”24
- ArchivesHub: Provides a Web form for generating EAD 2002.25
- EAD XForms: Justin Banks’s EAD templates allow users to enter archival information into a form. The templates were built using Altova’s StyleVision2006 and require an XML editor such as Altova Authentic2006 or Altova XMLSpy to implement.26
- X-EAD: The University of Utah is developing form-based desktop software for authoring and editing EAD.27
By validating EAD files, archives can ensure their adherence to standards and facilitate participation in union catalogs and regional repositories. Several online validation services are available, including the following:
- Florida Center for Library Automation’s Encoded Archival Description Validator and XSL Transformer: A Web page that was “created for museums, archives, libraries, historical societies, and similar agencies in Florida who create collection finding aids (guides) according to the Encoded Archival Description (EAD) standard, version 2002. The tools on this page permit EAD creators to a) validate (test) their EAD documents against the rules described in the EAD Document Type Definition maintained by the Library of Congress, b) generate a HTML version of their finding aid from the original EAD encoding, using a XSL stylesheet maintained for the ARCHIVES FLORIDA database, and c) derive Dublin Core metadata records from their original EAD documents.”28
- RLG EAD Report Card: “The first automated program for checking the quality of your EAD encoding.”29
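For archives that want a quick local check before submitting files to a service like those above, the following Python sketch tests whether a file is well formed and contains a few elements that EAD 2002 requires. It is not a substitute for validation against the EAD DTD, and it is not how the FCLA or RLG services work. The function name is hypothetical, and the sketch assumes DTD-style files that do not use XML namespaces.

```python
import xml.etree.ElementTree as ET


def quick_ead_check(ead_xml: str) -> list[str]:
    """Return a list of problems found; an empty list means none were found.

    This is NOT DTD validation. It checks well-formedness and a handful of
    required EAD 2002 elements, assuming a file without XML namespaces.
    """
    try:
        root = ET.fromstring(ead_xml)
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]
    if root.tag != "ead":
        return [f"root element is <{root.tag}>, expected <ead>"]
    problems = []
    if root.find("eadheader/eadid") is None:
        problems.append("missing <eadheader>/<eadid>")
    archdesc = root.find("archdesc")
    if archdesc is None:
        problems.append("missing <archdesc>")
    else:
        if "level" not in archdesc.attrib:
            problems.append("<archdesc> has no level attribute")
        if archdesc.find("did") is None:
            problems.append("<archdesc> has no <did>")
    return problems
```

A file that passes this check can still fail full validation, so the result is best treated as a first filter rather than a guarantee of conformance.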
As several interviewees noted, publishing EAD finding aids online presents a real challenge, especially to smaller archives without much technical support. Finding aids can be converted to HTML and placed on a Web server or loaded into an XML-database/publishing system—operations that are beyond the capabilities of many archives. Alternatively, archives can upload the XML file, include a call-out to an XSLT stylesheet, and use the browser to transform XML to HTML. Some archives deposit their finding aids with a regional repository such as Online Archive of California (OAC), Texas Archival Resources Online (TARO), or North Carolina ECHO, and/or with an international repository such as OCLC’s Archives Grid. Other archives have adopted XML publishing platforms that allow searching and presentation of finding aids, an approach that requires much more technical support but also provides greater control over data. These publishing platforms include:
- PLEADE: “PLEADE is an open source search engine and browser for archival finding aids encoded in XML/EAD. Based on the SDX platform, it is a very flexible Web application.”30
- XTF: “The CDL eXtensible Text Framework (XTF) is a flexible indexing and query tool that supports searching across collections of heterogeneous data and presents results in a highly configurable manner.”31 The California Digital Library uses XTF to enable search and display of its finding aids, text and image collections, and other scholarly projects.
- Apache Cocoon: Archives and consortia such as Five College Archives & Manuscript Collections32 are using the open source XML publishing framework Cocoon to publish finding aids.
- University of Chicago’s Mark Logic XML Database: The University of Chicago is developing an XML publishing infrastructure built on MarkLogic,33 a native XML database. MarkLogic, which is a commercial product, was selected because it is robust, scalable, and easy to use. MarkLogic uses XQuery, which supports a feature called “collection.” Through the collection tag, different collections and archives can be defined, thus enabling the creation of a multi-institutional repository. Users can search the whole database or particular collections. The front end can be built on any platform and can be displayed in any way the archives want. The University of Chicago took this approach because its UNCAP project is multi-institutional and could be multiconsortial. Such an architecture will give participants the flexibility to create unique interfaces for different collections and projects. Chicago’s code will be available to anyone who asks. Archives that want to use the software will need MarkLogic, but there is a free version for a limited number of CPUs that will be sufficient for small institutions.
II. Archival Management Systems
Archival management systems may be less flexible than EAD creation tools, and getting legacy data into these systems can be challenging. However, they offer a number of features that may lead to greater efficiency and sustainability, such as support for authority control, reduced redundancy of data, easy data entry interfaces, the ability to analyze archival data through the generation of reports, and Web-publishing capabilities. Both open source and commercial archival management systems are available.
A. Open Source
- Archon (http://www.archon.org)
Developed by archivists at the University of Illinois at Urbana-Champaign, Archon makes it easy for archives to publish their finding aids online. As its developers explain, “Archon automates many technical tasks, such as producing an EAD instance or a MARC record. Staff members do not need to learn technical coding and can concentrate on accomplishing archival work. Little or no training is needed to use the system, assuming the staff member or student worker has at least a passing familiarly with basic principles of archival arrangement and description” (Prom et al. 2007, 165). Archon, which is built on PHP 5 and MySQL, enables archivists to capture information about accessions, create and publish finding aids online, and export EAD and MARC. A digital library module supports presenting digital objects along with finding aids. A winner of the 2008 Mellon Awards for Technology Collaboration (MATC), Archon is easy to customize and provides support for authority control. Explaining the appeal of Archon, one archivist noted, “Archon is free and pretty easy to implement without much IT intervention. … It gave us a quick and easy way to put collections up on online, let patrons search them, and see everything we had.” Others caution, however, that importing existing finding aids into Archon can be difficult, given the variability of EAD.
- Archivists’ Toolkit (AT) (http://www.archiviststoolkit.org/)
Developed by a consortium including the University of California, San Diego Libraries, the New York University Libraries, and the Five Colleges, Inc., Libraries and supported by The Andrew W. Mellon Foundation, AT bills itself as “the first open source archival data management system to provide broad, integrated support for the management of archives.” AT uses a Java desktop client and a database back-end (MySQL, MS SQL, or Oracle). Users report that AT makes it easier to produce finding aids and export EAD and MARC, generates useful reports, provides robust authority control, and offers good support for standards such as METS. Several archivists believe that AT will provide an integrated tool set for managing and describing archival information: “I like the promise of having a single database for collection management. You do the accession record, push a button, convert to a resource record, and export as EAD and MARC. It’s not quite there yet, but moving in that direction.” Another archivist noted that AT helps archives establish processing priorities by allowing them to mark and then find high-priority collections. In a presentation on AT, Georgia Tech Archives highlights several reasons for adopting it, including “developed by archivists,” “promotes efficiency and standardization,” “serves as master version of finding aid,” “improves description workflow,” and “decreases need for training in XML and encoding” (de Catanzaro, Thompson, and Woynowski 2007). However, archivists noted that it can be difficult to import existing finding aids and make AT accommodate existing workflows. AT does not yet provide Web-publishing capabilities.
- CollectiveAccess (http://www.CollectiveAccess.org)
The recent recipient of a Mellon Collaborative Technology Grant, CollectiveAccess allows museums and archives to manage their collections and provide rich online access to them. CollectiveAccess is a Web-based tool built on PHP and MySQL, so it is cross-platform. According to its developer, Seth Kaufman, its chief advantages are that it
—has a flexible data model that accommodates many types of collections and supports different data standards and controlled vocabularies;
—provides robust support for multimedia, including images, audio, video, and text; is capable of automatic conversion of audio files to MP3 and video files to Flash format; can zoom and pan images; and enables time-based cataloging of media files; and
—has a Web-based structure that facilitates distributed cataloging and enables administrative users to enter metadata and search collections online.
Designed more as a collection management than archival management system, CollectiveAccess does not yet provide support for exporting EAD or MARC, although that is promised for a future release. One user commented, “It’s so much easier than traditional collection management systems that I’ve worked with.”
- International Council on Archives-Access to Memory (ICA-AtoM) (http://www.ica-atom.org/)
ICA-AtoM is open source, Web-based archival description software that aims to make it easy for archives to provide online access to their archival holdings, adhere to ICA standards, and support multiple collection types (even multirepository implementations) through flexible, customizable software. According to project lead Peter Van Garderen, the impetus behind ICA-AtoM was to expose hidden collections around the world by enabling small archives with limited resources to make their collections available online. ICA-AtoM is designed to support aggregation of data from multiple institutions through OAI, IETF Atom Publishing Protocol (APP), and possibly other mechanisms. Developers are working on a pilot project with the Archives Association of British Columbia to build an aggregated union list portal. ICA-AtoM aims to distinguish itself through its support for translation and internationalization, basis in ICA standards such as ISAD-G and ISAD-H, flexibility and customizability, and ease of installation and use. As a fully Web-based application, ICA-AtoM can be accessed from anywhere with an Internet connection and can be hosted at a minimal cost. In the long term, the developers want ICA-AtoM to become a platform to manage archival information, including creating digital repository interfaces to systems such as DSpace and Fedora through a plug-in architecture. They plan to build in Web 2.0 features such as user-contributed content, user tagging, and social networking.
ICA-AtoM is currently in beta testing. Version 1.2, due to be released in summer 2009, will provide support for accessioning, OAI harvesting, crosswalking to standards such as DACS, EAD import and export, and many other features. Although ICA-AtoM is designed more in accordance with ICA standards than U.S. standards, Van Garderen indicated that someone could easily add support for standards such as DACS and EAD and that version 1.2 will support EAD/MARC data import and export. For ICA-AtoM, then, standards such as EAD and EAC will be exchange formats, while ISAD standards will be the core data format.
ICA-AtoM is new, and many of its features have yet to be released. For this reason, it is difficult to evaluate this software. However, members of the archival community are excited about its potential. An archivist who recently saw a presentation on ICA-AtoM observed that the project has “impressive people on the team” and that the project lead is a trained archivist. Development seems to be proceeding quickly: within a month, the developers added the capability of attaching digital objects and are working speedily on making ICA-AtoM RAD compliant. A developer noted that “smart people” are behind ICA-AtoM, but it is currently focused on archival description, so it might be limited for institutions that want fuller support for collection management and presentation.
- Cuadra STAR/Archives (http://www.cuadra.com/products/archives.html)
Cuadra STAR/Archives offers a number of features for managing and describing archival collections, including creating accessions, tracking donors, creating finding aids, providing a Web interface to collections, and exporting EAD and MARC. Cuadra will host customers’ data and provide assistance in importing existing data into the system.
- Calm for Archives
Calm for Archives, developed by DS, bills itself as “the leading archival solution in the UK.” It has a client/server architecture and requires Windows. Calm allows significant user customization and enables linking to digital objects. It supports EAD and General International Standard Archival Description [ISAD(G)] and is compliant with International Standard Archival Authority Record for Corporate Bodies, Persons, and Families [ISAAR(CPF)] and National Council on Archives (NCA) name authority guidelines. It offers OAI support (with the provision of an additional module) and rich searching options. There is a CalmView Web server module (based on .NET technology) for Internet or intranet access.
- MINISIS M2A (http://www.minisisinc.com/index.php?page=m2a)
MINISIS M2A was developed by MINISIS Inc. in collaboration with the Archives of Ontario in the 1990s. Since then, the precursor, ADD (archival descriptive database), has been enhanced to include more fields, more databases, more functionality, and more workflow and processing to become M2A as we know it today. M2A is flexible and customizable, and it supports standards such as EAD, ISAD(G), and RAD. Additional modules, such as client registration and space management, are available. MINISIS M2A is fully Web enabled and conforms to MARC, RAD, and EAD. MINISIS M2A can be expensive, but M2A Web, which is geared toward smaller archives, provides an inexpensive hosted solution for online creation and publishing of archival information.
- Adlib Archive 6.3.0 (http://www.adlibsoft.com/)
Developed by a company based in the Netherlands, Adlib Archive 6.3.0 offers support for international standards such as ISAD(G) and ISAAR(CPF). Adlib uses a Windows-based desktop client and a database backend. Web publishing of archival information is available through the purchase of the Adlib Internet Server, which is built on Microsoft technologies. Adlib Archive provides support for OAI.
- Past Perfect 4.0 (http://www.museumsoftware.com/pastperfect4.htm)
Past Perfect describes itself as “affordable, flexible and easy to use” collection management software. It provides support for a number of collection management tasks, such as accessions and deaccessions, loans and exhibits, fundraising, membership, and object-level cataloging. The application is PC based, but a Web-based catalog can be built with the purchase of the Past Perfect Online module, which can be hosted by Past Perfect or installed on a local server. Past Perfect does not currently provide support for EAD, but that is being considered for a future release.
- Eloquent Archives (http://www.eloquentsystems.com/products/archives.shtml)
Eloquent Archives describes itself as “an integrated application including all the functions for archival description, accessioning/de-accessioning, controlling vocabulary, custodial management, research requests, tracking, and other workflow management.” In addition to enabling archivists to manage and describe their collections, it provides support for tracking researchers and the usage of collections. Hosting for online access is available.
FOOTNOTES FOR SECTION 7
14 For more information about metadata description tools, see Smith-Yoshimura and Cellentani 2007.
15 See ArchivesHub’s Data Creation Web page for more on XML editors: http://www.archiveshub.ac.uk/arch/dc.shtml
19 See http://orbiscascade.org/index/northwest-digital-archives-tools
20 See http://www.ncecho.org/ncead/tools/tools_home.htm
21 See http://www.lib.virginia.edu/small/vhp/admin.html
22 See http://www.archivists.org/saagroups/ead/ead2002cookbook.html
23 See http://mextoolset.wiki.sourceforge.net/ and http://www.bundesarchiv.de/daofind/en/
8. Possible Approaches to Federating Archival Description from Multiple Repositories
Researchers face many challenges in identifying and gaining access to archival holdings distributed at archives and special collections across the United States. Many archives have not described all of their collections or made that information available online. Even if archival description is online, researchers have to look in several places to find relevant resources, searching MARC records in WorldCat, MARC and EAD records in ArchiveGrid, National Union Catalog of Manuscript Collections (NUCMC) records in Archives USA, EAD finding aids aggregated in regional repositories such as Online Archive of California and TARO, and/or finding aids provided through the Web sites of particular archives. In order to facilitate discovery of archival resources, the CLIR Hidden Collections Program aims to provide a federated catalog drawing from multiple repositories. As the 2008 program description states, “The records and descriptions obtained through this effort will be accessible through the Internet and the Web, enabling the federation of disparate, local cataloging entries with tools to aggregate this information by topic and theme.” Archivists whom I interviewed recognize the value of aggregating information from multiple repositories. As one interviewee noted, “We just have to federate—there really isn’t a reason to stop at the stage of putting things on the Web. The point of EAD was not to put finding aids online, but to share, to get everyone together, to do things across a collection. If we don’t make the step forward to sharing, we might as well be using HTML.”
However, federating archival descriptions poses some significant challenges. For one thing, an appropriate technical infrastructure needs to be developed, perhaps leveraging OAI-PMH or RDF (Resource Description Framework). A federated catalog needs to be flexible enough to accommodate the diverse data generated by archives, yet rigorous enough to present data in a standard format. Options for federating archival data include:
- Make MARC and EAD available through a national/international service such as ArchiveGrid, Archives USA, or Archives Hub.
OCLC’s ArchiveGrid includes archival information from thousands of archives in the United States, the United Kingdom, Germany, Australia, and other countries. ArchiveGrid draws from two main data streams: archival records in WorldCat (about 90 percent of the total records) and finding aids harvested from contributing institutions.[36] These finding aids can be written in EAD, HTML, or plain text. To set up the harvesting, OCLC asks contributors to point to a Web site of finding aids that can be crawled. The crawler brings over the text of the finding aid, parses it so that it maps to ArchiveGrid’s record structure, and adds it to the index. For harvested finding aids, ArchiveGrid links from its search results to the full finding aid on the contributor’s Web site, similar to a Google result. Thematic collections are not currently represented; ArchiveGrid does not yet have consistent topical categories to apply across its varied contributions, but that could change. Archives pay nothing to contribute records to ArchiveGrid, but access to the full records in ArchiveGrid is available only through a subscription. However, through OpenWorldCat, researchers can access a large subset of archives’ MARC records that are also available through ArchiveGrid. It is possible that an archival version of the freely available OpenWorldCat (an “Open ArchiveGrid”?) could be developed so that a subscription would not be required. One archivist reported satisfaction with ArchiveGrid: “ArchiveGrid is harvesting our EAD files. … It seems to be gathering those OK.”
Another aggregation model is provided by Archives Hub, the United Kingdom’s “national gateway to descriptions of archives in UK universities and colleges.” Supported by Mimas, “a JISC and ESRC [Economic and Social Research Council]-supported national data centre” for higher education, Archives Hub offers a distributed model for aggregating content from individual archives. Archives can become “spokes,” enabling them to retain control over their data and provide a custom search interface to their collections while also making their content available through a common interface (Archives Hub 2008). Archives Hub is built on the Cheshire full-text information retrieval system, which includes a Z39.50 server. Archives Hub focuses on higher education institutions in the United Kingdom, but will accept contributions from other relevant repositories. (Nevertheless, it is probably more appropriate as a model than as a repository for U.S. finding aids.)
ProQuest’s Archives USA “is a current directory of over 5,500 repositories and more than 161,000 collections of primary source material across the United States.” It provides online access to the NUCMC from 1959 to the present, name and subject indexes from the National Inventory of Documentary Sources (NIDS) in the United States, and collection descriptions contributed by archives. Like ArchiveGrid, Archives USA allows repositories to contribute finding aids at no cost, but requires a subscription for access.
- Harvest EAD from distributed repositories through OAI-PMH, Atom, or another technology.
Existing technologies such as OAI-PMH and Atom support harvesting and aggregating content from distributed repositories. The University of Illinois at Urbana-Champaign (UIUC) has already developed preliminary OAI services and tools to harvest information from EAD and other sources. As UIUC found, converting EAD to OAI-PMH poses several challenges: mapping a single EAD file to multiple OAI records; the variability of EAD-encoding practices; the complex hierarchical structure of EAD finding aids; and contextualizing individual results within the overall hierarchy (Prom and Habing 2002). Illinois experimented with “a schema that produces many DC [Dublin Core] metadata records from a single EAD file,” producing a collection-level record that linked to the EAD finding aid as well as providing links to related parts of the collection (Cole et al. 2002). Archon is now experimenting with harvesting finding aids from a static directory via OAI-PMH, but nothing has been released yet. Other archival management systems, including Calm for Archives, MINISIS M2A, and Adlib Archive, already provide support for OAI. The Florida Center for Library Automation (FCLA) is also exploring the use of OAI-PMH to harvest EAD from registered provider sites (Florida Center for Library Automation 2008). While Kathy Wisser was at the North Carolina Echo Project, she developed a proof-of-concept distributed repository using the Internet Archive’s Heritrix Web crawler and XTF as the indexer.
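To make the "one EAD file, many records" challenge concrete, the following is a minimal sketch in Python of how a single finding aid might be flattened into several Dublin Core-style records, each carrying its place in the hierarchy as context. It is not the schema UIUC used. The sample finding aid, the record field names, the identifier scheme, and the base URL are all assumptions made for illustration, and real EAD files are usually namespaced and far more varied in their encoding.

```python
import xml.etree.ElementTree as ET

# Tiny, un-namespaced EAD 2002-style fragment, invented for illustration.
SAMPLE_EAD = """<ead>
  <eadheader><eadid>example-001</eadid></eadheader>
  <archdesc level="collection">
    <did><unittitle>Example Family Papers</unittitle><unitdate>1900-1950</unitdate></did>
    <dsc>
      <c01 level="series">
        <did><unittitle>Correspondence</unittitle></did>
        <c02 level="file">
          <did><unittitle>Letters, 1910</unittitle></did>
        </c02>
      </c01>
      <c01 level="series">
        <did><unittitle>Photographs</unittitle></did>
      </c01>
    </dsc>
  </archdesc>
</ead>"""


def _is_component(tag):
    """True for EAD component elements: <c> or numbered <c01> through <c12>."""
    return tag == "c" or (len(tag) == 3 and tag[0] == "c" and tag[1:].isdigit())


def _child_components(node):
    """Yield the direct child components of a node, looking inside <dsc>."""
    for child in node:
        if child.tag == "dsc":
            yield from _child_components(child)
        elif _is_component(child.tag):
            yield child


def ead_to_dc_records(ead_xml, base_url):
    """Return one flat, DC-like dict per description level in the finding aid."""
    root = ET.fromstring(ead_xml)
    ead_id = root.findtext("eadheader/eadid").strip()
    records = []

    def walk(node, ancestors):
        title = (node.findtext("did/unittitle") or "").strip()
        record = {
            # Hypothetical identifier scheme: file id plus document order.
            "identifier": f"{ead_id}#{len(records)}",
            "title": title,
            "type": node.get("level", "unknown"),
            # Titles of the parent levels, so a hit can be shown in context.
            "relation": " > ".join(ancestors),
            # Every record links back to the full finding aid.
            "source": f"{base_url}/{ead_id}.xml",
        }
        date = node.findtext("did/unitdate")
        if date:
            record["date"] = date.strip()
        records.append(record)
        for child in _child_components(node):
            walk(child, ancestors + [title])

    walk(root.find("archdesc"), [])
    return records


records = ead_to_dc_records(SAMPLE_EAD, "http://example.org/findingaids")
for rec in records:
    print(rec["identifier"], "|", rec["type"], "|", rec["relation"], "|", rec["title"])
```

Storing the ancestor titles in each record is one simple way to address the contextualization problem; a production service would more likely store links between records, as the Illinois experiment did.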
- Adopt an archival management system that supports federation.
ICA-AtoM is being designed to support harvesting and syndication via OAI and IETF Atom Publishing Protocol. According to its Web site, “it can be set up as a multi-repository ‘union list’ accepting descriptions from any number of contributing institutions.” Perhaps software such as ICA-AtoM could be adopted to provide a union list, although such a solution may not be flexible enough to accommodate the varied methods archives use to deliver archival information.
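For readers unfamiliar with what OAI harvesting asks of an aggregator, the following minimal Python sketch shows the two basic steps: building a ListRecords request and reading identifiers, titles, and the resumption token from the response. The verb, the argument names, and the XML namespaces come from the OAI-PMH 2.0 specification; the repository address and the sample response are invented for illustration, and nothing here represents the interface of ICA-AtoM or any other system discussed in this report. The sketch parses a canned response rather than contacting a server.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented response from a hypothetical repository.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:fa-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example Family Papers</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL.

    In OAI-PMH, resumptionToken is an exclusive argument: a follow-up
    request carries only the verb and the token.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    harvested = []
    for record in root.iterfind(".//oai:record", NS):
        harvested.append({
            "identifier": record.findtext("oai:header/oai:identifier", namespaces=NS),
            "title": record.findtext(".//dc:title", namespaces=NS),
        })
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    # An empty or missing token means the list is complete.
    return harvested, (token or None)


first_url = list_records_url("http://example.org/oai")
harvested, token = parse_list_records(SAMPLE_RESPONSE)
next_url = list_records_url("http://example.org/oai", resumption_token=token)
print(first_url)
print(harvested)
print(next_url)
```

A union catalog would repeat the request with each new token until none is returned, then merge the harvested records from every contributing repository into a single index.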
FOOTNOTES FOR SECTION 8
36 Author’s interview with Bruce Washburn, consulting software engineer for RLG Programs, July 1, 2008.
Hidden collections pose complex challenges to archives and special collections, but implementing appropriate software can help organizations work more efficiently and provide broader access to archival information. Adopting new software, however, will require that archives adjust their workflows and import existing data into the new system. This study identifies some of the key requirements for archival management software so that archivists can make informed selections. In choosing software, archives should determine which requirements are most important: Do they need to publish finding aids online? Do they need to import and export data in particular formats? Do they want support for key management functions, such as accessioning and generation of reports? Do they prefer commercial or open source software? In addition, they should carefully study factors such as cost, customer service, and core functionality. This report has aimed to outline the collective understanding of archival management software at this time and to provide a basis for expanding that knowledge.
Author’s note: I have bookmarked over 200 Web pages relevant to this study, including most of the resources below, at http://www.diigo.com/user/lspiro/archival_tool_study.
Archives Hub. 2008. Archives Hub: Creating and Managing Spokes. Available at http://www.archiveshub.ac.uk/arch/spokesnew.shtml.
Archivists’ Toolkit. 2008. Features Matrix: Archivists’ Toolkit, Archon, and PastPerfect. Available at http://www.archiviststoolkit.org/Comparison_of_Archival_Management_Software_3.pdf.
Archon. October 2008. Archon™: Facilitating Access to Special Collections Project Update. Available at www.archon.org/ArchonUpdateOct2008.pdf.
Association of Research Libraries Special Collections Task Force. 2006. Special Collections Task Force Final Status Report. Washington, D.C.: Association of Research Libraries. Available at http://www.arl.org/rtl/speccoll/spcolltf/status0706.shtml.
Baron, Robert. 1991. Choosing Museum Collection Management Software: The Systems Analysis. Available at http://www.studiolo.org/MusComp/STATEMNT.htm.
Bentley Historical Library, University of Michigan. n. d. MS Word 2000 EAD Templates and Macros. Available at http://bentley.umich.edu/EAD/bhlfiles.php.
Canadian Heritage Information Network. 2003. Collections Management Software Review. Available at http://www.chin.gc.ca/English/Collections_Management/Software_Review/introduction.html.
Canadian Heritage Information Network. 2002. Collections Management Software Selection. (Last modified April 27, 2002.) Available at http://www.chin.gc.ca/English/Collections_Management/Software_Selection/index.html.
Cole, Timothy, Joanne Kaczmarek, Paul Marty, Chris Prom, Beth Sandore, and Sarah Shreeves. 2002. Now That We’ve Found the ‘Hidden Web’ What Can We Do With It? The Illinois Open Archives Initiative Metadata Harvesting Experience. Presented at the Museums and the Web 2002, Boston, Mass., April 18-20, 2002. Available at http://www.archimuse.com/mw2002/papers/cole/cole.html.
Collections Trust. 2008. Software Survey—SPECTRUM Partners’ Systems. Available at http://www.mda.org.uk/software.
Council on Library and Information Resources. 2008. Cataloging Hidden Special Collections and Archives: Building a New Research Environment. Washington, DC: Council on Library and Information Resources. Available at https://www.clir.org/activities/details/hiddencollections.html.
de Catanzaro, Christine, Jody Lloyd Thompson, and Kent Woynowski. 2007. Archivists’ Toolkit: Issues in Implementation. Presented at the GALILEO Users’ Group Meeting, Fort Valley, Georgia, May 17, 2007. Available at http://smartech.gatech.edu/handle/1853/14405.
Dewhurst, Basil. 2001. Planning and Implementing a Collection Management System. Health and Medicine Museums Newsletter 20 (July). Available at http://archive.amol.org.au/hmm/pdfs/hmm20.pdf.
Di Bella, Christine. 2007. Philadelphia Area Consortium of Special Collections Libraries (PACSCL) 30-month Consortial Survey Initiative. Society of American Archivists Manuscript Repositories Newsletter (Summer). Available at http://www.archivists.org/saagroups/mss/summer2007.asp#5.
Digital Publishing Group, UC Berkeley Library. n. d. EAD History. Available at http://www.lib.berkeley.edu/digicoll/bestpractices/ead_history.html.
Florida Center for Library Automation. May 28, 2008. Sustaining & Growing The Opening Archives In Florida Project: Report of Ad Hoc Project Advisory Group Meeting. Available at http://www.fcla.edu/dlini/OpeningArchives/advisoryGroupMeeting.pdf.
Greene, Mark, and Dennis Meissner. 2005. More Product, Less Process: Revamping Traditional Archival Processing. American Archivist 68(2): 208-263. Available at http://archivists.metapress.com/content/c741823776k65863.
Groot, Tamara, Peter Horsman, and Rob Mildren. November 2003. OSARIS: Functional Requirements for Archival Description and Retrieval Software. Paris: International Council on Archives. Available at http://www.archiefschool.nl/docs/Osaris%20Draft%20Requirements.pdf.
Jones, Barbara. 2003. Hidden Collections, Scholarly Barriers. Association of Research Libraries Task Force on Special Collections. Available at http://www.arl.org/bm~doc/hiddencollswhitepaperjun6.pdf.
Lake, David, Russell F. Loiselle, and Debra Steidel Wall. 2003. Market Survey of Commercially Available Off-the-Shelf Archival Management Software. International Council on Archives. Available at http://www.ica.org/en/node/30064.
Lakhan, Shaheen E., and Kavita Jhunjhunwala. 2008. Open Source Software in Education. EDUCAUSE Quarterly 31(2): 32-40. Available at http://connect.educause.edu/Library/EDUCAUSE+Quarterly/OpenSourceSoftwareinEduca/46592.
Library of Congress Working Group on the Future of Bibliographic Control. 2008. On the Record: Report of the Library of Congress Working Group on the Future of Bibliographic Control. Available at http://www.loc.gov/bibliographic-future/news/index.html.
Mandel, Carol. Fall 2004. Hidden Collections: The Elephant in the Closet. RBM: A Journal of Rare Books, Manuscripts, and Cultural Heritage 5(2): 106-113. Available at www.ala.org/ala/mgrps/divs/acrl/publications/rbm/backissuesvol5no2/mandel.pdf.
Mugie, Hade. May 2008. Survey of Archives Management Software. ICA-AtoM Project/Dutch Archiefschool.
Office of Government Commerce. 2002. Open Source Software: Guidance on Implementing UK Government Policy. Available at http://www.ogc.gov.uk/documents/Open_Source_Software.pdf.
OSOR.EU. May 2008. EU: European Commission to increase its use of Open Source. Available at http://www.osor.eu/news/eu-european-commission-to-increase-its-use-of-open.
Panitch, Judith M. 2001. Special Collections in ARL Libraries: Results of the 1998 Survey Sponsored by the ARL Research Collections Committee. Washington, D.C.: Association of Research Libraries. Available at http://www.arl.org/rtl/speccoll/spcollres/.
Perkes, Elizabeth. 2008. Creating Container Lists Using Excel and Word Merge Options. Available at http://archives.state.ut.us/containerlist/containerlist.html.
Prom, Christopher. 2007. Optimum Access? A Survey of Processing in College and University Archives. Draft of chapter that later appeared in Christopher J. Prom and Ellen D. Swain, eds., College and University Archives: Readings in Theory and Practice. Chicago: Society of American Archivists, 2008. Draft available at http://web.library.uiuc.edu/ahx/workpap/ChapterEight-Prom.pdf.
Prom, Christopher J., and Thomas G. Habing. 2002. Using the Open Archives Initiative protocols with EAD. In Proceedings of the 2nd Joint Conference on Digital Libraries, 171-180. New York: Association for Computing Machinery.
Prom, Christopher J., Christopher A. Rishel, Scott W. Schwartz, and Kyle J. Fox. 2007. A Unified Platform for Archival Description and Access. In Proceedings of the 7th ACM/IEEE-CS Joint Conference on Digital Libraries, 157-166. Vancouver, BC, Canada: Association for Computing Machinery. Available at
Shreyer, Alice. 2007. University of Chicago Explores Library-Faculty Partnerships in Uncovering Hidden Collections. ARL: A Bimonthly Report 251 (April). Available at http://www.arl.org/resources/pubs/br/br251.shtml.
Smart, Christina. July 5, 2005. Choosing Open Source Solutions. JISC e-Learning Focus. Available at http://www.elearning.ac.uk/features/oss.
Smith-Yoshimura, Karen, and Diane Cellentani. November 27, 2007. RLG Programs Descriptive Metadata Practices Survey Results: Data Supplement. Dublin, Ohio, OCLC Programs and Research. Available at http://www.oclc.org/programs/publications/reports/2007-04.pdf.
Steele, Victoria. 2008. Exposing Hidden Collections: The UCLA Experience. C&RL News 69(6). Available at http://www.ala.org/ala/mgrps/divs/acrl/publications/crlnews/2008/jun/hiddencollections.cfm.
Stefko, Katherine. 2007. Can You Get AT without IT? Implementing the Toolkit at a Small College Repository. Presented at panel, “Where are We ‘AT’? A Status Report on the Archivists Toolkit.” SAA Annual Meeting 2007, Chicago, Ill., Aug. 28-Sept. 1, 2007. Available at http://smartech.gatech.edu/handle/1853/16509.
Stevens, Amanda. July 11, 2008. Midterm Report on Software Review and Recommendations Project. Council of Nova Scotia Archives.
Tabb, Winston. Fall 2004. Wherefore Are These Things Hid?: A Report of a Survey Undertaken by the ARL Special Collections Task Force. RBM: A Journal of Rare Books, Manuscripts, and Cultural Heritage 5(2): 123-126. Available at http://rbm.acrl.org/content/5/2/123.full.pdf+html.
TASI. 2007. TASI—Choosing a System for Managing Your Image Collection. Available at http://www.tasi.ac.uk/advice/delivering/choose-ims.html.
University of California, Berkeley. 2005. Berkeley Web Template CGI Script. Available at http://sunsite3.berkeley.edu/ead/tools/template/.
Utah State Archives. 2002. Encoded Archival Description Project. Available at http://historyresearch.utah.gov/inventories/ead.htm.
Ven, K., J. Verelst, and H. Mannaert. 2008. Should You Adopt Open Source Software? IEEE Software 25(3): 54-59.
Whitlock, Natalie. March 1, 2001. The Security Implications of Open Source Software. IBM Developer Works. Available at http://www.ibm.com/developerworks/linux/library/l-oss.html.
Wilson, James A. J. 2007 (updated 2 Sept. 2008). Benefits of Open Source Code. Text. Available at http://www.oss-watch.ac.uk/resources/whoneedssource.xml.
Wilson, James A. J. 2006. Open Source Maturity Model. Text. JISC OSS Watch. Available at http://www.oss-watch.ac.uk/resources/osmm.xml.
Wisser, Katherine M. 2005. EAD Tools Survey.
Woodson Research Center. February 1, 2008. Wishlist for Archival Management Systems. Fondren Library, Rice University.
Yale University Library. 2003. Report to the Digital Library Federation. Available at https://old.diglib.org/pubs/news04_01/yale.htm.