4.1. Resource Discovery
Many of the difficulties encountered by end users when searching the Web also confront library subject specialists and technology experts in their efforts to select free Web sites. Identifying high-quality Web resources is labor-intensive. Properly carried out, it is the most challenging and potentially most costly aspect of building scalable and sustainable collections. Although machine harvesting appears to be promising, it remains in nascent stages of development and is available only in limited settings. Resource discovery, evaluation, and indexing (i.e., cataloging) are still primarily manual processes that require well-formed strategies and efficiencies. The most clearly delineated resource discovery sources and strategies identified in preparing this report are those used by the Social Science Information Gateway. They include the following:
- joining discussion lists
- subscribing to distribution lists and e-mail publications
- monitoring and browsing sites
- actively searching the Internet
  - subject catalogs
  - higher education sources
  - Internet search tools
  - sites and lists that announce new Internet resources
  - Web agents
- searching non-Internet sources (e.g., scholarly journals, newsletters, Web reviews)
Resource discovery strategies and procedures outlined in the DESIRE Information Gateways Handbook (section 2-2) are also recommended.
4.2. Added Value: Cataloging, Metadata, Search Functions
Resource selection is not the only acquisitions function of libraries. To assure access, a library must provide the following range of value-added services:
- content description (e.g., descriptive and subject cataloging)
- resource organization (e.g., classification schema; indexing services)
- collection maintenance (e.g., provision of access over time, preservation, archiving, deselection)
These same responsibilities pertain to free Web resources. There is not yet full agreement on whether traditional cataloging practices (MARC, LCSH, MeSH) and classification schema (Library of Congress or UDC) adequately describe digital formats or satisfactorily serve users. Some experts recommend the creation of MARC records stored in traditional OPACs, while others call for new methods of description and record storage. Regardless of the descriptive rules and type of catalog or database selected, there is a consensus that the minimum identification and retrieval data are as follows:
- title/name of resource
- location of resource (URL)
- author or editor (i.e., creator(s) of resource and of its intellectual content)
- publisher (i.e., organization making the resource accessible)
- free-text description, including audience
Other elements recommended for inclusion in the catalog record are those developed by the Dublin Core Metadata Initiative. They include the following:
- date (created, last modified, data gathered)
- type (collection, database, guide/gateway, organization, service, home page, news service)
- relation (e.g., is part of, has part of, is a version of, replaces, is referenced by, is based on)
- coverage (geographic and temporal)
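As an illustration only, the minimum retrieval data and the additional Dublin Core elements listed above could be held in a simple record structure. The following Python sketch is not taken from any of the projects cited; the class name, field names, and sample values are hypothetical, and a production system would map these fields to MARC or Dublin Core according to local policy.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CatalogRecord:
    """Hypothetical catalog record for a free Web resource."""

    # Minimum identification and retrieval data
    title: str
    url: str
    creator: str          # author or editor
    publisher: str        # organization making the resource accessible
    description: str      # free text, including audience

    # Additional elements recommended from the Dublin Core element set
    date: Optional[str] = None            # created, last modified, data gathered
    resource_type: Optional[str] = None   # e.g. "database", "guide/gateway"
    relation: List[str] = field(default_factory=list)  # e.g. "is part of ..."
    coverage: Optional[str] = None        # geographic and temporal

    def missing_minimum_elements(self) -> List[str]:
        """Return the names of minimum elements that are empty or blank."""
        required = ["title", "url", "creator", "publisher", "description"]
        return [name for name in required if not getattr(self, name).strip()]


# Sample values are invented for illustration.
record = CatalogRecord(
    title="Example Subject Gateway",
    url="http://www.example.org/",
    creator="",
    publisher="Example University Library",
    description="Annotated guide to sources; intended for undergraduates.",
)
print(record.missing_minimum_elements())
```

A check of this kind can be run before a record is accepted, so that records lacking any of the minimum elements are returned to the cataloger rather than added to the collection.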
Apart from the specific retrieval data one should record and the format in which it should be recorded (e.g., MARC, Dublin Core), other cataloging issues need to be resolved. For example, if a resource points to other sites, one should determine whether each site requires its own unique record or whether a record for the “primary site” or collection is appropriate. Similarly, at what level is cataloging content adequate? What level of granularity should the cataloging record reflect? How these questions are answered will determine the quality of the collection. Other issues suggest that librarians may need to rethink traditional library cataloging practices, lest metadata cataloging backlogs equal or surpass the backlogs of uncataloged print resources stored in research libraries throughout the world.
Recommended Examples of Value-Added Services
Baruth, Barbara. 2000. Is Your Catalogue Big Enough to Handle the Web? American Libraries 31(August): 56-60.
DESIRE Information Gateways Handbook. 2.4 Cataloguing. Available at http://www.desire.org/handbook/2-4.html.
Dublin Core Metadata Initiative. Available at http://dublincore.org/.
Humbul Humanities Hub. Describing and Cataloging Resources, version 1.0 (modified 20 Feb., 2001). Available at http://www.humbul.ac.uk/about/catalogue.html.
MacCall, Steven L., Ana D. Cleveland, and Ian E. Gibson. 1999. Outline and Preliminary Evaluation of the Classical Digital Library Model. In Knowledge, Creation, Organization, and Use. Proceedings of the 62nd ASIS Annual Meeting, Washington, D.C., October 31-November 4, 1999. Medford, N.J.: Information Today.
RENARDUS. Executive Summary. Available at http://www.renardus.org/deliverables/d6_1/doc0002.htm.
ROADS Cataloguing Guidelines. Available at http://www.ukoln.ac.uk/metadata/roads/cataloguing/.
Sowards, Steven W. 1998. A Typology for Ready Reference Web Sites in Libraries. firstmonday Peer-Reviewed Journal of the Internet 3(5). (See Elements in Typology of Ready Reference Web Site Designs, pp. 5-6.) Available at http://www.firstmonday.org/issues/issue3_5/sowards/index.html.
University of Virginia Libraries. 1998. Ad Hoc Committee on Digital Access. Final Report. Approved June 15, 1998.
5. Data Management: Collection Maintenance, Management, and Preservation
Once created, collections of free Web resources require maintenance. Unlike print resources, free sites are highly dynamic. Content changes or is revised rapidly. In the context of higher education, superseded content can be crucial. In the print environment, superseded content can be easily retained or, if transferred to another location, relatively easily retrieved. The Web does not guarantee equivalent availability and access to superseded information. Unless archived and readily available through that archive, superseded information ceases to exist. Because of this ephemeral aspect of the Web, effective maintenance of collections of Web resources is labor-intensive and calls for a long-term staffing commitment. Collection maintenance can be among the more costly aspects in building and managing scalable and sustainable collections of Web resources. Archiving “snapshots” of Web resources as they exist at any given moment requires staff time and server space. Providing a mirror site for information developed and maintained by a third party is also costly, but not necessarily prohibitive, if the need to preserve or mirror is acknowledged to be within the scope of the collection. The National Library of Australia Pandora Project recognizes its obligation to provide indefinite access to the Web sites it selects for the project, and it archives these sites at the time they are cataloged. The largest and best-known archiving project is Brewster Kahle’s Internet Archive, which employs Web-crawling robot software to collect Web pages from publicly accessible Web servers and examines links on these pages to locate, evaluate, and archive yet additional pages.
Maintenance tasks include the following:
- Link checking. Among the most persistent problems associated with collections of free Web resources are dead links. Various software programs are available to monitor links; these programs can be scheduled to run at predetermined intervals. Recommended intervals range from once a week to once every three months. Longer intervals have proved counterproductive.
- Reviewing error codes. Perhaps the most frequently encountered error code is “404 Page not found.” Resource locators frequently change, but the old URL may not point to the new location. Software will report these changes; staff members need to update all appropriate links.
- Reviewing content. Because content frequently changes, staff should regularly confirm that it remains consistent with descriptions found in cataloging records and that it continues to conform to the collection scope and policy.
- Revising cataloging records. Link checking and content review will determine whether and to what extent cataloging records should be revised.
- Deselection. Cataloging records and links should be deleted when a site can no longer be found or when its content no longer conforms to the collection scope and policy. Pointing to content that cannot be located discourages users from using the collection. Similarly, when the quality of a site has deteriorated or changed to the extent that it no longer meets users’ needs or scope criteria, the site should be deselected by authorized staff.
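The link-checking and error-code review tasks above can be sketched in a few lines of Python. This is a minimal illustration, not a description of the software used by any project cited here: the function names and the four result categories are assumptions made for the example, and scheduling the check at a chosen interval is left to an external scheduler. HTTP status 404 (Not Found) and 410 (Gone) are treated as dead links; other non-success codes are flagged for staff review.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable, Optional


def fetch_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the final HTTP status code for url, or None if no response.

    urllib follows redirects automatically, so a moved page that redirects
    correctly is reported with the status of its new location.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code          # server answered with an error code
    except (urllib.error.URLError, OSError, ValueError):
        return None                # DNS failure, timeout, malformed URL, etc.


def classify(status: Optional[int]) -> str:
    """Map a status code to a maintenance action category (hypothetical)."""
    if status is None:
        return "unreachable"
    if status in (404, 410):
        return "dead"
    if 200 <= status < 300:
        return "ok"
    return "review"


def check_links(
    urls: Iterable[str],
    fetch: Callable[[str], Optional[int]] = fetch_status,
) -> Dict[str, str]:
    """Classify every URL; fetch can be replaced for offline testing."""
    return {url: classify(fetch(url)) for url in urls}


# Offline demonstration with invented URLs and canned status codes.
canned = {
    "http://a.example/": 200,
    "http://b.example/": 404,
    "http://c.example/": None,
    "http://d.example/": 500,
}
report = check_links(canned, fetch=canned.get)
print(report)
```

Links reported as dead or unreachable on several consecutive runs are candidates for record revision or deselection; a single failure may reflect only a temporary outage, so the report supports staff judgment rather than replacing it.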
Recommended Examples of Maintenance Guidelines
DESIRE Information Gateways Handbook. 2.6. Collection Management. Available at http://www.desire.org/handbook/2-6.html.
Humbul Humanities Hub. 4. Collection Management. Available at http://www.humbul.ac.uk/about/colldev4.html.
National Library of Australia. Pandora Project. Available at http://pandora.nla.gov.au/.
Nicholson, Dennis, and Alan Dawson. BUBL Information Service: 8.5 Link Checking and Record Maintenance. In Wells, Amy Tracy, et al. 1999. The Amazing Internet Challenge: How Leading Projects Use Library Skills to Organize the Web. Chicago: American Library Association.
The Internet knows no national or ideological boundaries. It permits users to access information, regardless of the country in which the server hosting the resource is located. Free Web resources are created and maintained in all the languages of the world. American research libraries collect foreign-language sites that support teaching and research in language and literature programs or certain subdisciplines in history, art, music, medieval studies, and political science. However, repeated informal surveys of free Web resources offered by leading American university libraries reveal that they neither reflect the breadth of non-English-language resources accessible via the Internet nor begin to approach the extent to which publications in languages other than English are represented in and continue to be acquired for their print collections. This is primarily because English is used so extensively on the Internet and because it is increasingly the language of choice for international communication. These two facts combine to encourage a regrettably large number of academics, including librarians and technical specialists, to underestimate the extent to which free Web resources with foreign-language content are needed to support higher education and research.
Previously, these sites presented major access problems for users, because software needed to display non-roman alphabets and character sets was not widely available. Inexpensive software programs have now largely overcome this problem, but other challenges to incorporating non-English sites remain.
The British Resource Discovery Network (RDN) has addressed the issue of collecting non-English sites and recommends that inclusion should be based on appropriateness to the larger topic, scalability, and user demand. The RDN recommends considering the following:
- a predefined number of languages that are significant or appropriate for the subject: importance of languages other than English for a specific subject (e.g., Danish for sites pertaining to Kierkegaard or Italian for sites pertaining to opera)
- value to user
- scalability: strategic language-by-language expansion of a site
The ease of displaying non-roman alphabets and character sets notwithstanding, including foreign-language content presents a range of additional challenges, including the following:
- Data presentation. What software and standards are required to display, search, and retrieve foreign languages using non-roman fonts? Should non-roman fonts be romanized?
- Metadata and cataloging rules. Should titles and the names of corporate bodies be translated into English? Should only English-language descriptive cataloging and keywords be used? Descriptions in two or more languages may enhance access but significantly increase workload.
- Searching and browsing. Should one expect to search by language or by domain name to narrow search results?
Many legitimately call for greater overarching foreign-language search capabilities. Research and development projects are currently under way to provide greater access to Web content without linguistic barriers through systems using cross-language information retrieval. The goal of these systems is to create search capabilities that permit the retrieval of sites to be independent of the natural language used to state the query. The success of such systems will depend on the broad application of emerging Web standards. The myriad issues and challenges pertaining to multilinguality lie outside the scope of this paper and will not be addressed here. Readers interested in these matters are referred to the following:
Oard, Doug W. 2000. Cross-Language Information Retrieval Resources (Overview) [last modified Nov. 24]. Available at http://www.ee.umd.edu/medlab/mlir/.
Peters, Carol, and Costantino Thanos. 2000. DELOS: A Network of Excellence for Digital Libraries; Promoting and Sustaining Digital Library Research and Applications in Europe. Cultivate Interactive 1 (July). Available at http://www.cultivate-int.org/issue1/delos/.
Koch, Traugott. 2000. Cross-Browsing in Renardus: Usage of Vocabularies in Renardus Gateways. Available at http://www.lub.lu.se/renardus/class.html.
Recommended Examples of Multilinguality Practices
Digital Asia Library (DAL). About DAL. Available at http://digitalasia.library.wisc.edu/about.html.
DESIRE Information Gateways Handbook. 2.12. Multilingual Issues. Available at http://www.desire.org/handbook/2-12.html.
Jennings, Simon. 2000. RDN Collections Development Framework, Version 1.1 (May). Available at http://www.rdn.ac.uk/publications/policy.html.
Without publicity and promotion, a collection of Web sites can be an underutilized, even an unused, resource. A formal plan to inform potential users is essential. Publicity is best accomplished when collection creators identify their user groups and develop publicity and training materials best suited for those users.
Publicity can range from print media to electronic media and may include face-to-face presentations. In the digital environment, it may seem inappropriate to rely on print formats to promote Web-based resources, but flyers, posters, newsletters, articles and reviews in professional journals, and press releases remain the primary modes of advertising commodities and services. Print-based publicity is highly effective when directed to specific user communities, but traditional print formats have certain costs associated with production and distribution (e.g., paper and printing costs, distribution and advertising fees). Using e-mail for publicity purposes avoids these expenses, but staff must still be paid to prepare publicity. A good example of effective electronic publicity is the regular updates the Internet Scout Report sends to its list subscribers.
Face-to-face presentations, workshops, conference papers, and poster sessions can be highly successful, but the costs associated with such presentations (staff time, support to prepare presentation materials, conference registration fees, travel and lodging) mount rapidly.
Recommended Examples of User Support
DESIRE Information Gateways Handbook. Publicity and Promotion 2.8. Available at http://www.desire.org/handbook/2-8.html.
Internet Scout Report. Available at http://scout.cs.wisc.edu/scout/report.
SOSIG. Social Science Information Gateway. Training Materials and Support. Available at http://www.sosig.ac.uk/.
UKOLN. The UK Office for Library and Information Networking. Publicising Your Project. Available at http://ukoln.bath.ac.uk/services/elib/info-projects/publicise.html.
Collections of Web resources can be built by one person working in relative isolation or by large collaboratives working together on-site or at various locations. Staff may volunteer their expertise and services or they may be paid. There is no preferred model; each presents a range of options, advantages, and disadvantages depending on the scope and goals of the collection. No staffing model, however, can be successful if it does not recognize that building and maintaining the collection generate specific costs and present wide-ranging issues for communication and workflow across organizational units. In the case of libraries, a subject specialist working alone may have an impact on the workflow and priorities of the cataloging, reference/instructional, or technology staff. These costs and workflow issues must be recognized and addressed. The following section outlines staff skills and experience, training, individuals versus collaboratives, and costs associated with staffing and managing collections.
8.1. Staff Skills and Experience
Staff responsible for cataloging free Web sites benefit from broad training and experience in print-format description and subject catalogs. The Internet Scout Project in January 2001 posted a vacancy for a cataloger with the following skills and experience:
- Master of Library Science degree or corresponding experience
- educational/professional experience in electronic and networked information storage
- [Web-based] searching and retrieval
- knowledge of
  - USMARC format
  - emerging standards, such as Dublin Core
  - Library of Congress subject headings
Skilled human involvement in the selection process is one of the most consistently called-for components of all projects studied in preparing this report. Harvester software may meet to a limited degree specific predetermined criteria for selecting resources, but only experienced subject experts (i.e., bibliographers, content experts, scholars) possess the level of knowledge required to select high-quality resources. Nonetheless, free third-party Web resources exhibit sufficiently different traits and characteristics from print and analog resources in terms of origin, content, authorship, access, and storage (archiving) that even the most skilled and experienced bibliographer of print publications would have difficulty in applying time-tested principles and guidelines for evaluating print and analog formats to scholarly resources in digital format. Other guidelines are needed. Some might argue that the selector trained in traditional collection-development practices is not the most appropriate person to identify, evaluate, and select free third-party Web resources. Some might even argue that the traditional bibliographer or selector is unprepared for the task at hand and that selection of Web resources more appropriately belongs in the realm of reference librarians, library technology staff, or other subject experts (e.g., advanced graduate students or faculty members). Such experience or training assures that those selecting for the collection understand user needs and expectations, and that they can base selection on a knowledge of the relevance and value of resources to the target audience. Subject experts are superior to harvest software because they can evaluate content critically and in a manner that harvesters have yet to master. Subject specialists should also be prepared to provide end-user training. 
Staff responsible for developing the intellectual scope and quality of collections should have experience developing analog collections or formal academic training in pertinent subject areas or both.
8.1.3. Technical Support
Central to successful Web resource collections is staff with excellent technical skills, regardless of the size of the collection. The role of technical staff is fundamental to the organization, access, and ongoing maintenance of the collection. Typical skills and responsibilities of technical staff include the following:
- technical understanding of networked environment
- programming and scripting skills
- infrastructure software evaluation, selection, and maintenance
- interface development
- archival storage
- mirror site support (where appropriate)
8.1.4. Project Manager
The number and level of staff depend on the scope of the project. Large projects benefit from managers who can provide broad oversight and coordination. Persons with project management responsibilities should possess both subject and technical knowledge.
Recommended Staffing Skills
DESIRE Information Gateways Handbook. 1.3. Staff and Skills Required Overview. Available at http://www.desire.org/handbook/1-3.html.
DESIRE Information Gateways Handbook. 2.1. Quality Selection. Available at http://www.desire.org/handbook/2-1.html.
Jennings, Simon. 2000. RDN Collections Development Framework. Version 1.1 (May). Available at http://www.rdn.ac.uk/publications/policy.html.
8.1.5. Advisory Boards
It is commonly accepted practice to appoint advisory boards to large projects and to those of extended duration. Board members should include subject specialists and technical experts. Their role is to shape the overall goals and objectives of the collection, to confirm that the project remains on course over time, and to address emerging issues.
Recommended Advisory Board Models
BIOME Special Advisory Group on Evaluation. Available at http://biome.ac.uk/sage/.
Edinburgh Engineering Virtual Library (EEVL). Annual Report to the eLib for the Period from 1st August 1995 to 31st July 1996. 1.2 Project Infrastructure. Available at http://www.eevl.ac.uk/document.html.
8.2. Staff Training
The nature of the Web and the characteristics of free Web resources challenge traditional collection-development and -management practices. This reality requires that staff receive training and supervision. The DESIRE Handbook recommends developing the following:
- exercises and examples for evaluating Web sites
- online tutorials
- staff manuals
- process to review sites selected by staff
- group e-mail lists to discuss and debate quality issues
- editorial meetings
Many quality sites are added to collections through user suggestions by means of “Contact us” or “Add new resource” buttons. Training users of such sites to become informed selectors is neither appropriate nor feasible; however, some means of quality control should be maintained. The Humbul Humanities Hub has a policy requiring that staff review, evaluate, and catalog contributions from users whose credentials and selection criteria are unknown or cannot be judged.
Recommended Staff Training Practices
DESIRE Information Gateways Handbook. 1.3. Staff and Skills Required Overview. Available at http://www.desire.org/handbook/1-3.html.
DESIRE Information Gateways Handbook. 2.1. Quality Selection: Training Staff. Available at http://www.desire.org/handbook/2-1.html.
Humbul Humanities Hub. 3. Collection Management. Available at http://www.humbul.ac.uk/about/colldev3.html.
Jennings, Simon. 2000. RDN Collections Development Framework. Version 1.1 (May). Available at http://www.rdn.ac.uk/publications/policy.html.
8.3. Financial Issues
This report concerns building sustainable collections of free third-party Web resources, that is, resources to which anyone can have full access without compensating the creator or host site. “End-user access without compensation” is the extent to which these resources are free. All value-added services that libraries provide to ensure improved access have significant costs. Value-added services for analog collections (e.g., selecting, describing, organizing, and storing) have specific costs associated with them. These costs are well-known to administrators and rather well documented in professional literature. Value-added services for free Web resources have similar costs; however, few people are aware of those costs, and comparative cost data, across collections or institutions, are not readily available.
A good source for cost data is grant proposal budgets. One three-year project with a staff of 5.5 full-time equivalents (FTEs) estimated that total personnel costs would be $714,633 over the life of the project. The principal investigator’s home institution agreed to cover the cost of equipment and software, which totaled $22,300. Excluding overhead, the total budget for this three-year project was $736,933. The principal investigator proposed developing a collection of “up to 10,000 sites,” possibly fewer. The average cost per site in this project would be $73.69 if the project met its goal of 10,000 sites; the cost would be higher if fewer were selected. One might question whether the costs associated with this project reflect the average cost associated with building similar subject gateways. At the very least, this case demonstrates that there are identifiable costs associated with building collections of free Web sites. Except for the absence of a purchase price, the nature of these costs closely resembles costs associated with acquiring, cataloging, and maintaining analog collections: staff costs for selection and description; technology costs for storage and retrieval.
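The per-site figure follows directly from the budget totals quoted above, and the same arithmetic shows how quickly the unit cost rises if fewer sites are selected. The short Python sketch below reproduces the calculation; the function name and the alternative site counts of 7,500 and 5,000 are hypothetical and are used only to illustrate the sensitivity.

```python
def cost_per_site(personnel: float, equipment: float, sites: int) -> float:
    """Average cost per selected site, excluding overhead."""
    if sites <= 0:
        raise ValueError("sites must be a positive number")
    return (personnel + equipment) / sites


PERSONNEL = 714_633   # three-year personnel costs, from the proposal
EQUIPMENT = 22_300    # equipment and software, covered by the home institution

# 10,000 is the proposal's stated goal; the smaller counts are hypothetical.
for sites in (10_000, 7_500, 5_000):
    print(f"{sites:>6} sites: ${cost_per_site(PERSONNEL, EQUIPMENT, sites):.2f} per site")
```

At the hypothetical count of 5,000 sites, the same budget works out to roughly $147 per site, twice the figure at the stated goal.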
8.3.2. Sustainability and Related Costs
Building sustainable programs of any kind requires moving from specific projects, which by their nature lie outside the scope of long-term institutional goals, to programs that are integral to the institution’s goals and mission. How one elects to build sustainable collections of free, third-party Web resources has a direct effect on human resources, organizational models, and budgets. Staff members responsible for selecting and cataloging analog materials have full-time jobs. Increasing their responsibilities to include developing collections of free Web resources calls into question preexisting priorities (i.e., developing analog collections and other responsibilities). Creating opportunities for selectors to select free sites directly impinges on processing workflows. Which is a higher priority: processing new books that are not free, or cataloging free Web sites? If processing units receive additional staff to handle increased workloads stemming from the need to catalog free sites, have subject specialists received commensurate time to select and evaluate them? Further, what plans and provisions have been made to allow the requisite technical support of selectors’ and catalogers’ efforts? These questions underscore that new priorities in one area have a direct impact on workloads and priorities in other units.
Selecting free sites, whether a small number for inclusion in the OPAC or an entire collection to be maintained as a subject gateway, requires planning. Planning, in turn, requires that library managers understand and acknowledge that Web site selection is a new library-delivered service, or range of services, with specific and unique needs and with intrinsic and far-reaching implications. Workflows associated with analog collections run fairly independently of one another. After selection, the order is given to the acquisitions staff, who place the order, receive the item, and process the invoice before forwarding the item to the cataloging unit. Catalogers forward the item to staff, who apply call numbers, and other staff place the item on the shelf. There is rarely a need for cross-functional communication in the analog environment.
This is not the case with free Web sites. Each decision in the selection, cataloging, storage, and retrieval-interface process impinges on the process as a whole. Selecting free sites is a new responsibility. Thus, if existing staff begin selecting free sites, who will take on the work they previously handled? How will catalogers handle new formats for describing these resources? Can existing cataloging staff assume this responsibility without additional training, and who will continue the cataloging of analog materials? Can technical staff effectively provide access to these new resources? Do they have and understand how to use Unicode-compliant software? Can they cope with the new challenges of working with records based on Dublin Core? These are substantive questions that libraries are struggling to answer. Understanding that there is no single right answer, and recognizing that every policy decision can have a direct and significant impact on work units that in the analog context coexisted without having an impact on each other’s procedures or priorities, will facilitate scalability.
Beginning with staffing choices and continuing through selection policy, cataloging practices, and interface design choices, building collections of free Web sites presents staff and administrators with a series of related issues that early in the planning process reveal themselves only superficially. For this reason, these issues require greater evaluation and consideration by all participants in the collection-building process, who should look at the process in its totality. Building collections cannot succeed if the process is viewed as a series of steps that coexist but do not influence or impede each other. Building collections of free Web resources must be viewed as a continuum, that is, as a series of interdependent steps. Each component part has potential and probable influence and impact on one or more of the other parts. The scope of the collection influences selection, which, in turn, influences cataloging decisions. Technical limitations may determine the collection scope, cataloging practices, or other aspects of the collection. Understanding the range of issues and alternatives the collection will require and how they will affect each other will encourage the creation of multifunctional or cross-functional units that facilitate communication among those who must learn new skills (e.g., metadata formats) in order to provide new services (e.g., subject gateways). Undoubtedly, the staffing models created for project-based development of free Web sites will influence, if not determine, staffing needs and patterns for developing and maintaining sustainable collections of free Web resources.
8.3.3. Staffing Models: The Individual versus the Collaboratory
8.3.3.1. Individual Initiatives. Many outstanding collections have been built through the efforts of one person alone. Subject experts and collection curators are well positioned to identify resources relevant to their respective fields of expertise. The inefficiencies of this approach are numerous, but not necessarily so great as to rule out this approach in all contexts. Individuals can make important contributions if their collections are narrowly focused or specialized.
8.3.3.2. Departmental Initiatives. A library, a unit within a library, or a unit outside a library (e.g., an academic department) can quite effectively build collections. Subject pages and guides permeate home pages for libraries and academic departments. Even a cursory review of these sites will reveal a high degree of redundancy: 75 percent or higher. This duplication of effort may not be “bad” or “wrong,” but it should call for close consideration and evaluation. Departmental initiatives serve many purposes and may be highly successful within their specific context. Their greatest weakness may be that they reflect the traditional “pride of place” and institutional reputation that have driven the building of print and analog collections, a reality created and encouraged by the nature of physical collections. Web-based resources do not have the fiscal barriers to access that characterize print-based materials. Why then build redundant collections that are unique only in their brand or URL?
8.3.3.3. Managed Collaboration. A review of stated and implied practices used in facilitating access to free Web sites suggests that the growth of these sites is too great to permit a single individual or institution to adequately identify and build collections in a timely manner. The Web is simply too vast. OCLC staff in 1999 described the number of Web sites as doubling annually; at the same time, half of all Web sites disappear each year. In other words, approximately 75 percent of all Web sites available on any given day did not exist one year earlier. Such statistics demonstrate the volatile and dynamic nature of the Web. If this growth and volatility continue, librarians will be well advised to emulate the collaborative Web harvesting projects of their colleagues throughout Western Europe and in Australia and New Zealand, where projects such as Resource Discovery Network (RDN), Social Science Information Gateway (SOSIG), Humbul Humanities Hub, Finnish Virtual Library (FVL), EULER, and Pandora have advanced rapidly. Because these projects rely on collaboration among staff at multiple institutions and/or among special project staff, they have accomplished what no individual or single institution working in isolation can achieve: rapid and efficient collection development of nonredundant collections at a reasonable cost. In North America, the Internet Scout Report and the Digital Asia Library are two examples of specially funded projects staffed with full-time teams of subject specialists, technical experts, and metadata catalogers. These projects further illustrate that successful harvesting of high-quality Web sites is neither a part-time job nor an added responsibility for staff who are primarily accountable for other duties.
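The relationship between growth, disappearance, and turnover can be checked with a short calculation. The Python sketch below assumes, as a simplification, that the growth and disappearance rates refer to the same one-year period and that every site counted at the end of the year either survived from the start or is new; the function name and parameters are hypothetical. If a collection of N sites grows to g × N while only s × N of the original sites survive, the share of sites that are new is 1 − s/g.

```python
def fraction_new(growth_factor: float, survival_rate: float) -> float:
    """Share of sites at the end of a period that did not exist at its start.

    growth_factor: end-of-period total divided by start-of-period total.
    survival_rate: share of start-of-period sites still present at the end.
    """
    if growth_factor <= 0:
        raise ValueError("growth_factor must be positive")
    if not 0 <= survival_rate <= 1:
        raise ValueError("survival_rate must be between 0 and 1")
    if survival_rate > growth_factor:
        raise ValueError("survivors cannot outnumber the end-of-period total")
    return 1 - survival_rate / growth_factor


# Total doubles over the year while half of the starting sites disappear.
share = fraction_new(growth_factor=2.0, survival_rate=0.5)
print(f"{share:.0%} of sites are new")
```

Under these assumptions, a doubling combined with the loss of half of the starting sites means that three-quarters of the sites present at the end of the year are new.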
In addition, discussions under way within ARL and various consortia underscore that successful mining of Internet resources will require libraries to provide users with vertical (i.e., deep) searching of Web content, not merely the horizontal (i.e., superficial) searching of sites typically provided by popular Web search engines (Campbell 2000).
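The turnover estimate cited earlier in this subsection can be checked with a short calculation. The sketch below is illustrative only and is not taken from the OCLC figures themselves: the function name, its parameters, and the assumption of steady compound growth within the year are all hypothetical choices made for this example. Under those assumptions, annual doubling combined with the loss of half of all sites each month yields a share of new sites a little above one half, in the same range as the approximate figure quoted in the text; the exact value depends on how growth and disappearance are measured.

```python
def share_of_new_sites(growth_factor, loss_fraction):
    """Fraction of sites at the end of a period that did not exist at its start.

    growth_factor: total sites at the end divided by total at the start.
    loss_fraction: fraction of the starting sites that vanish during the period.
    Both values must refer to the same period (here, one month).
    """
    survivors = 1.0 - loss_fraction        # surviving sites, per starting site
    new_sites = growth_factor - survivors  # sites the survivors cannot account for
    return new_sites / growth_factor


# Assumption: annual doubling is spread evenly over the year (compound growth),
# which gives a monthly growth factor of 2 to the power 1/12.
monthly_growth = 2 ** (1 / 12)

share = share_of_new_sites(monthly_growth, loss_fraction=0.5)
print(f"Share of sites newer than one month: {share:.0%}")
```

The same function can be applied to other periods, for example a yearly growth factor paired with a yearly loss fraction, provided both arguments describe the same interval.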
Facilitated Collaboration. Facilitated collaboration is not based so much on shared principles, values, or aims as on the use of some high-level common framework for software such as DMOZ.org or the Cooperative Online Resource Catalog (CORC). The former, organized by Netscape and others, is a collection of site reviews to which users may contribute. CORC is a metadata creation system for bibliographic records and pathfinders that describe electronic resources, and it has contributors from around the world. Both cases afford major benefits: large numbers of individuals coordinated by their home institutions contributing large numbers of sites, resulting in a rapid rate of collection development. Shortcomings include the lack of a single, overarching set of selection criteria, limited assurance that resource descriptions reflect current content, and uneven subject coverage.
The highest level of collaboration is one in which participants recognize that decisions about metadata and controlled vocabularies need to be made, and that these decisions influence and determine collection scope, access, purpose (i.e., popular or scholarly), human resources, and cost. Who makes decisions and how those decisions are implemented are fundamental to all forms and levels of collaboration.
Why collect free Web resources? The obvious answer is that current users need facilitated, value-added access to these resources to ensure that they will retrieve sites with high-quality content. The primary question for the future is whether broad application of enhanced metadata standards and next-generation search engines will allow end users to mine the Web themselves with greater precision than is currently possible and, in so doing, bypass the current need for facilitated access. In other words, will there be ongoing need for subject specialists (content experts) to provide the services traditionally provided by bibliographers and libraries?
For the foreseeable future, it is safe to say that the higher education community will remain dependent on collections of high-quality resources selected and described by experts using the practices outlined in this report. Near-term prognostications do not call for the subject expertise of humans to be replaced by computer-based search capabilities. Instead, the higher education community will grow increasingly dependent on free Web content made available through expanded human efforts to winnow, sift, and deliver access to a larger percentage of the Web’s high-quality resources. Among the near-term future developments will be the following:
- increased outreach to user groups
- increased reliance on collaborative collection development
- greater emphasis on underrepresented subjects and non-text-based formats
- development of instructional support through course-specific collections or browsing by course number
- in-depth mining of distributed databases as foreseen in the Association of Research Libraries’ scholars portal model
- simultaneous searching of analog and Web-based resources through the integration of distributed catalogs of Internet resources and library OPACs
- increased acceptance of internationally recognized cataloging standards
- increased control of URLs and descriptive metadata to reduce or eliminate broken links
- broad use of harvesting software to collect embedded metadata and thus facilitate the rapid cataloging of sites and eliminate redundant efforts
- decrease in manual harvesting and cataloging
Until now, building collections of free Web resources has been modeled on time-honored practices for building print and analog collections: an informed winnowing-and-sifting process that entails application of predetermined criteria and the exercise of human judgment. The Web is far too vast, its resources far too rich, for these same practices to prove successful over time. The size of the Web already exceeds human ability to review, organize, and manage collections at the level required to sustain the needs and priorities of higher education. The dynamic nature of the Web, particularly of its free resources, will render manual review and cataloging unviable. Research libraries are approaching an environment in which selection and cataloging of free Web resources will be machine-driven. Humans will develop selection criteria, but machines will apply them and accept or reject resources at a speed that only computers can deliver. Descriptive and subject analysis will be drawn from metadata embedded within the resources themselves by their creators.
Successful machine harvesting and cataloging techniques have yet to be perfected. The automatic methods currently under development, however, appear promising. The Library of Congress’s Minerva project and the Royal Library of Sweden’s Kulturarw3 project are examples of how sustainability of free Web resource collections will be achieved. How rapidly the process will be automated, and how easily automated procedures can be widely deployed, remain unclear. Until technology can facilitate the harvesting and cataloging processes, manual practices will continue to be used and will be the foundation upon which a successful automated process is built.
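The division of labor described above, in which humans define selection criteria and machines apply them to metadata embedded by resource creators, can be sketched in a few lines. The example below is a minimal illustration using only the Python standard library; it does not represent the software used by Minerva, Kulturarw3, or ROADS. The class and function names, the sample page, and the rule that a page must carry Dublin Core title, creator, and subject fields are all assumptions made for this sketch.

```python
from html.parser import HTMLParser


class MetaHarvester(HTMLParser):
    """Collect name/content pairs from the <meta> tags of an HTML page."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        fields = dict(attrs)
        name, content = fields.get("name"), fields.get("content")
        if name and content:
            # Lowercase the key so "DC.Title" and "dc.title" are treated alike.
            self.metadata.setdefault(name.lower(), []).append(content.strip())


def harvest(html_text):
    """Return the embedded metadata of a page as a dict of lists."""
    parser = MetaHarvester()
    parser.feed(html_text)
    parser.close()
    return parser.metadata


# Hypothetical, human-defined selection criterion: these fields must be present.
REQUIRED_FIELDS = ("dc.title", "dc.creator", "dc.subject")


def accept(metadata, required=REQUIRED_FIELDS):
    """Apply the criterion mechanically: accept only fully described pages."""
    return all(metadata.get(field) for field in required)


# Invented sample page, used only to exercise the functions above.
SAMPLE_PAGE = """
<html><head>
<meta name="DC.Title" content="Guide to Social Science Data">
<meta name="DC.Creator" content="Example University Library">
<meta name="DC.Subject" content="Social sciences">
</head><body></body></html>
"""

record = harvest(SAMPLE_PAGE)
print(accept(record), record.get("dc.title"))
```

A real harvester would also need to fetch pages, handle missing or inconsistent metadata, and map the harvested fields to a cataloging standard; the sketch shows only the step at which a predefined rule accepts or rejects a resource.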
Discussions of these topics and examples of current research projects in these areas can be found in the following works and sites:
Arms, William Y. 2001. A Report to the Library of Congress: Web Preservation Project, Interim Report. Cornell University. Available at http://www.cs.cornell.edu/wya/LC-web/.
Campbell, Jerry. 2000. The Case for Creating a Scholars Portal to the Web: A White Paper. ARL Newsletter 211. Available at http://www.arl.org/newsltr/211/portal.html.
Dublin Core. Available at http://dublincore.org/.
Platform for Internet Content Selection (PICS). Available at http://www.w3c.org/PICS.
Resource Description Framework (RDF). Available at http://www.w3.org/TR/REC-rdf-syntax.
Resource Organization And Discovery in Subject-based services (ROADS). ROADS Harvester software development. Available at http://www.ukoln.ac.uk/metadata/software-tools/.
Royal Library of Sweden. Kulturarw3 Heritage Project. Available at http://kulturarw3.kb.se/html/kulturarw3.eng.html.