Sections 1-3 • CLIR

1. Introduction

The purpose of this report is to identify and synthesize existing practices used in developing collections of free third-party Internet resources that support higher education and research. A review of these practices and the projects they support confirms that developing collections of free Web resources is a process that requires its own set of practices, policies, and organizational models. Where possible, the report recommends those practices, policies, and models that have proved to be particularly effective in terms of sustainability, scalability, cost-effectiveness, and applicability to their stated purpose.

Background work for this report confirmed John Kirriemuir’s observation that the selection of free third-party Web-based resources in North American academic libraries remains largely the responsibility of individuals working alone (Kirriemuir 1999). This background work also revealed that tasks ranging from identification of resources to their delivery through a library OPAC, Web page, or portal are seldom documented in a manner that would permit others to build upon existing practices, regardless of whether those practices demonstrate poor choices or good practices. This fact obtains whether one is looking for criteria used by an individual librarian or adapted by a library as institutional policy. For example, when interviewed about his Web selection criteria, one area studies librarian responded that he selected only “quality sites.” When asked to define “quality,” he replied, “We all know how to select good books, and Web resources are no different.” Third-party public domain Web resources, however, are fundamentally different from scholarly print and analog formats as well as from commercially produced digital resources.

This report outlines the similarities and differences between print and free Web resources and describes how the nature and complexity of free Web resources comply with or challenge traditional library practices and services pertaining to analog collections. The report also recommends, where appropriate, certain practices that have proved effective in meeting desired goals in specific contexts. There is no single set of preferred or “best” practices that one should follow in collecting free Web resources. What succeeds in one context might prove fruitless in another. The practices recommended in this report have proved effective in a specific project or show promise for broad applicability, regardless of the fiscal, technical, or human resource parameters in any specific institutional setting. Ultimately, it is not the practice per se, but rather how well that practice lends itself to a particular set of goals, workflows, and staffing preferences, that determines its effectiveness and value in any given setting.

1.1 Methodology

Because of the paucity of formal documentation, writing this paper required the use of several methods of data gathering.

Interviews. Librarians were asked to identify sites they considered to be “high quality” and to describe their criteria for evaluating the quality of a site. Every attempt was made to avoid implying or providing a definition of “high quality” when eliciting the librarians’ opinions. This methodology proved useful for several reasons. First, it demonstrated the degree of subjectivity in defining this term. It made it possible to identify a large number of sites in the humanities and social sciences, as well as in science, technology, and medicine (STM), that document their developers’ selection criteria. Interviewees who are directly involved in “collecting” free Web resources were asked to describe their experiences hiring and training staff, establishing and applying selection and cataloging criteria, identifying resources, and developing end-user interfaces.
Web Browsers. Using basic and advanced search strategies, the phrase “selection criteria” was used to locate Web sites where these terms were used and where criteria for selecting Web-based resources were discussed. This methodology was the least valuable of those used in researching this report, but it did confirm that little information on this topic is easily retrievable through browsers. Web browsers also permitted the identification of a number of highly developed subject gateways, the policies and practices of which proved very relevant to this report.
Gateways. More than 50 subject gateways, ranging from small to large, were reviewed in an effort to locate and evaluate their selection criteria, staffing patterns, and cataloging and related practices. Reviewing these sites and communicating with their staff was the most useful of all the forms of data gathering used for this report, and it yielded by far the most valuable findings. Creators of European and Australian sites have refined and articulated selection and related criteria to a far greater degree than have their North American colleagues. The selection criteria and documented practices of gateways such as the Internet Scout Report, however, approach or equal the high standards set by gateway creators in the Nordic countries, the Netherlands, Germany, the United Kingdom, and Australia.
English Language. Because this report was written for professionals involved in developing digital collections in North American libraries, only sites that provide documentation in English were evaluated. Owing to the extent to which the English language dominates the Web, limiting the report to English-language documents did not restrict the scope of the investigations upon which it is based. On the contrary, many of the most promising practices discussed in this report are for gateway projects in non-English-speaking countries and in which the documentation and user interface are provided in English and at least one other language.
1.2 Content

This report focuses on issues pertaining to the development of sustainable collections of free Web resources. The findings that emerged from the research done for this report document that developing and managing collections of free Web resources have wide-ranging, long-term implications for human resources, organizational issues, and fiscal matters that extend well beyond the circle of individuals responsible for selecting these resources. What one selects and how one wishes to deliver it to the end user frequently have significant implications for workloads and priorities in traditional cataloging and metadata departments, technical support units, and user-services programs. Free Web resources are challenging how libraries have traditionally processed and added value to print and analog publications. Many of the practices and policies developed by the projects reviewed for this paper shed light on how to address and manage the myriad access, staff development, user training, and cost issues associated with collecting free Web resources. Thus, in addition to the specific collection-development policy issues of collection scope, selection criteria, and resource discovery, a discussion of the broader issues pertaining to managing collections of free Web resources is included. Among these issues are the following:

Added Value or Cataloging. What levels of description, ranging from free-text notes to MARC records with traditional subject headings and metadata, are necessary or adequate to provide efficient access to free Web resources?
Access and Archiving. Options for integrating free Web resources into collections and services must be explored. For example, do OPACs provide sufficient and appropriate access? Are separate files required, or are new Web pages with a series of links adequate to ensure users will find these resources once they are selected? Once the resources are identified, how can extended availability be assured, given the ephemeral nature of Web-based content?
User Support. How do users learn of free Web resources after they have been selected? What are the end users’ support needs, and what are the implications of these needs for staffing and services?
Human Resources. What expertise is needed to select wisely and efficiently? The qualifications for staff who select Web resources are not as clearly defined as they are for staff who select print-based and analog resources for research library collections. How does one incorporate selection of free Web sites, a new job responsibility, into existing workloads? What skills and expertise best qualify an individual to be involved in the selection, cataloging, and technical support processes?
Workflow and Organizational Issues. Making free Web resources available to users directly has an effect on multiple work units and workflows. Decisions pertaining to Web resource selection, processing, and delivery have specific implications for preexisting workflows and priorities. In building sustainable collections of free Web resources, planners must recognize that there are connections between decisions that relate to Web resources and decisions that relate to analog collections.
Fiscal Implications. Free Web resources are not unlike gift collections: their acquisition has direct budgetary and workload implications. Identifying, selecting, and making available free Web resources incur costs to the library. These costs are not trivial, and they can ultimately influence both the selection and access delivery processes. Identifying the implications of free Web content and at what cost it will be managed and made accessible to end users influences planning, outcomes, and costs throughout the library.
1.3 Terminology

The research performed for this report revealed that not all individuals recognized for their Web resource expertise use technical terms or jargon uniformly. In some cases, variant terms were used synonymously; in others, experts used the same terminology quite differently, or at least with a variant nuance. To prevent ambiguity and to ensure consistent usage, it was deemed appropriate to define several terms. In reviewing these definitions, readers should bear in mind that much of the literature in which these terms are used most consistently has been prepared in the European Union, where British usage is more common. The definitions presented here, however, do not vary significantly from North American usage.

Free. Describes Web-based resources for which no compensation is required by the creator or host site in order to have full access to the content.

Collection. Two or more related items, the relationship of which is determined by the scope of the collection, as defined by the staff responsible for building that collection.

Gateway (i.e., subject or information gateway and synonymous with subject guide, subject page, megasite, virtual library, clearinghouse, and, increasingly, portal). The Internet Detective defines a “subject gateway” as differing from a search engine in that the content of a gateway has been selected through some form of human input, normally a critical evaluation by an information professional or subject expert. IMesh defines a subject gateway as “a web site that provides searchable and browsable access to online resources focused around a specific subject. Subject gateway resource descriptions are usually created manually. For this reason, they are usually superior to those available from a conventional web search engine” (IMesh Toolkit 2000).

Andy Powell provides a detailed set of definitions for the Resource Development Network (Powell 2000).

2. Why Select Free Third-Party Web Sites?

Today, researchers in all disciplines are creating scholarly Web-based resources, and students and faculty regularly browse the Internet in search of sites related to their interests. Some create profiles describing their needs so that search engines or harvesters relying on user-friendly language-retrieval systems can locate sites matching a given profile and notify the user. What is the role for librarians, other subject experts, and information specialists in developing relatively labor-intensive and costly handcrafted collections (i.e., subject guides, catalogs, gateways) of otherwise freely accessible Web sites?

Many free Web-based resources are merely collections of links to other sites or combinations of links and scanned files of texts, images, or sound and video. A growing number contain information not readily available elsewhere. Once found, many of these sites are easily viewed and downloaded. More sophisticated sites may require specialized or domain-specific tools, technologies, or guidance to view or use certain types of content (e.g., geospatial, scientific, or statistical information). These sites are difficult or impossible to find, even by scholars and information specialists. Search results can be so overwhelming that the user cannot be expected to evaluate them in a reasonable length of time. Moreover, organization within the Web presents most scholarly users with a technological labyrinth. Results may be far removed from, or totally unrelated to, the desired findings. Finally, the artificial intelligence technologies employed by the major Web discovery tools are insufficient to retrieve and adequately evaluate scholarly content.

Many of the inadequacies of the Web are due to the fact that the Web is still best described as “quantity without quality.” A more accurate statement would be that the Web reflects the full spectrum of quality, ranging from minimal or null to highly authoritative. Searching the Web on any topic will retrieve all information pertinent to one’s query, but there will be no qualitative evaluation or filtering of the content. Even the relevance rankings offered by some browsers do not reveal the quality of the information retrieved; they reveal only the frequency of terms stated in the query. Thus, a recent google.com search for “Mozart” retrieved more than 800,000 sites. They offered information not only about the composer and his works but also about composers influenced by Mozart, opera houses and companies, music processors, programming software, greeting cards, museums, the Salzburg airport, Austrian bonbons, and even personal pages in which the name Mozart appeared. Refining the search to “Wolfgang Amadeus Mozart” reduced the number of hits by a factor of 10, to a still-unwieldy 78,000+ sites. “Mozart biography” retrieved 322, and then “about 392” sites; “biography of Mozart” yielded “about 437” sites.

In cases such as these, relevancy ranking can significantly facilitate resource discovery. Search engines that display results according to relevance do so on the basis of the following factors:

search term frequency in the document
inverse document frequency of the search term (i.e., the lower the frequency of the term in the entire database, the higher the relevance)
length (density) of the document
date of the document
occurrence of the term in the document title or only within the document
proximity of two or more terms (the greater the distance between terms, the lower the relevance)
link popularity (i.e., the frequency with which other Web resources link to the document); similar to citation frequency measures used to evaluate the importance of a print resource or its author
Unfortunately, relevancy rankings do not ensure precise retrieval. They do not provide the higher education community with an evaluation of the intellectual level of content, nor do they provide the levels of organization and selectivity students and faculty require to take full advantage of free Web-based information. Only when sites have been reviewed, evaluated, selected, and cataloged will users be spared the ambiguities resulting from the randomness and “quantity without quality” of Web search results. Until technological advances permit rapid and highly precise retrieval of only that free Web-based content that is both authoritative and useful in the context of higher education and research, expert human intervention will be needed.

And therein lies the role of librarians and other subject specialists. The value-added services that libraries have traditionally provided for print formats need to be applied to free Web-based resources as well. The selection and cataloging functions of a physical library assure users that titles found there have met predetermined quality criteria and have been described in a manner that facilitates their identification and retrieval. Moreover, the cataloging process provides the authoritative and consistent grouping of related materials so vital to browsing and to the winnowing-and-sifting process that characterizes learning and research in an academic setting. As Sarah Thomas states, “Libraries can add value by promoting filtering and ranking which would prefer [Web-based] resources . . . that meet a set of established criteria, such as having a strong likelihood of authenticity, accuracy, or endorsement by others of standing” (Thomas 2000).

Another traditional library service that is absent in the Web environment is storage or archiving. Once print resources have been selected and integrated into a research collection, most of them will be retained indefinitely, or at least until their physical condition requires withdrawal or their content has been superseded to such a degree as to warrant replacement. This is not the case with Web-based information. Web content is not durable; there are no archival guarantees. Information available at any given moment can move or cease to exist without warning. Were libraries to select and archive Web resources in much the same way as they have collected print formats, users could be assured indefinite (i.e., archival) retrieval and delivery of what is otherwise an ephemeral format. How best and at what cost free sites should be archived and by whom are questions currently under discussion. (See Section 5: Data Management: Collection Maintenance, Management, and Preservation for further discussion of issues pertaining to archiving Web-based information.)

Librarians and other subject specialists have myriad opportunities to select and organize Web-based information. The current realities of the Web validate the need for human intervention in facilitating access to Web-based resources within the higher education context. By selecting a subset of resources that meet predetermined criteria and by facilitating access to them, librarians impose a quality structure on those resources. Users can be assured that sites found in such collections-selected, evaluated, and described by persons with recognized subject expertise, and made available through a catalog that provides a single interface to an integrated collection-are of related quality. Users can also be assured that access to these sites will be stable, and that resource discovery will be tailored to the characteristics of the collection and its content, rather than to the features of a specific search engine.

For a description of the role of librarians and subject experts as performed in an applied setting, see Lagoze and Fielding 1998.

3. Identification, Evaluation, and Selection

3.1 The Need for Web-Specific Selection Criteria

During the second half of the twentieth century, the library literature devoted much space to collection development and management theory and practice as they pertain to print and analog formats. Library school curricula and professional development workshops gave considerable attention to these topics as well as to selection practices and polices. Learning the component parts of collection policy statements and drafting sample statements were basic requirements in library school programs, and for many years they provided the focus for professional development and continuing education forums. In addition, library administrators have traditionally expected selectors to apply these theories and principles in formal policy statements. Although written policy statements frequently lie dormant for extended periods, they steer day-to-day selection decisions and determine the nature and quality of collections over time. Formally articulated collection-management policy statements are crucial to collection building in that they document the intended scope of a collection (i.e., what it excludes as well as what it includes) and explain historical strengths and weaknesses within a defined context.

This long-established practice of documenting collection policies for print and analog formats notwithstanding, there was at the close of the 1990s a paucity of documentation describing the practices followed in North American academic libraries when selecting free Web resources. Many subject specialists developing guides to free Web resources do not recognize the need to articulate their collection policies, as even a cursory survey of the Web will reveal. Online policy statements are rare; written documentation describing the goals and objectives in creating and maintaining these guides are rarer still.

This absence of formal, well-articulated, and commonly accepted criteria or guidelines for selecting free Web resources is due in part to the way in which such resources have thus far been acquired and added to academic library collections. By and large, administrators and subject specialists alike have perceived the selection of digital resources to be an extension of the print selection process. Staff charged with selecting print resources are the primary selectors for all or most digital formats, whether or not those resources are free. Blurring distinctions further is that information resources produced by and for academic and research communities have many similarities, whether these resources are print-based, analog, or digital. Scholarly resources can be textual, image-based, numeric, or geospatial. Their creators can be personal, corporate, governmental, nongovernmental, or even anonymous. Their content can be fact or fiction, original works of art or music, interpretive performances, or critical studies. Despite these differences, their single intended audience remains the academic community. Given this scenario, the need to articulate new or additional selection criteria for free Web-based resources is not immediately apparent. Indeed, many of the evaluative criteria that apply to print-based publications also pertain, to varying degrees, in the Web environment. The most obvious of these criteria are content quality and programmatic need.

As appropriate as these and other traditional selection criteria for print and analog formats may be, they quickly prove insufficient when selecting Web-based resources. Many print-based criteria lend themselves only partially to the evaluation of Web-delivered content; other criteria are not applicable at all. Web resources exhibit qualities that do not exist in print and analog formats and for this reason require additional evaluative measures. To evaluate their content, format, and dependence on technology and to describe their value and applicability to higher education and research, new criteria are needed.

Free third-party Web resources are particularly challenging. Their availability does not bear the imprimatur or nihil obstat that the traditional scholarly vetting process confers on the printed word. Frequently, the authority of those who have created these resources is not immediately discernible, and the veracity of their content cannot be surmised from the names and reputations of scholarly publishing houses. The concept and role of publisher in the case of free resources are oftentimes vague. Is the publisher the site hosting the file where the content is found? Or is it the person or body that developed the site? Further, most of the traditional services associated with marketing and distributing print resources are lacking. Vendors do not facilitate the acquisition of Web resources through any type of notification service, blanket order, or approval plan. National bibliographies do not record their existence. Rarely if ever do unsolicited mailings or the traditional print or electronic blurb announce their “publication.” Only a small number, far fewer than one percent by anyone’s measure, receive a formal review. Simply stated, the basic selection guidelines and principles of how to identify, evaluate, and acquire print-based and analog library materials-principles that are articulated so clearly in the collection-development canons-do not pertain in the environment of the Web. The successful inclusion of Web resources in academic library collections requires new or, at a minimum, additional selection criteria.

Fortunately, the number of clearly articulated, high-quality selection criteria for free Web resources is growing. The American Library Association (ALA) has promulgated basic guidelines for selecting Web sites suitable for public libraries, reference services, and the fields of education and business (2000, 1999, 1997). Some individuals have published well-written articles on general principles pertaining to selection of free Web resources (McGeachin 1998, Fedunok 2000, Sweetland 2000). A valuable collection of articles on major projects to harvest and organize Web-based content is The Amazing Internet Challenge: How Leading Projects Use Library Skills to Organize the Web (Wells, Calcari, and Koplow 1999). The extent to which staff have applied or adapted these guidelines in other projects remains largely undocumented.

Nevertheless, Web selection criteria are only rarely found in professional journals and other print formats. Instead, they are customarily found as Web-based documents, imbedded in scope statements and collection policies of the increasing number of subject gateways and in the background papers on resource discovery systems. For example, well-delineated and highly articulated criteria can be found at Web sites developed by the Internet Scout Report, the Research Discovery Network in Great Britain, sites in the Netherlands (e.g., DutchESS), and at various Scandinavian sites, such as those developed at the Kuopio University Library in Finland, all of which are available in English. The Internet Detective and the DESIRE Handbook are also valuable.

What follows is both a synthesis and an amalgamation of criteria and practices for evaluating free Web sites described at these and related sites in North America, Europe, and Australia. These criteria cover the full range of issues associated with evaluating free sites: provenance, authorship, content, design, user support, standards and technologies, navigation, and system integrity. The criteria have been selected for inclusion in this paper because of their universality. They apply equally to scientific, technical, medical, social science, and humanities Internet resources. Their application in existing projects has led to the creation of consistent and coherent collections of “high-quality” content. The DESIRE Information Gateways Handbook uses service-driven definitions to describe “high quality.” Thus, “quality” depends on users’ needs, and a “high-quality” site is one that meets their needs because it is relevant for their purposes. Further, quality is determined on the basis of inherent features of a resource. Thus, high quality can be ascertained only if a skilled and knowledgeable human evaluates both user needs and the inherent features of the Web resource. Building on this premise, Kuopio University Library staff expressly state that “Information content should comply with the needs of the frame organization” (Kuopio University Library Group 1996). That is to say, the resources selected should support the teaching, research, and information services of the library selecting them for their collection.

3.2 Developing Selection Policies

The need to define collection parameters and to articulate content criteria is equally important in the print and digital contexts. Development and maintenance of consistent and coherent collections of high-quality print or digital resources can succeed only in an environment in which value judgments are made on the basis of previously defined and agreed-upon collection policies and selection criteria. Clearly articulated policies allow staff, over time, to select content that is consistent with their institution’s mission and long-term goals. Formal policy statements afford users an opportunity to understand and evaluate the rationale underlying a collection and to determine its ultimate value in relation to their needs and expectations. How users conceptualize the nature and features of a collection significantly influences how they perceive that collection will serve their needs.

The DESIRE Information Gateways Handbook notes the following advantages of developing formal policies specific to free Web resources and making them available online to both staff and users. Such policies:

help users appreciate that the service is selective and quality controlled;
help users understand the level of quality of information they will find when using the service;
help staff be consistent in their selection and to maintain the quality of the collection;
help train new staff; and
ensure consistency in collections that are developed by a distributed team.
According to this list, formal policies are equally valuable for staff and users. Similarly, quality and consistency receive equal emphasis; this underscores the need for a formal policy that staff members apply and users understand. Only when selection practices are broadly understood and consistently applied can quality be defined and identified. Through the consistent application of accepted practices, the integrity of the collection (i.e., the value of the individual pieces and of the collection as a whole) will be assured and consistent.

As these five points demonstrate, the fundamental principles and components found in policy statements pertaining to collections of print and analog formats apply to a certain extent to free Web resources. Understanding and acknowledging this fact will facilitate developing and articulating policies suited for free Web resources. This understanding will also assist staff in recognizing that Web resources are a continuum in the history of scholarly communication, and that their intellectual content is not inherently different from that of other formats. Nonetheless, while much of the rationale for developing collection policies in the print and analog environment applies to developing collections of free Web resources, evaluating the full essence and nature of free Web resources requires additional evaluative criteria-criteria that evaluate the unique aspects of information contained in and delivered through free sites.

3.3 Collection Scope Statement

A well-developed collection policy includes a scope statement as well as selection criteria. The scope statement is the first filter in the resource evaluation process. It defines the parameters of the collection in broad terms by describing what is included in and excluded from the collection. The justifications for these inclusions and exclusions should be clearly stated and based on the needs of the intended users. At a minimum, the scope statement outlines the subjects included in the collection. It clarifies the acceptable sources for information (e.g., academic, government, commercial, nonprofit, personal resources) and the acceptable level(s) of difficulty (i.e., suitable for use in higher education settings, scholarly, kindergarten through twelfth grade, or popular) that will be considered. The scope statement also describes the resource types presumed to be relevant to the subject and the primary audience. The following list of resources included in the Humbul Humanities Hub can serve as a guide to defining collection scope:

primary/original sources in electronic format (i.e., information mounted on the same server as the site and created or produced by the owners of the site)
secondary sources, whether published solely on the Web or as surrogates for printed editions (i.e., “third-party information,” located and created at another site and made available through a link that takes the user to the other site; secondary information may be primary information at its host site)
research projects and reports
bibliographies and bibliographic databases
electronic journals
e-mail lists where online archives exist
academic department Web pages
professional association Web pages
resource directories
If resources that one might expect to find are excluded on the basis of access restrictions (e.g., technological barriers, cost, or registration) or of content, the scope statement should list these exclusions and give the rationale for not providing access to them.

Other aspects of scope concern language and geographic parameters. Although often overlooked because of the predominance of English-language resources on the Web, the increasing number of bilingual and multilingual high-quality sites calls for greater consideration and clarification of the language parameters imposed on a collection. The growing number of scientific sites, especially mathematics sites, where numbers, signs, symbols, and formulas are not language-specific demonstrates the importance of evaluating narrow or highly restrictive meta-language parameters before they are put in place. Similarly, geographic parameters need to be carefully reviewed and clearly stated. The Web is one of the primary forces reducing, if not erasing, geographic barriers to communication and access to information. The decision to limit the scope of a collection to resources developed in a particular country or in a particular language may be appropriate, but the justifications for that decision need to be stated.

Recommended Examples of Scope Statements

DutchESS. Dutch Electronic Subject Service. DutchESS Scope Policy. Available at http://www.kb.nl/dutchess/manual/scope_eng.html.

Humbul Humanities Hub. Available at http://www.humbul.ac.uk/about/colldev2.html.

Internet Detective. Creating a Scope Policy for Your Service. Available at http://www.netskills.ac.uk/TonicNG/content/detective/56.html.

Jennings, Simon. 2000. RDN Collections Development Framework. Version 1.1 (May). Available at http://www.rdn.ac.uk/publications/policy.html.

Library of Congress. BEOline: Selection Criteria for Resources to be Included on the BEOnline+ Project. Available at http://lcweb.loc.gov/rr/business/beonline/beonsel.html.

SOSIG Social Science Information Gateway. Training Materials and Support. Available at http://www.sosig.ac.uk/about_us/user_support.html.

3.4 Selection Criteria

Although resources that do not fall within the parameters of a collection scope statement should be rejected without further review, not all resources that do fall within the scope of a collection necessarily warrant inclusion. Further review and evaluation using established selection criteria assure that a collection will include only those materials that best meet users’ needs and expectations (i.e., those that meet the definition of “high quality”). Responsibility for setting these selection criteria varies from one gateway to another. Some gateways openly state that “no [selection] criteria were established” (e.g., ALA Machine-Assisted Reference Section [MARS] (1999)). Others provide brief selection criteria statements (e.g., Best of the Best Business Web Sites [BRASS]). An increasing number of specially funded projects are built on highly articulated selection policies (e.g., Dutch Electronic Subject Service [DutchESS]; Engineering Electronic Library [EEL], Sweden; and Internet Scout Report).

The selection criteria described in the selection policies and collection statements reviewed for this paper can be categorized into four groups: context, content, form/interface, and technical criteria. Each of the four groups comprises multiple criteria. No one criterion will adequately evaluate a site, and frequently several criteria overlap and intertwine-a process that blurs their definitions yet ultimately reveals more effectively the quality of the site under evaluation. A review of the collections using the following criteria did not reveal that any of them are more important than any other; moreover, no stated or implied priority order for applying these criteria can be found. Therefore, the order in which criteria and their categories are presented here should not be construed to imply any hierarchy or relative importance. Rather, they appear in an order that might facilitate the culling process. Their inclusion is based on their repeated occurrence among selection criteria used and described by various projects and programs committed to the development of high-quality collections of free Web resources.

3.4.1. Context Criteria

“Context” applies to the origin (provenance) of a site and its content, as well as to the suitability of a new resource to an existing collection. How a site meshes with and enhances existing or anticipated content within a collection will determine the quality of the collection over time.

3.4.1.1. Provenance. The origin or source of a site reveals and confirms much about its value and provides important information about its overall quality. Parsing a site’s URL often yields sufficient information to judge the affiliation of its creator and the reliability of its server.

3.4.1.2. Relationship to Other Resources. Like print resources, free Web resources can stand alone or in aggregate with other resources. As with any collection, however, the value of a collection of Web sites lies in the integrity of its individual components. The greater the degree to which each site (component) within a collection relates to and enhances others within the same collection, the greater its value of the collection as a whole. Evaluations of free Web sites have shown that many of them contain links to the same sites. This redundant content will increase the size, but not the quality, of the collection. Each site added to a collection must be viewed as an integral part of a larger mosaic. Redundant, superfluous, unrelated, or poorly suited pieces will not enhance the collection; they will only encumber it and ultimately discourage or confuse users.

3.4.2. Content Criteria

Content is arguably the most important criterion used when selecting resources, and it must apply equally to all sites. The term applies to the information contained in a resource, and it is used both qualitatively and quantitatively. A high-quality site with little content may or may not prove to be a useful site; however, the content of a low-quality site that has high quantity requires close scrutiny, for if little else exists on a specific subject, its mere availability may prove useful. Content should be evaluated on the basis of multiple factors and ultimately judged on the degree to which it supports the purpose of the collection to which it will be added.

3.4.2.1. Validity. Editors, reviewers, and publishers carefully vet the content of printed scholarly resources before they are published. This is not true in the case of free Web resources. Anyone with certain basic skills and access to server space can make information accessible to anyone who happens upon it. Web-based information that is flawed, whether intentionally or inadvertently, is not always easily discernible from information that has been carefully and thoroughly verified. Especially when combined with high-quality Web features such as attractive graphics and slick navigational functions, misinformation, false claims, and factual errors may not be self-evident. For example, an essay by I. Newton on the theory of gravity or by W. von Braun on rocketry may not contain the ideas of the great scientists these names imply, but rather the work of Ian Newton and Wesley von Braun, 12-year-olds hard at work on their middle school science projects.

The validity of free Web resources should be measured on the basis of several factors. Is the source of the content clearly identified? Does the URL support the claims of the “author’s” affiliation and credentials? Is contact information readily available, and, if so, what does it reveal about the person or team responsible for the content? Is the information available in print? If so, does the print version imply or state a critical review or validation of the information contained therein? To what extent has the Web resource been vetted by a third party; has it passed through any kind of “quality filter”? Is there evidence of or description of the extent and nature of quality checks, review, or evaluation of the content? Is the source of the information adequately described? Is the research process leading to the creation of the content described? Are references cited? Is a bibliography or webography provided?

3.4.2.2. Accuracy. Whereas validity measures the degree of objective truth, accuracy is a measure of the degree of correctness in the details. A resource may exhibit validity but lack a high decree of accuracy. Not all books that have successfully passed through the vetting process because of the validity of their content are as highly reviewed when evaluated for the accuracy of their supporting details. To some degree, the same is true of Web site content, which may exhibit inaccuracies caused by errors in data entry or by intentional biases of the person or team who developed the content. Even errors in spelling and keying are important indicators of the accuracy of the content. A high error rate in one aspect of a site (e.g., keyboarding) is often a good indicator of other potential flaws. Many of the measures for judging validity also apply to evaluating accuracy. Judging accuracy is difficult. It requires subject expertise and can prove to be highly subjective.

3.4.2.3. Authority. The authority of a site is dependent on the reputation, expertise, and credentials of the person(s) responsible for creating the site and providing access to it (in traditional terms, the author and publisher). The experience and credentials of individuals who create free Web sites vary radically. Authors may be enthusiastic hobbyists or recognized authorities who have devoted years of study to the topic. Does the site provide biographical information or resumes about the author? Knowing whether the site content is attributable to Sir Isaac Newton or to the hypothetical Ian Newton referred to earlier may provide an important measure of the site’s authority. Similarly, the location of the server may prove valuable in evaluating authority. Servers maintained by colleges, universities, museums, scholarly societies, or professional associations imply high levels of expertise in the development of site content. URLs containing a tilde (~), even if on a server maintained by such institutions or organizations, require closer evaluation, because a tilde normally signifies a personal page that is merely maintained at that location. Other indicators of authority are postal and e-mail addresses that can support authors’ claims regarding their credentials and affiliations. The primary measures of authority are

the creator of the site
the creator’s reputation
where the server is located
how many other sites link to it
Sites that do not identify authorship require particularly close evaluation through the application of other content criteria. Anonymity is normally not associated with high-quality content.

3.4.2.4. Uniqueness. Uniqueness is a measure of the amount of primary (i.e., original) information contained in a site. It also is a measure of the extent to which information at a site under review is duplicated in sites that have already been selected for the collection. Some degree of content redundancy is acceptable; one common characteristic of free Web resources is that their content duplicates that of other sites. The context and manner in which duplicate information is presented at other sites may argue in favor of selecting sites that contain information currently in a collection. Redundancy, however, must be evaluated carefully. The nature of digital information and its accessibility by multiple simultaneous users nullify all legitimate arguments for duplicate copies of information in print and analog collections. Users searching a specific collection will not be well served if their search yields multiple hits, all of which point to the same information and accessible through multiple sites. Sites that contain primary information not available through other sites, depending on how the site measures up against other criteria, will undoubtedly prove to be of greater value than will sites containing secondary information—unless the secondary information offers significant added value. “About this site” links can prove highly beneficial in evaluating uniqueness. An evaluation of the URLs to which the site links will reveal whether they point to information at the site (primary information) or to external sites that have probably been created by someone else. Links to external sites can be desirable, particularly if they offer significant added value.

3.4.2.5. Completeness. Completeness does not mean comprehensive coverage of information or exhaustive treatment of a topic. Rather, it refers to the availability of content at a particular site. The phrase “under construction” signals that the creators of the site are eager to inform others of their work but that the content or site design remains incomplete. However, the absence of the phrase “under construction,” similar wording, or icons conveying that message does not assure that the content is indeed complete. A site is incomplete if it points to non-networked resources or to print resources for the full “edition” of the “work” or if links are grayed out. Some links may not be grayed out but point only to empty files. The content may fail to support the purpose of the site; this is a serious indication that the site is not complete. The extent to which the content of a site is incomplete and the length of time it remains incomplete affect the quality of the site and the quality of any site or collection that links to it.

3.4.2.6. Coverage. Coverage refers to the depth to which a subject is treated. The term is used both qualitatively and quantitatively. How adequately the site treats the topic and to what degree (i.e., depth) it treats it determine the quality of coverage. Is the topic sufficiently represented, or are the various aspects of the topic treated superficially or not at all? Is there primary information? If little or none, how successfully does the site provide linkage to primary content maintained at other sites? What are the gaps in coverage? Even the finest print collections have lacunae, but there is a limit to the extent and nature of gaps, in both print and digital collections, beyond which the limited coverage affects the quality of a specific item or the collection generally. Links that are present, but broken, adversely affect coverage. Indexes, contents pages, and bibliographies facilitate the evaluation of the coverage; however, sites that are limited only to links or that consist only of enumerative bibliographies may prove inadequate if the bibliographic information is not supplemented with abstracts or links to the full texts. On the other hand, if the site provides the only bibliography on the subject or provides more substantive or current content than do other bibliographies, it may be an important site solely on the basis of its bibliographic coverage. At this time in the history of the Internet, users should not expect to find that coverage of a topic provided by a free Web site necessarily equals that of many print resources. Internet publishing, though maturing rapidly, is in its incunabulum phase.

3.4.2.7. Currency. Currency pertains to the degree to which the site is up-to-date; i.e., the extent to which it presents prevailing opinions, ideas, concepts, scientific findings, theories, and practices relating to the subject. Policy statements, for example, on the selection of “electronic resources” written in the early 1990s and concerned primarily with the evaluation and acquisition of information on CD-ROMs would appropriately receive a low currency score today because the digital formats currently available were unforeseen at the time the site was created and criteria for their evaluation had not yet been articulated. Currency, therefore, refers to content that describes the current thinking and the context in which higher education finds itself currently. Just as a library collection of print materials must reflect the latest research, so too should high-quality free Web site content. (See also a discussion of Information Integrity in section 3.4.4.1.)

3.4.2.8. Audience. Audience is what traditional print selection criteria describe as “intellectual level of content.” That is, for whom is the information intended? Once the intended audience is discerned, the person selecting the site should ascertain how well the site meets the needs and interests of that group. A high-quality site on organ transplants that is created for middle-school students will not serve the needs of university students. The site itself could be outstanding, but if it fails to meet the needs of its intended users, it is of low quality.

3.4.3. Form/Use Feature (Accessibility) Criteria

Form criteria are features that determine how the content is presented and how accessible it is. They include, for example, organization and user support. How readily a user can access content at a site, display it on a monitor, and download or print content depends on site design, user aids, and software applications. Unlike print resources, Web sites are not uniform in structure. Their access and navigation are neither uniform nor based on centuries of tradition-bound organizational concepts. Web resources lack the physical structure and content sequence (e.g., title page, foreword, introduction, table of contents, contents, and indexes) that characterize how printed matter is organized and that facilitate access.

Well-developed and organized digital content is far more than a series of pages to be read sequentially or accessed by page number or by specific indexing terms. Rather, it is multifunctional, allowing dynamic searching and downloading of content. Display, however, is a two-dimensional image that lacks the physical cues that permit or enhance use. Links and icons facilitate access to the content and permit the user to comprehend the purpose and content of a resource and use it effectively. Because these features may differ in quality, form (use features) must be as carefully evaluated as is content when selecting Web resources.

3.4.3.1. Composition and Site Organization. Because of the peer-review and editorial processes associated with print publications, an analysis of how information in a print format is organized, structured, and arranged requires only minimal evaluation during the selection process. Editors and reviewers contribute greatly to the final form of a printed work; spelling errors, bad grammar, poor style, and poor organization of content are removed in the vetting, revision, and editorial processes. Web resource creation does not consistently include equally rigorous evaluation and revision as does information in print format; in fact, the most extensive evaluation of composition and organization may occur after completion and then only by end-users. How well a site is composed and organized will determine how accessible users find the content of any Web resource. Thus, the selection process must include an evaluation of whether Web site content is logically or consistently organized or even divided into logical, manageable components that meet the needs of the intended users. Further, evaluators should take into consideration whether design enhances or hinders accessibility. A site may prove be “overly designed” if aesthetic features (e.g., wallpaper, graphics) require an unnecessary use of plug-ins or hinder access because they make images slower to load. Not all users have state-of-the-art computers or high-speed modem connections. Ease of access is relative and should weigh heavily in site evaluation (Wells, Calcari, and Koplow 1999, 210). For a good discussion of issues pertaining to composition and site organization, see “Training Materials and Support,” on the Social Science Information Gateway (SOSIG) Web site.

3.4.3.2. Navigational Features. In addition to software applications (such as special viewers), layout, design, search functions, and user aids can facilitate or impede navigation of a site. Each of these features requires a separate evaluation, but it is their successful combination that produces a high-quality site. One measure of navigational quality is browsability. How easily can the user browse the content? Are logically organized subsets of related information grouped so that manageable amounts of data or information can be browsed? Browsability is directly related to the value of a search function within a resource. Does one exist, and is it adequately designed and described? The availability of keyword search functions combined with Boolean operators should be expected.

The value of search results, however, depends on the quality of indexing. Whereas search functions pertain to how one searches; indexing pertains to choice of terms and to the depth to which terms are linked to content. Indexing and searching should, therefore, be evaluated separately. Superior search functions provide little added value if poor quality indexing standards or practices have been applied. Once a search has located information within the resource, how conveniently can the user print or download relevant content? For example, can a user print or download a single file of documents that is composed of a series of separate pages? Some site criteria suggest that substantive content should be no more than three clicks away, and that all links should be unambiguous and make it clear to the user where links are leading (see SOSIG). Some navigational features (e.g., tables of contents and indexes linked to content and buttons or icons directing users “home,” “forward,” and “back”) are quite simple.

3.4.3.3. Recognized Standards and Appropriate Technologies. Closely related to composition and navigational features are the standards and technologies used in developing a site, for they determine the degree to which a user may access and use Web-based content. Today, accessibility weighs heavily in decisions pertaining to both technology and design, and closer attention is paid to developing and adhering to minimum design standards that will assure access to the largest number of users possible. The U.S. Department of Education Office of Civil Rights has defined these standards as those that ensure “timeliness of delivery, accuracy of the translation, and provision in a manner and medium appropriate to the significance of the message and the abilities of the individual with the disability” (Waddell 1998). Currently, the implementation of the Americans with Disabilities Act is receiving close review in higher education circles, for accessibility to the Web obtains whether one wishes to assure access by individuals with disabilities, distance learners, or users with slow modems and low bandwidths. The World Wide Web Consortium (W3C) has issued Web Content Accessibility Guidelines 1.0 (Chisholm, Vanderheiden, and Jacobs 1999).

Before selecting free Web sites, one must consider the range of these standards and technologies and determine which are most critical for prospective users. Two fundamental questions are whether sites function in generally available environments or whether special extensions are required. All users expect content to load quickly and access to be rapid and available at any time. The Internet Detective provides important introductory guidelines for evaluating accessibility and for determining how it is affected by the standards and technologies that designers employ. For example, does the resource use proprietary extensions to HTML that some browsers cannot recognize? Can the information be accessed if the user has keyboard-only navigation capabilities? The presence of metadata, alternative text when images are switched off, text captioning for audio material, and links and downloading instructions to any software that is required to use the resource are key to evaluating the extent to which currently accepted or recommended standards and technology have been used in developing a site. Simply stated, are the technologies used appropriate for the content and the intended audience? One basic question that should not be overlooked is whether a site offers users more than they might find in a print resource containing the same information. For further details on this subject, see the Internet Detective.

3.4.3.4. User Support. User support relates to content (e.g., scope statements and collection policies) and technological applications. A well-developed Web resource provides information that answers users’ questions as they arise. Frequently, this information is provided through frequently asked questions (FAQs), which anticipate the most commonly requested or needed information about using the site.

FAQs, however, provide only static support. In some cases, this is sufficient. For example, if a particular viewer or audio software is necessary or preferable to provide full access to content, a simple statement to that effect may prove adequate for the user to proceed. When, however, interactive support would be more beneficial to the user, help screens or contact information should be easy to locate and user-friendly. “Contact Us,” “Help Desk,” or similar links (e.g., e-mail, postal address, phone numbers, office hours, individuals’ names) should be included in the design and be easily identifiable.

3.4.3.5. Terms and Conditions. Access to some free Web sites requires that users first agree to specific conditions. These may range from simple registration to user fees or agreements to download information only under specific conditions. A great deal of content delivered through the Web that supports higher education and research is accessible without any restrictions. Where restrictions do exist, one must evaluate whether they are appropriate. If registration is required, why is it necessary? How will user information be retained and used or shared with third parties? Registration information may be gathered to assist site developers in monitoring what and how users access content in order to improve content coverage and access over time. It may also be gathered for reasons that are not apparent or that go unstated. Where user fees are applicable, are they appropriate? If fees are required to access a site, does the service point only to resources that are themselves free? If so, does added value at the site warrant the required cost to users? Terms and conditions of use per se may not be inappropriate, but their validity should be confirmed before a site is added to a gateway or OPAC.

3.4.3.6. Rights Legitimacy. Related to terms and conditions of use is the question of whether those responsible for the content and access to the site have the right to place restrictions on the content at that site. It is relatively easy to claim rights to information, but such claims may be unfounded and, therefore, not legitimate. Proving the legitimacy of such claims is difficult and time-consuming.

3.4.4. Process or Technical Criteria

Among the most frequently cited characteristics of Web resources is volatility. Links within a site may abruptly cease to function; an entire site may disappear without warning. Free Web resources come with no guarantees that their content or accessibility will remain constant or improve over time. In extreme cases, an entire resource may cease to exist or change its URL without directing users how to access the same resource at a new URL. Process criteria are those technical features that measure the integrity of a site and the availability of content reported to be provided by the site. They are the functions and features that permit the end user to access the content selected and described by the provider (i.e., author or creator).

3.4.4.1. Information Integrity. Information integrity, or maintenance, pertains to the intellectual content offered by a site and is measured by how successfully the value of the content remains current or improves over time. Determining the value of content at any site requires knowing when a site was created, at what frequency updates and revisions are scheduled, and the date of the last revision. Without one or more of these “time stamps,” one cannot know whether content is current. Criteria for evaluating these dates and intervals between revisions vary greatly, depending on the nature of the content. Time-sensitive data require frequent verification and revision; other types of information are quite durable. Information that is not time-sensitive may be well served by static resources that are updated only infrequently or not at all. An important indicator of information integrity is knowing whether the provider or creator of the site is likely to maintain the resource over time. For example, resources developed by students, especially those developed as part of a short-term project, or by staff supported by one-time project funding, may not be maintained over time. The shelf life of such resources is frequently short, and the reliability of their content decreases at an accelerating rate over time. (See also Content Currency, Section 3.4.2.7.)

Site content that is revised periodically presents its own range of challenges. Whereas superseded information in print format remains accessible through the retention of earlier printings and editions, superseded Web content frequently ceases to exist. It is now, however, an increasingly accepted practice to archive superseded content when that content has potential durability or value over time.

3.4.4.2. Site Integrity. Site integrity pertains to the stability a site exhibits over time and to how a site is administered and maintained. It is, therefore, the responsibility of the site manager or Web master. Even static resources require oversight, lest access to the resource itself ceases. Site integrity requires a commitment to ongoing maintenance. Broken links must be fixed promptly. Updates and revisions should be noted and reflected in revision numbers. Whether superseded content is archived, the URL where that content is maintained, and how requests to access the archive should be made is critical information that should be included in FAQs or through other forms of information pertaining to the site. Without adequate information on these issues, the integrity of a site will be poor or nonexistent.

3.4.4.3. System Integrity. System integrity is a measure of technical performance. The term refers to the stability and accessibility of the server hosting the resource. Assuring the quality of system integrity is the responsibility of the system administrator, not the creator of the information or the site manager. Server stability is perhaps the single most important criterion in judging the quality of a resource. The quality of the server ultimately determines the value of a Web resource. A server that experiences more traffic than it can handle will not provide consistent ease of access and will reduce the value of the resource, regardless of the value of its content. Frequent downtime or poor response time renders a resource virtually inaccessible. Anticipated downtime should be announced in advance, and all downtime should be reported, along with a projected date and time when the resource will again be available.

Important evaluators when judging system integrity are whether a site is mirrored and whether one can access it multiple times within a specified period (e.g., a certain number of times per day or week). Internet Scout Report staff, for example, check the availability of each site three times in the days prior to making it available. Staff responsible for selecting free Web resources at the Kuopio University Library in Finland require that links work sufficiently well on the third and final attempts to access a site over set intervals (Kuopio University Library Working Group 1996).

Recommended Examples of Selection Criteria

Caywood, Carolyn. 1996. Selection Criteria for World Wide Web Resources. Public Libraries 60 (Summer):169.

DESIRE. Selection Criteria for Quality Controlled Information Gateways. Available at http://www.ukoln.ac.uk/metadata/desire/quality/toc.html.

HealthWeb. Selection Methodology and Guidelines. Available at http://healthweb.org/guidelines.cfm.

infor-quality-l. Selection Criteria for Internet Information Resources: A Poll of Members of infor-quality-l. Available at http://www.vuw.ac.nz/~agsmith/evaln/poll.htm.

Librarians’ Index to the Internet. Selection Criteria for Adding Resources to the LII. Available at http://www.lii.org/search/file/pubcriteria.

McGeachin, Robert. 1998. Selection Criteria for Web-Based Resources in a Science and Technology Library Collection. Issues in Science and Technology Librarianship 18 (Spring). Available at http://www.library.ucsb.edu/istl/98-spring/article2.html.

MedWeb. Guidelines for Inclusion of Sites in MedWeb. Available at http://www.medweb.emory.edu/MedWeb/history.htm.

Pratt, Gregory F. 1996. Guidelines for Internet Resources Selection. College & Research Libraries News 3 (March):134-135.

Scout Report. Selection Criteria. Available at http://scout.cs.wisc.edu.

WWW-vlib. Summary of Selection Criteria. Available at http://lists.w3.org/Archives/Public/www-vlib/msg00276.html.