3. Making Resources Accessible to Other Communities
Someone recently compared the Web with a large room filled with
books that were scattered all over the floor. The Web is the world's
largest mass of bits and bytes. It is a meeting place that brings
together disparate communities. The "Internet Commons," as
this meeting place has been called, requires connections between
and among disparate communities in order for an "economy" to
develop (Weibel 1999). This economy will provide the framework within
which both commercial and noncommercial transactions can occur. KOSs
are one means of connecting these disparate communities. Knowledge
organization systems can be used to (1) provide alternate subject
access, (2) add modes of understanding to digital library resources,
(3) support multilingual access, and (4) supply terms for expansion
of free-text searches in domains that are relatively unknown to the
user.
Providing Alternate Subject Access
Alternate subject access refers to the provision of one or
more additional subject orientations that make the resources of the
digital library accessible to different audiences. This approach
is particularly valuable when the digital library resources appeal
to groups that do not share a common terminology. The alternate scheme
can be a system of subject headings, a classification scheme, or any
other subject-oriented system. Alternate subject access can be provided by
- indexing or classifying the resources using multiple schemes,
- retaining original schemes from organizations that contribute
to the digital library, or
- mapping between the primary scheme and an alternate scheme.
Indexing the Material with Multiple Schemes
The most direct method for providing alternate subject access to
a collection is by classifying or indexing the resources with multiple
schemes, but it may also be the most costly. This approach requires
redundant cataloging or catalogers who are knowledgeable in both
schemes. It may also require modifications to the cataloging tools
and procedures. However, if the cataloging is done at a high level (whole
resources rather than individual documents), or if the schemes are not
difficult or detailed, it may be a reasonable approach.
Retaining Alternate Indexing from Contributors
If the digital library is being built through contributions from
a variety of sources, the originating organization may have applied
an alternate scheme that could be used. For example, the NASA database
on aeronautics and astronautics receives relevant bibliographic records
from other U.S. agencies, such as the Department of Defense and the
Department of Energy. The controlled vocabulary terms assigned by
the contributing organization are processed through a machine-aided
indexing process to create candidate indexing terms from the NASA
Thesaurus for review by NASA's indexers. However, the final records
contain both the NASA Thesaurus terms and the controlled vocabulary
terms from the contributing organization, with the alternate indexing
terms retained in a separate data element in the bibliographic record.
The terms collected from other organizations can be viewed as an
alternate access point, so that at least part of the collection is
accessible through another discipline's terminology.
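The idea of keeping contributor terms in a separate data element can be
sketched as follows. This is a minimal illustration, not NASA's actual
record format: the field names (primary_terms, contributor_terms), the
sample terms, and the search_subject function are all assumptions made
for the example.

```python
# Hypothetical record layout: the library's own indexing and the terms
# retained from the contributing organization live in separate fields.
RECORDS = [
    {
        "id": 1,
        "primary_terms": ["Aerodynamic Drag"],
        "contributor_terms": ["Drag (Aerodynamics)"],
    },
    {
        "id": 2,
        "primary_terms": ["Solar Cells"],
        "contributor_terms": [],
    },
]


def search_subject(term, records, include_contributor=True):
    """Return ids of records indexed with the term in either field."""
    wanted = term.strip().lower()
    hits = []
    for rec in records:
        terms = list(rec["primary_terms"])
        if include_contributor:
            # The retained terms act as an alternate access point.
            terms += rec["contributor_terms"]
        if any(wanted == t.lower() for t in terms):
            hits.append(rec["id"])
    return hits


hits_with = search_subject("drag (aerodynamics)", RECORDS)
hits_without = search_subject("drag (aerodynamics)", RECORDS,
                              include_contributor=False)
```

With the contributor field included, a searcher who knows only the
contributing organization's term still reaches record 1; with it
excluded, the same search finds nothing.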
Mapping Multiple Schemes
The third method for providing alternate subject access is the most
indirect: mapping between two or more schemes. Several examples of
this approach can be found among A&I services. Both BIOSIS, the
world's largest private sector A&I service in the life sciences,
and the NLM apply MeSH to BIOSIS documents. The records that BIOSIS
contributes to NLM's TOXLINE database are processed automatically
to have appropriate MeSH terms added. This is based on a mapping
of the natural language terms that occur in the toxicology literature
and BIOSIS' normalized natural language keyword indexing with the
MeSH terminology. In the new BIOSIS relational indexing structure,
BIOSIS builds and maintains authority files that connect natural
language disease names to the MeSH-controlled disease terms. When
the BIOSIS indexer assigns the free text keyword for the disease
name, the appropriate MeSH term is also added to the record as an
alternate access point (BIOSIS 1999). The assignment is based on
the development over time of a mapping between the terminology used
by BIOSIS and the MeSH-controlled terms.
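An authority file of this kind can be pictured as a lookup table from
normalized free-text names to controlled terms. The sketch below is a
hedged illustration only: the table entries, the field names, and the
functions normalize and add_alternate_terms are assumptions for the
example and are not taken from the BIOSIS or NLM files.

```python
# Hypothetical authority file: normalized free-text disease name mapped
# to a controlled term. Entries are illustrative only.
AUTHORITY_FILE = {
    "heart attack": "Myocardial Infarction",
    "myocardial infarction": "Myocardial Infarction",
    "high blood pressure": "Hypertension",
}


def normalize(term):
    """Lowercase and collapse whitespace so variants share one key."""
    return " ".join(term.lower().split())


def add_alternate_terms(record):
    """Return a copy of the record with mapped terms in their own field."""
    mapped = []
    for keyword in record.get("keywords", []):
        controlled = AUTHORITY_FILE.get(normalize(keyword))
        if controlled and controlled not in mapped:
            mapped.append(controlled)
    enriched = dict(record)
    # Keep the original keywords; the mapped terms are an extra access point.
    enriched["alternate_terms"] = mapped
    return enriched


record = {"title": "Example study", "keywords": ["Heart  Attack", "aspirin"]}
enriched = add_alternate_terms(record)
print(enriched["alternate_terms"])
```

Keywords with no entry in the table (here, "aspirin") are simply left
unmapped, which mirrors the fact that a mapping built up over time will
always be incomplete.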
In addition to providing alternate access points to BIOSIS products,
the inclusion of the MeSH terms makes it possible to perform cross-database
searching on the indexing field with MEDLINE and other databases
that include MeSH terms. From 1999 forward, users can search BIOSIS
databases using MeSH disease terms. The disease terms can be extracted
from the MeSH authority file or from a MEDLINE record and then used
in a search against the BIOSIS files, or vice versa. This helps users
find relevant records that are unique to either BIOSIS or MEDLINE.
The inclusion of terms from an alternate KOS, such as MeSH, therefore
supports the use of BIOSIS by medical librarians and practitioners
who are familiar with MeSH terminology.
A more extensive example of mapping variant schemes is the metathesaurus
developed by the NLM's Unified Medical Language System (UMLS). This
system has linked more than 40 separate KOSs from various medical
specialties. They range from MeSH to coding and classification schemes
used by insurance companies and physicians to describe treatments
and diseases on patient records. The UMLS is licensed by many other
organizations for inclusion in applications that can bridge various
health care communities.
How can digital libraries use alternate indexing? While many digital
libraries do not have the A&I resources of large database producers
such as NLM and BIOSIS, the concept of applying alternate indexing
can be scaled to fit. Although the systems described deal with item-level
bibliographic records, alternate indexing can be applied at several
levels. Alternate subject access can be applied only at the resource
level, for the database, electronic book, electronic journal, or
image collection, so that other communities can identify resources
of interest that must then be searched or browsed individually. This
concept is conducive to use with portals that provide access to the
same resources with different views for different audiences. Alternatively,
if the digital library has bibliographic records or metadata records
at a very detailed level, it may be possible to develop switching
programs that will translate concepts from the original organization
of the digital library or resource to that of the alternate scheme.
Adding New Modes of Understanding to the Digital Library
People perceive the world through many modes, including textual
and graphical. Some people comprehend information more easily in
one mode than another. Most people benefit from a variety of modes
that reinforce one another or that can be used when appropriate to
the context. Many digital library projects remain text-based; however,
this text-only dimension is changing as digital libraries become
oriented more to multimedia and as other modes of information presentation
become viable on the Internet.
KOSs can be used to bring new dimensions to an information resource
or a collection in a digital library. In the digital library environment,
these dimensions can be viewed as layers that can be added on top
of one or more objects. Various tools and services can be developed
that are geared to a particular mode. For example, the results of
a text search can be presented in graphical or visual form, based
on the number of occurrences of a term or concept or on the occurrences
of documents from a particular country, journal title, or author.
A more complex dimension that can be added is the geospatial dimension,
which emphasizes access by place. A "geolibrary" is defined
as a digital library consisting of "geoinformation," or
material that can be accessed by place (National Research Council
1999). This so-called georeferencing can be either direct (by a geospatial
footprint, a series of latitudes and longitudes for the location)
or indirect (by a textual place name). Georeferencing of textual
objects is facilitated by a gazetteer, which brings together the
place name and the spatial footprint for its location (see footnote 2). Many
gazetteers also include feature types for each footprint. The vocabulary
used for the feature types varies among gazetteers, but may include
terms such as "airport," "harbor," and "railroad
station."
Although many organizations, including federal and state agencies,
are currently required to provide geospatial referencing as part
of the National Spatial Data Infrastructure Program, the geospatial
referencing is not readily available for older works. How can the
data sets of today be integrated with the textual information of
yesterday? The answer is by adding geospatial referencing to the
text resource. Geospatial referencing requires that the text name
for a place have an associated spatial footprint. This can be achieved
by using a georeferenced, digital gazetteer that provides geospatial
footprints for place names.
Through this type of knowledge organization system, place names
in a library catalog or bibliographic database can have footprints
assigned (Blair 1999; Tahirkheli 1999). If one or more of the library's
resources have latitude and longitude coordinates in the catalog record
or in the full text but no place name, the coordinates can be extracted
and submitted to the gazetteer service. The service will return the
place name for the footprint. Alternatively, the resource may have
a textual place name. This place name can be extracted and searched
against the gazetteer, and the footprint can be provided to a mapping
application. The latter search may result in more than one footprint,
since place names may be ambiguous. Therefore, it is important that
the user interface be designed to allow the user to distinguish the
locations. Once the footprint has been determined, a user can access
the text resource through a geographic mapping tool. Alternatively,
a user of the text resource can find a set of results and have the
place names displayed as footprints on a map.
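The two lookups described above (place name to footprint, and
coordinates to place name) can be sketched with a small in-memory
gazetteer. This is only an illustration under stated assumptions: the
GazetteerEntry class, the function names, the bounding-box form of the
footprint, and the rough coordinates are all hypothetical and do not
describe any particular gazetteer service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GazetteerEntry:
    name: str
    feature_type: str
    # Footprint as a bounding box: (west, south, east, north), in degrees.
    footprint: tuple


# Hypothetical entries with rough, illustrative coordinates.
GAZETTEER = [
    GazetteerEntry("Springfield", "populated place", (-89.8, 39.7, -89.5, 39.9)),
    GazetteerEntry("Springfield", "populated place", (-72.7, 42.0, -72.4, 42.2)),
    GazetteerEntry("Santa Barbara", "populated place", (-119.9, 34.3, -119.6, 34.5)),
]


def footprints_for(place_name):
    """Place name to footprint; an ambiguous name returns several entries."""
    wanted = place_name.strip().lower()
    return [e for e in GAZETTEER if e.name.lower() == wanted]


def names_at(lon, lat):
    """Coordinates to place name: entries whose box contains the point."""
    return [
        e.name
        for e in GAZETTEER
        if e.footprint[0] <= lon <= e.footprint[2]
        and e.footprint[1] <= lat <= e.footprint[3]
    ]


candidates = footprints_for("springfield")   # two entries: ambiguous name
place_names = names_at(-119.7, 34.4)          # point inside one box
```

Because footprints_for returns a list rather than a single entry, the
calling interface is forced to deal with ambiguity, for example by
showing each candidate footprint on a map and letting the user choose.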
In disciplines such as ecology, environmental science, and even
public health and epidemiology, it would be beneficial to build a
digital library with access to such a digital gazetteer service.
Users could then access the system through the text mode or the geographic
mode, depending on their comfort level and the type of information
needed. Presenting the results on a map allows users to make new
associations and analyze the results more easily. Through a geospatial
KOS, they can see connections between disparate data, because the
data are presented in an alternate mode.
Providing Multilingual Access
A third way that KOSs can support the use of digital libraries by
disparate communities is to provide multilingual access. A variety
of sources, including multilingual dictionaries and multilingual
thesauri, can support this type of access.
One of the most extensive multilingual thesaurus efforts is the
General Multilingual Environmental Thesaurus (GEMET) from the
European Environment Agency (EEA), produced by Italy's research council,
the Consiglio Nazionale delle Ricerche (CNR). The GEMET is available
in 12 languages, and plans for a global environmental thesaurus in
many more languages were recently announced. GEMET is available by
agreement with the EEA.
The European Topic Centre on Catalogue of Data Sources in Germany
is developing a system that will link data sources and metadata information
in a virtual library. GEMET will be used to convert a search in one
language into searches for the same concepts in other languages.
Users will retrieve documents not only in their native language but
also in other languages. This will allow data systems from throughout
the EEA and beyond to be accessed as a virtual library collection
with both controlled vocabulary and free-text term searching in multiple
languages.
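Converting a search from one language into others amounts to finding
the concept behind the user's term and then reading off that concept's
labels in the target languages. The following is a minimal sketch under
assumptions: the concept identifiers, the labels, and the functions
find_concept and translate_query are invented for illustration and are
not taken from GEMET.

```python
# Hypothetical multilingual thesaurus: concept id mapped to one preferred
# term per language code. Entries are illustrative only.
THESAURUS = {
    "c1": {
        "en": "water pollution",
        "de": "Wasserverschmutzung",
        "fr": "pollution de l'eau",
    },
    "c2": {
        "en": "soil erosion",
        "de": "Bodenerosion",
        "fr": "érosion du sol",
    },
}


def find_concept(term, lang):
    """Return the id of the concept whose label in `lang` matches the term."""
    wanted = term.strip().lower()
    for concept_id, labels in THESAURUS.items():
        if labels.get(lang, "").lower() == wanted:
            return concept_id
    return None


def translate_query(term, source_lang, target_langs):
    """Map a search term to the same concept's labels in other languages."""
    concept_id = find_concept(term, source_lang)
    if concept_id is None:
        return {}
    labels = THESAURUS[concept_id]
    return {lang: labels[lang] for lang in target_langs if lang in labels}


translated = translate_query("Water Pollution", "en", ["de", "fr"])
```

Routing every term through a language-neutral concept identifier means
that adding a language requires only new labels, not a new pairwise
mapping for each existing language.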
Expanding Free-Text Search Terms
Free-text searching is the main method of searching on the Web.
Only a small percentage of Web resources have metadata, and an even
smaller percentage have controlled vocabulary assigned. However,
variations in natural language make free-text searching problematic.
Even a knowledgeable user may not know all the terminology (synonyms
or related terms) that can be used in the literature to express a
concept. The problem is exacerbated when the user is unfamiliar with
the topic or is interested in an interdisciplinary area. How can
the user expand his or her search to overcome these terminology differences?
One possibility is to use KOSs as aids to the selection of free-text
keywords.
The Getty Vocabulary Project emphasizes support for searching as
a significant application of its vocabularies. Harpring (1999) reports
that the vocabularies are increasingly being used in search engines
to look for different terms that refer to the same concept. The Getty
vocabularies (the Art and Architecture Thesaurus, the Union List
of Artist Names, and the Thesaurus of Geographic Names) are particularly
rich in equivalence relationships. "When these equivalence relationships
are exploited in search engines, there are typically two possible
scenarios: the user may be allowed to first query the vocabulary
database, locating appropriate terms, and then applying those chosen
terms in a query across target databases; or there may be little
or no user interaction with the vocabulary, when the vocabularies
are used behind the scenes [to expand the search] . . . " (Harpring
1999). Getty developed a prototype called a.k.a. to experiment
with the use of equivalence terms to broaden or narrow searches across
databases on the Web.
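The behind-the-scenes scenario can be sketched as a function that
replaces the user's term with all terms in its equivalence set. This is
a hedged illustration: the equivalence sets, the function names, and
the quoted-phrase OR syntax are assumptions for the example and do not
describe the Getty vocabularies or the a.k.a. prototype.

```python
# Hypothetical equivalence sets; each set groups terms that are treated
# as referring to the same concept.
EQUIVALENCE_SETS = [
    {"still life", "still lifes", "nature morte"},
    {"watercolor", "watercolour", "aquarelle"},
]


def equivalents(term):
    """Return all terms equivalent to the given one, in sorted order."""
    wanted = term.strip().lower()
    for group in EQUIVALENCE_SETS:
        if wanted in group:
            return sorted(group)
    # Unknown terms are passed through unchanged.
    return [wanted]


def expand_query(term):
    """Build a query string that ORs together every equivalent term."""
    return " OR ".join(f'"{t}"' for t in equivalents(term))


expanded = expand_query("Watercolour")
```

In the interactive scenario, the list returned by equivalents would be
shown to the user, who would pick the terms to search on; in the
behind-the-scenes scenario, expand_query is applied without the user
seeing the vocabulary at all.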
In addition to expanding routine search queries, KOSs can be used
in Web mining tools. Northern Light has developed a Web mining tool
that reportedly returns a high proportion of relevant hits. The KOS that
supports the Northern Light site was built by ingesting large existing
vocabularies and thesauri. The result was then organized under an
extensive classification scheme developed by Northern Light. The
terms can be used to extend a user's search or to distinguish between
multiple meanings of the terms supplied by the user. The results
of a search are organized into "folders" based on the classification
scheme. These high-level categories, represented by the folders,
help distinguish multiple meanings of the same term. For example,
an ambiguous word such as "pitcher" might result in two
folders being presented to the user. One folder would be titled "Sports" (as
in baseball pitcher), the second "Decorative Arts" (as
in water pitcher). The user who chooses only the Sports folder will
be presented with only those Web resources that use "pitcher" in
the baseball sense. The user who selects the folder called "Decorative
Arts" will be presented only with those resources that are related
to water pitchers.
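The folder behavior in the "pitcher" example can be sketched as a
grouping of search hits by their top-level category. The sample hits,
the field names, and the group_into_folders function are assumptions
made for illustration; nothing here describes how Northern Light
actually assigns categories.

```python
from collections import defaultdict

# Hypothetical hits for the query "pitcher", each already tagged with a
# top-level category from a classification scheme.
HITS = [
    {"title": "Pitcher strikes out twelve", "category": "Sports"},
    {"title": "Victorian glass water pitcher", "category": "Decorative Arts"},
    {"title": "Rookie pitcher joins rotation", "category": "Sports"},
]


def group_into_folders(hits):
    """Group hit titles under the category each hit was classified into."""
    folders = defaultdict(list)
    for hit in hits:
        folders[hit["category"]].append(hit["title"])
    return dict(folders)


folders = group_into_folders(HITS)
for name in sorted(folders):
    print(name, len(folders[name]))
```

Choosing a folder is then just a matter of showing the titles under
that one key, which is what separates the baseball sense of the word
from the household sense.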
KOSs can be very powerful in supporting free-text searching within
digital libraries and in integrating Web resources into existing
digital libraries. However, these systems must be used with caution.
KOSs have generally been developed for a specific discipline, task,
or function, or for the indexing of a specific collection or database.
Therefore, depending on the domain in which the KOS is being used
and the complexity of the system, it may or may not suggest relevant
free-text terms. Expanding a search with related terms, rather than
pure synonyms, may return hits that are only peripherally relevant
to the user.
Summary
One of the benefits of the Internet, the Web, and digital libraries
is the degree to which resources can be made available to broader
audiences. The technology facilitates the connection of disparate
knowledge communities at the network level. However, discovery of
the resources and true accessibility require that the content and
its organization be understood by these disparate communities. By
providing alternate subject access, adding modes of understanding,
supporting multilingual access, and supplying terms for expanding
free-text searching, KOSs can facilitate discovery and understanding
by disparate communities, and allow these communities to interact
in new ways.
Footnotes
2. A recent National Science Foundation-sponsored workshop, "Digital
Gazetteer Information Exchange," addressed the issues of digital
gazetteers. One of the critical issues is that there is no standard
for the interchange of information, either to provide gazetteer information
physically to another gazetteer or to interoperate with one or more
distributed gazetteers through the Internet. The workshop participants
emphasized the need for such protocols and for enhancements to current
gazetteers. (Many gazetteers do not include coordinates or are incomplete
in this regard.) The goal is to develop a digital gazetteer service
that can be accessed by any application.
Such a service is central to the vision of a geolibrary. A report
on distributed geolibraries from the National Research Council (1999)
envisions the geolibrary as a physical globe. One would walk into
such a geolibrary and be confronted not by a card catalog or an OPAC
terminal but by a large physical globe. The user would indicate his
or her area of interest by pointing to a place on the globe. The
librarian would use the geospatial location information to retrieve
and present materials related to that place. By comparing feature
types, the user could ask for other place names and locations that
were similar to the original.
Significant work on digital gazetteer services and geospatial
libraries has been conducted by the Alexandria Digital Library (ADL)
Project at the University of California at Santa Barbara, with support
from the National Science Foundation's Digital Library Initiative-1
(Hill and Zheng 1999). An ADL Gazetteer was created by merging place
name authority files from the National Imagery and Mapping Agency and the
U.S. Board on Geographic Names of the U.S. Geological Survey. The
project also added controlled feature types to the gazetteer. With
the aid of a visualization tool, the information can be provided
on a map and accessed using other geographic visualization tools.