5. The Future of Knowledge Organization Systems on the Web
As online databases moved to the Web, they began to provide their
products, including vocabulary aids, in this environment. Portable
document format (PDF) versions of printed vocabulary aids are common,
since PDF can be easily produced from a PostScript file and it retains
the look of the printed product. With Adobe's tools for indexing
and searching, the PDF file can provide some level of support for
linking. Many of these aids, however, remain in the form of HTML
files only; there is no database structure to easily support the linking
and searching. In some cases, the full structure of the KOS is not
made available on the Web; the only format for a Web-based thesaurus
may be an alphabetical list of terms that does not let the user
easily navigate the hierarchical structure. As unique ways of
using these resources are developed, it is hoped that more KOS providers
will be encouraged to provide their systems in formats that are conducive
to such networked uses.
Some of the requirements for such electronic KOSs were identified
at a workshop entitled "Electronic Thesauri: Planning for a
Standard" and sponsored by NISO (1999). While the focus of this
meeting was digital thesauri, consideration was also given to other
KOSs in digital form. The identified requirements include persistent
identification at the concept level, a simple protocol
for distributed querying of, and response from, a KOS, and the development
of a standard set of metadata attributes for describing a remote
KOS.
To facilitate the search and display of information from a previously
unknown KOS, the system must have unique and persistent identifiers
for each of the concepts in the system. For example, the California
Environmental Resources Evaluation System (of the California Natural
Resources Agency) and the U.S. Geological Survey have developed a
system for remote querying and response (CERES 1999). It requires
that each concept in the thesaurus have a unique identifier. In the
case of the previously described ITIS, which is accessed remotely
by the CERES system, the ITIS record number is used as the identifier.
Other unique identifiers could include the DOI, or a classification
notation that has been made unique by appending the scheme name or
the URL to the notation.
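The last option, qualifying a notation with its scheme, can be sketched in a few lines of Python. This is only an illustration: the function name `make_concept_id`, the `/` separator, and the example.org scheme URLs are assumptions made here, not part of the CERES system or any standard.

```python
def make_concept_id(scheme: str, notation: str) -> str:
    """Qualify a classification notation with its scheme name or URL.

    Hypothetical helper: the "/" separator is an assumption for illustration.
    """
    scheme = scheme.strip().rstrip("/")
    notation = notation.strip()
    if not scheme or not notation:
        raise ValueError("both scheme and notation are required")
    return f"{scheme}/{notation}"


# The same notation used by two different schemes yields two distinct identifiers.
ids = {
    make_concept_id("http://example.org/kos/scheme-a", "12.3"),
    make_concept_id("http://example.org/kos/scheme-b", "12.3"),
}
```

A notation such as "12.3" is ambiguous on its own; once qualified, each concept can be addressed independently of the scheme it came from.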
The second requirement is a protocol for the distributed querying
and response of KOSs. This is particularly critical for highly structured
systems such as thesauri, semantic networks, and ontologies. Work
has been done in this area within the Z39.50 community. (Z39.50 is
the NISO standard for searching distributed bibliographic databases.)
A profile has been proposed by the Zthes Working Group to tailor
the Z39.50 protocol to operate on thesauri that follow the Z39.19
standard.
A similar effort is under way at the CERES Project. Instead of a
Z39.50-based protocol, CERES has developed a structure that is based
on the Resource Description Framework (RDF) and the HTTP protocol
of standard browsers. RDF's concept of containers is a natural fit
for managing the hierarchical structure of complex systems such as
thesauri. The structure proposed by CERES is likely to be encoded
using XML, a mark-up format that lends itself to structured information.
This protocol for linking distributed vocabularies will support both
searching and cataloging. The user will be presented with remote
vocabularies that can be displayed and navigated by a local client.
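As a rough illustration of why nested XML suits a hierarchical vocabulary, the sketch below serializes a small thesaurus fragment and has a "client" parse and display it. It is not the CERES structure and not RDF syntax: the element name `concept`, the attributes `id` and `label`, and the sample terms are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from typing import List


def to_xml(concept: dict) -> ET.Element:
    """Turn a nested concept dict into nested <concept> elements."""
    elem = ET.Element("concept", id=concept["id"], label=concept["label"])
    for child in concept.get("narrower", []):
        elem.append(to_xml(child))
    return elem


def outline(elem: ET.Element, depth: int = 0) -> List[str]:
    """Walk the parsed tree and return an indented list of labels."""
    lines = ["  " * depth + elem.get("label")]
    for child in elem:
        lines.extend(outline(child, depth + 1))
    return lines


# Hypothetical thesaurus fragment; terms are invented for illustration.
thesaurus = {
    "id": "c1", "label": "Water",
    "narrower": [
        {"id": "c2", "label": "Groundwater"},
        {"id": "c3", "label": "Surface water",
         "narrower": [{"id": "c4", "label": "Rivers"}]},
    ],
}

payload = ET.tostring(to_xml(thesaurus), encoding="unicode")  # what a server might send
received = ET.fromstring(payload)                             # what a client would parse
display = outline(received)
```

Because the nesting of elements mirrors the broader/narrower structure, a local client can rebuild a navigable outline without prior knowledge of the vocabulary.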
The third major finding from the NISO workshop was the need for
a metadata content standard for the description of KOSs. Such a standard
is key to the provision of knowledge organization services over the Internet.
The metadata identify the Web resource as a KOS and provide important
information to allow an application to use it remotely without prior
knowledge of its content or structure.
A draft set of attributes for describing KOSs available in a networked
environment has been developed by a task group of the Network Knowledge
Organization Systems (NKOS) Working Group, an ad hoc group of terminology
experts from organizations that are interested in issues related
to the use and interoperability of KOSs over the Internet. The draft
attributes are based on work originally done by Linda Hill (Alexandria
Digital Library at the University of California at Santa Barbara)
and Michael Raugh (Interconnect Technologies).
The attributes describe the KOS so that content from the system
can be transferred over the Internet and handled by a remote browser
or client application. The attributes include the depth of hierarchy,
the types of relationships included, the subject (described by free
text or by a declared classification scheme), storage format, copyright
and rights management, and contact information. To facilitate the
transfer of information, the attribute set also includes information
on character set and file size. To facilitate the acquisition and
licensing of the KOSs, the draft content description includes point
of contact information.
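One way to picture such an attribute set is as a simple record that a client inspects before requesting the KOS. The sketch below is illustrative only: the field names, the `client_can_load` check, and the sample values are assumptions made here and do not reproduce the NKOS draft.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class KosDescription:
    """Hypothetical record covering the attributes listed above."""
    hierarchy_depth: int
    relationship_types: List[str]
    subject: str              # free text or a declared classification
    storage_format: str
    rights: str
    contact: str
    character_set: str
    file_size_bytes: int


def client_can_load(desc: KosDescription, charsets: Set[str], max_bytes: int) -> bool:
    """Decide from the metadata alone whether a client can handle the KOS."""
    return desc.character_set.lower() in charsets and desc.file_size_bytes <= max_bytes


# Sample values are invented for illustration.
desc = KosDescription(
    hierarchy_depth=4,
    relationship_types=["broader", "narrower", "related"],
    subject="environmental science",
    storage_format="XML",
    rights="contact provider for licensing terms",
    contact="kos-admin@example.org",
    character_set="UTF-8",
    file_size_bytes=2_500_000,
)
ok = client_can_load(desc, {"utf-8", "iso-8859-1"}, max_bytes=10_000_000)
```

The point of the character set and file size attributes is visible here: the client can reject or accept a transfer before any vocabulary content moves over the network.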
During discussions about the metadata content standard, workshop
attendees identified three methods for storing the metadata for a
KOS. First, the metadata could be stored with the KOS, as metadata
elements for that resource. Second, the metadata could be stored
in a physically separate knowledge organization registry. The third
possibility is a hybrid approach, where a minimal set of metadata
elements is contained in a central registry (i.e., sufficient information
to identify the resource, where it is located, and how more information
can be obtained). The more detailed information would be stored with
the KOS itself.
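The hybrid approach can be sketched as a two-step lookup: a central registry holds only enough to find the resource, and the detailed description is fetched from the KOS itself. All names below (`RegistryEntry`, `describe`, the example.org locations) are hypothetical, and the `fetch` callable stands in for whatever retrieval mechanism a real service would use.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class RegistryEntry:
    """Minimal central record: what the KOS is and where to learn more."""
    name: str
    location: str
    metadata_location: str


def describe(kos_id: str,
             registry: Dict[str, RegistryEntry],
             fetch: Callable[[str], dict]) -> dict:
    """Combine the registry entry with the detailed metadata kept with the KOS."""
    entry = registry[kos_id]
    detail = fetch(entry.metadata_location)
    return {"name": entry.name, "location": entry.location, **detail}


# Central registry (minimal) and metadata stored with the KOS (detailed).
registry = {
    "kos-1": RegistryEntry(
        name="Example Thesaurus",
        location="http://example.org/thesaurus",
        metadata_location="http://example.org/thesaurus/about",
    )
}
stored_with_kos = {
    "http://example.org/thesaurus/about": {"hierarchy_depth": 4, "character_set": "UTF-8"}
}

# A dictionary lookup stands in for a network request in this sketch.
full = describe("kos-1", registry, lambda url: stored_with_kos[url])
```

Keeping the registry entry small means the provider maintains the detailed description in one place, alongside the KOS, while the registry still supports discovery.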
There is significant interest in the use of KOSs to organize and
search material on the Internet. It is hoped that this interest will
result in knowledge organization services that will make these sources
more readily accessible to a variety of software applications and
to a variety of users. As services and enabled software proliferate,
it will be easier to integrate these KOSs into digital libraries.