4. Planning and Implementing Knowledge
Organization Systems in Digital Libraries
This section provides general guidelines that may be useful for
an organization that wants to use knowledge organization systems
to organize a digital library. The framework described is applicable
for KOSs of any type or subject.
Planning Knowledge Organization Systems
Analyzing User Needs
Of primary importance to any digital library project is an analysis
of its users' needs, in terms of content and functionality. Many
volumes have already been written about needs assessment, and providing
detailed guidance on this subject is beyond the scope of this paper.
However, when analyzing how a KOS might be used with a particular
digital library, it is essential to thoroughly understand the environment
of the user. One must look not only at the needs for organizing the
digital library materials but also at possible links between content
within and outside the digital library walls. This is particularly
important for KOSs that are acting as intermediate authority files,
because in such cases the links may not be readily apparent. It is
important to consider other views that might be valuable for users
and peripheral communities that might benefit from the digital library's
content were it accessible to them through a KOS.
Locating Knowledge Organization Systems
Once users' needs have been analyzed, it is necessary to locate
KOSs that meet those needs. While an alternate system can be built locally,
it is preferable to find an existing KOS for several reasons. First,
it is costly and time-consuming to build a KOS. Second, KOSs often
benefit by having been built over time. Many of the systems described
in this report have been built over decades; some existed in paper
before digitization. Third, the value of a KOS comes from its acceptance
by the user community; sources built by noted authorities such as
learned societies, trade associations, or standards groups will be
viewed as more trustworthy than those built internally. Finally,
the networked environment has resulted in both an explosion of primary
materials, including documents, electronic journals, and Web-based
databases, and in an equivalent explosion of KOSs on the Web.
There are several ways to identify KOSs that may be of interest.
Many users are already aware of KOSs on the Web within their discipline.
Developers may also turn to directories, librarians in the field,
and reference sources, or they may perform a general search of the
Internet.
Planning the Infrastructure
It is necessary to make decisions about the architecture of the
KOS in the context of the digital library setting. The physical location
of the KOS is important. Will the system be held externally or internally?
There are pros and cons to either approach.
If the system is available on the Web, it is possible to consider
linking to the KOS as an external system. This architecture requires
a script or some search query to locate the resource. One must then
launch a query against the resource to obtain the piece of information
that will serve as the key between the two files. This key could
be a uniform resource locator (URL) or input to another search
query. A query may be necessary if the KOS is stored in a database.
The script may transfer log-on information (including user ID and
password) from the digital library system to the external KOS, in
order to provide access to the Web-enabled database. In the case
of a more direct link, the access may be by URL.
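The query step described above can be sketched in a few lines. This is a minimal illustration only: the endpoint `https://kos.example.org/search` and the parameter names `scheme` and `term` are hypothetical, and a real KOS service would define its own syntax.

```python
from urllib.parse import urlencode


def build_kos_query_url(base_url: str, params: dict[str, str]) -> str:
    """Build a query URL for a Web-accessible KOS.

    The result can be stored as the key between the two files or
    fetched to look up a further key.
    """
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and parameter names, for illustration only.
url = build_kos_query_url(
    "https://kos.example.org/search",
    {"scheme": "thesaurus", "term": "heart attack"},
)
print(url)
```

Encoding the parameters with `urlencode` rather than by string concatenation keeps terms that contain spaces or punctuation from breaking the query.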
However, the use of a URL as the link has the same problem with
persistence as does direct access via a URL from a browser. The organization
may move the KOS, thereby changing the URL that is being used as
the key. It is important to determine how often the URLs in the KOS
change, whether there is a means of notification of these changes,
and whether it is possible to consider an alternative that would
be more persistent. Schemes such as the Digital Object Identifier
and the Persistent URL have been devised to enable resources to be
physically moved among servers without having their names changed.
Another alternative is the use of other Uniform Resource Identifier
(URI) schemes, such as the Uniform Resource Name (URN), which can be sent
from the newer Web browsers. The benefit of linking to a remote resource
is that the resource will always be up-to-date. The maintenance of
the KOS is in the hands of the owner, not the digital librarian.
It may also be more apparent to users that the KOS is not owned by
the digital library.
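The indirection that persistent identifier schemes rely on can be sketched as follows. This toy resolver is an assumption made for illustration and does not reproduce how DOI or PURL services are actually implemented; the name `kos:example-thesaurus` and both host names are hypothetical.

```python
class Resolver:
    """Toy resolver: maps a stable name to the resource's current location."""

    def __init__(self) -> None:
        self._locations: dict[str, str] = {}

    def register(self, name: str, url: str) -> None:
        # Registering an existing name again records the new location.
        self._locations[name] = url

    def resolve(self, name: str) -> str:
        return self._locations[name]


resolver = Resolver()
link_key = "kos:example-thesaurus"  # the only value the digital library stores

resolver.register(link_key, "https://old-host.example.org/thesaurus")
before = resolver.resolve(link_key)

# The KOS owner moves the resource; the stored key does not change.
resolver.register(link_key, "https://new-host.example.org/vocab/thesaurus")
after = resolver.resolve(link_key)

print(before)
print(after)
```

Because the digital library stores only the name, a move by the KOS owner requires one update in the resolver rather than a change to every embedded link.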
Linking to a remote KOS also has disadvantages. Persistence and
unexpected changes in the organization and content of the system
may cause problems. The software or telecommunications route between
the digital library server and the KOS may be unreliable. In systems
requiring fast response time or large amounts of data transfer, and,
therefore, high bandwidth (such as full-motion video or detailed
graphics), the fact that a connection must be made between the digital
library and the external KOS may make the system unacceptable to
the user.
Alternatively, the KOS may be obtained from the owner and loaded
locally. In many cases, this requires licensing that may not be required
when the KOS is accessed remotely, because a copy of the whole resource
is being provided to the digital library. Loading a KOS locally also
requires that one consider issues such as maintenance, local system
administration, and disk storage. If the KOS uses special software,
such as a database management system, loading the KOS locally will
require a copy of that software, which may require additional purchase
or licensing. Other considerations are the need for firewalls and
interface design. On the positive side, the KOS is under more local
control. Therefore, it may be possible to improve the response time
by not accessing the KOS over the Internet. If the KOS is to be used
behind the scenes (that is, the system is not visible to the user),
concerns of speed and integration become more important. If additional
modifications (including digitization) need to be made to the KOS
to integrate it with the digital library, it will also be necessary
to load the KOS locally.
If the digital library intends to incorporate numerous secondary
KOSs, it is important to consider the degree to which the architecture
is scalable. The National Library of Medicine's UMLS incorporates
more than 40 different sources. While its main purpose has been to
develop a metathesaurus for moving among these vocabularies, the
management of the systems, regardless of the mapping issues, has
been a major consideration. Ingest has been a major concern, with
the need to develop a system that can handle a variety of input formats, from
ASCII text files to highly structured database output. The architecture
must also accommodate the character sets of the incoming sources.
This is particularly important if a mark-up language has been used
to represent special characters and diacritical marks. Systems that
have been developed in Unicode, which extends ASCII to accommodate
diacritical marks and non-Roman character sets, cannot be handled
by systems that deal only with ASCII or extended ASCII sets.
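A short check at ingest time can show which incoming terms an ASCII-only component could not hold. The sketch below is illustrative; the sample terms are invented, and the lossy fallback of stripping diacritics is an assumption about what a project might choose, not a recommendation from this report.

```python
import unicodedata


def ascii_safe(term: str) -> bool:
    """Return True if the term can be stored by an ASCII-only system."""
    try:
        term.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


def strip_diacritics(term: str) -> str:
    """Lossy fallback: decompose characters and drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", term)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Invented sample terms, for illustration only.
terms = ["Ménière disease", "Asthma", "Sjögren syndrome"]
needs_unicode = [t for t in terms if not ascii_safe(t)]
fallbacks = [strip_diacritics(t) for t in needs_unicode]

print(needs_unicode)
print(fallbacks)
```

Stripping diacritics discards information and does nothing for non-Roman scripts, so it is at best a stopgap for display or matching.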
Since many digital library systems are being built as extensions
or applications of existing integrated library systems (ILS), it
is important to consider how the KOSs will integrate with the library
system. Unfortunately, many ILS vendors have not considered links
to external files or databases in their system designs. In some cases,
the vendor may require that the information be stored in the proprietary
format of the ILS. The system may require that the files be on the
same directory or server as the accessing ILS. The fields that can
be linked to the Web or searched may be limited. Outside communications
may require Z39.50 client-server connections. With relatively closed
systems, ILSs may be a difficult environment in which to implement
alternative and nontraditional KOSs.
Digital libraries that are interested in using KOSs should consider
this integration when developing requirements for the procurement
of a system to support them. Vendors should be encouraged to support
relatively open architectures and to consider the extension of traditional
library systems to support broader digital library functionality.
In addition to these immediate concerns, it is important to consider
the incorporation of future KOSs. Initial success may spur the desire
for integration of additional KOSs or enhanced functionality for
the existing KOS. Success may breed additional requirements and increase
the strain on hardware, software, and network architectures.
Maintaining the Knowledge Organization System
For a digital library, an outdated KOS can be more of a hindrance
than a benefit. Maintenance, both of content and of the system, should
be considered when planning a KOS. This is particularly important
if the digital library is to be self-supporting or revenue generating.
Version control of the KOS is extremely important. Reloading a new
version from the system provider is one way to accommodate changes;
however, this may not be acceptable if the locally held version differs
substantially from that held by the system's provider. If there has
been significant transformation or processing of the original KOS,
it may be difficult, or impossible, to reload the original and recreate
the changes that have been made.
A transaction-based approach, whereby only changes are transferred
between the KOS provider and the library, is also possible; however,
this requires that the system provider have the infrastructure, both
machine and human, to produce these transactions. It also requires
that the changes to the original KOS be identifiable in order to
create change transactions. For example, Stuart Nelson of the NLM's
UMLS Project recently reported that many systems can create annual
transaction records to inform the UMLS about the changes that have
occurred to the original system. However, the changes are often not
indicated with enough detail to support automatic change transactions
in the UMLS. If a change date, for example, is recorded only at the
level of the concept record, it is impossible to tell whether the
term has changed (a correction of a typographic error for example)
or if the relationship between this concept and another concept has
changed. Since the UMLS splits the incoming terminology and its relationships
into a variety of files, it is often difficult to tell how the UMLS
files must be changed based on the changes made during the maintenance
of the original KOS (NISO 1999).
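The granularity problem described above can be made concrete with a sketch of field-level change transactions. The record layout, field names, and actions below are assumptions chosen for illustration; they do not reproduce any UMLS or provider format.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    term: str
    broader: set[str] = field(default_factory=set)


@dataclass(frozen=True)
class Change:
    """One change transaction, recorded at the level of a single field."""
    concept_id: str
    field_name: str  # "term" or "broader"
    action: str      # "set", "add", or "remove"
    value: str


def apply_change(kos: dict[str, Concept], change: Change) -> None:
    concept = kos[change.concept_id]
    if change.field_name == "term" and change.action == "set":
        concept.term = change.value
    elif change.field_name == "broader" and change.action == "add":
        concept.broader.add(change.value)
    elif change.field_name == "broader" and change.action == "remove":
        concept.broader.discard(change.value)
    else:
        raise ValueError(f"unsupported change: {change}")


# Local copy with a typographic error in the term.
kos = {"C1": Concept("Pnuemonia", {"C0"})}
transactions = [
    Change("C1", "term", "set", "Pneumonia"),  # typo correction
    Change("C1", "broader", "add", "C9"),      # relationship change
]
for change in transactions:
    apply_change(kos, change)

print(kos["C1"])
```

A change date stored only on the concept record would say that `C1` changed but not which of these two transactions to generate, which is why field-level detail is needed for automatic updates.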
Presenting the Knowledge Organization System to the User
In addition to deciding which KOS should be used and what functions
it should serve, the digital library will need to determine how to
present the KOS to its users. A KOS may be exposed to the user or
made relatively transparent.
The KOS can be exposed to the user in different ways. Material can
be grouped into KOS-related themes or categories on the digital library's
Web site. The KOS may be used at a higher level to identify specific
portals for different uses or users. If the content of the digital
library includes metadata records, the KOS may be displayed as index
terms on the records or in its entirety as a navigation aid to searching.
In other cases, the KOS may be transparent. For example, a thesaurus
can be used behind the scenes to extend the user's search to include
synonyms, to connect the digital library's resources to other information
and resources, or to filter or rank the information obtained.
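Behind-the-scenes synonym expansion can be sketched as a simple lookup. The thesaurus structure here, a mapping from a preferred term to a set of lowercase synonyms, is an assumption made for illustration, and the entries are invented.

```python
def expand_query(term: str, thesaurus: dict[str, set[str]]) -> set[str]:
    """Return the user's term together with its synonyms, if any are known."""
    key = term.lower()
    for preferred, synonyms in thesaurus.items():
        group = {preferred} | synonyms
        if key in group:
            return group
    # Unknown terms are searched as entered.
    return {key}


# Invented entries; labels are assumed to be stored in lowercase.
thesaurus = {"myocardial infarction": {"heart attack", "mi"}}

print(sorted(expand_query("Heart attack", thesaurus)))
print(sorted(expand_query("asthma", thesaurus)))
```

The user sees only the broader result set; the thesaurus itself never appears in the interface.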
Implementing Knowledge Organization Systems
Acquisition and Intellectual Property Issues
It is critical to properly handle the acquisition of knowledge organization
systems. The first question is whether the KOS is under copyright.
If so, the copyright holder should be contacted concerning the KOS.
It is important to ensure that the apparent contact is the official
one. Many references have been reprinted or put on the Web without
proper acknowledgment of the real owner.
Once the contact has been made, there are several points for discussion:
- If the provider maintains the KOS, how will the digital library
find out about any changes that may be made in it? Is there a notification
mechanism in place? How frequently must the information be updated
to be of benefit to the digital library's users? Will the maintenance
be self-evident, or must the agreement include notification requirements?
What will the owner do if the maintenance can no longer be performed?
- What will happen if the provider discontinues the product or
sells or transfers it to someone else?
- What uses can the digital library make of the KOS under the proposed
agreement? As with other licensing, it is advisable to aim for
the broadest permissions and the longest term possible. At a minimum,
the library should be able to renegotiate the terms of the agreement
relatively easily.
- In a networked environment, it is beneficial to develop mechanisms
for linking to online versions rather than to maintain a local
copy of the resource. This ensures that what is presented is up-to-date,
and acknowledges more clearly the ownership of the KOS. However,
there are numerous factors to consider. Will the KOS be used on
an intranet or behind a firewall, where access to the outside or
information coming into the organization might be prohibited? Does
the KOS service use "cookies" or require knowledge of
the user's Internet Protocol (IP) address? Does it require a user ID
and password?
- If the KOS is to be accessed remotely, are there service issues?
Is it likely to be accessed with bandwidth, modem, and computer
speeds that are adequate for outside connections of this type?
Is the use of such a critical nature that unreliable service on
the part of the KOS or the Internet connection will cause the digital
library itself to be viewed as less useful? Does the KOS require
a specialized search engine or search query formulation? Can the
digital library system properly display the results, or would the
results be better displayed through the KOS system? Will the resulting
information be used in its native form or must it be extracted
or transformed? If the KOS is to be loaded locally, in what formats
can the content be received?
- If the KOS is not available electronically, can it be digitized?
Is the owner interested in a cooperative venture, and are the human
and financial resources for such an effort available?
Making the Link
There are two parts to establishing the link between the digital
library and the KOS. The first is locating the key anchor information
in the digital library's resource. The second involves the lookup
against the target file. The creation of this link may be more or
less automatic, depending on the particular situation. The characterization
of this activity is meant to be general and to allow both "on-the-fly" links
and embedded links.
Regardless of what function the KOS is going to serve in the digital
library, the essential information contained in the digital library
resource from which the link is to be made must be identified. The
mechanism for doing this depends on the type of object from which
the link is being made and on the information that is expected to
be identified in the digital library's resource.
The first step is to review any metadata related to the digital
library resource. Do the metadata carry the term (such as SIC code,
artist's name, place name, geographic coordinates) that is needed
to make the link? If this information is included, the level at which
the metadata are assigned should be reviewed. If the metadata indicate
the subject matter of the specific resource in which the user will
be interested, the metadata can be used to make the links. However,
in some cases, the terms that appear in the title or description
at the resource level (e.g., the book) may not be indicative of the
subject at the individual item level (e.g., the chapter). Automatically
making a link on the basis of the content description for an entire
book may misrepresent the content of a chapter. Whether or not the
metadata can be used will depend on the amount and type of information
given in the metadata and the level at which the metadata are assigned.
If a text resource in the digital library provides no appropriate
metadata, the procedure for identifying the key information may involve
text analysis. A program to perform simple string searching or a
search engine that can preserve hit locations can be used if the
text string has distinguishing characteristics, such as a database
acronym, or a specific structure, such as a latitude and longitude
coordinate. If the text string has no such cues, text mining or more
complex text-analysis tools may be necessary. These tools use a variety
of semantic and syntactic algorithms to locate key information. There
have been significant advances in commercially available text-mining
tools, such as IBM's Intelligent Agent, which includes specific algorithms
for identification of names of places and persons.
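For text strings with a specific structure, simple pattern matching that preserves hit locations may be enough. The sketch below assumes one particular coordinate notation (decimal degrees followed by a hemisphere letter); real documents use many notations, and the pattern would need to be adapted to each.

```python
import re

# Assumed notation: "38.8977 N, 77.0365 W" (decimal degrees, hemisphere letter).
COORD = re.compile(
    r"\b(\d{1,2}(?:\.\d+)?)\s*([NS]),?\s*(\d{1,3}(?:\.\d+)?)\s*([EW])\b"
)


def find_coordinates(text: str) -> list[tuple[int, int, float, float]]:
    """Return (start, end, latitude, longitude) for each coordinate pair found."""
    hits = []
    for m in COORD.finditer(text):
        lat = float(m.group(1)) * (1 if m.group(2) == "N" else -1)
        lon = float(m.group(3)) * (1 if m.group(4) == "E" else -1)
        hits.append((m.start(), m.end(), lat, lon))
    return hits


text = "The site at 38.8977 N, 77.0365 W was surveyed."
for start, end, lat, lon in find_coordinates(text):
    print(text[start:end], lat, lon)
```

Keeping the start and end offsets lets a later step place the link at the exact point in the text where the coordinate occurs.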
The second step of the linking activity is to make the connection
to the KOS. The methods for doing this vary, depending on whether
the system is being loaded locally or is referenced remotely. If
the system is loaded locally, it is possible to perform a significant
amount of processing to match the two files, assuming that computer
resources of this type are available to the digital library organization.
If the system is only available remotely over the Web, the interaction
will require knowledge of scripting and various Web-based access
techniques. Scripting should be considered in both local and remote
approaches, since the more integrated the linking is with the resource,
the more maintenance may be required if there are changes in either
the resource or the KOS. Regardless of the approach that is taken,
making the link requires an analysis of both the information in the
original digital library material and the corresponding information
in the KOS.
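When the KOS is loaded locally, matching the two files can start with a normalized lookup. This is a minimal sketch: the identifiers and labels are invented, and the normalization (lowercasing and collapsing whitespace) is an assumed starting point rather than a complete matching strategy.

```python
def normalize(term: str) -> str:
    """Lowercase and collapse whitespace so trivial variants still match."""
    return " ".join(term.lower().split())


def match_terms(
    record_terms: list[str], kos_entries: dict[str, str]
) -> tuple[dict[str, str], list[str]]:
    """Match terms from digital library records against KOS labels.

    kos_entries maps a KOS identifier to its label. Returns the matched
    terms with their identifiers and the terms left unmatched.
    """
    index = {normalize(label): kos_id for kos_id, label in kos_entries.items()}
    matched: dict[str, str] = {}
    unmatched: list[str] = []
    for term in record_terms:
        kos_id = index.get(normalize(term))
        if kos_id is None:
            unmatched.append(term)
        else:
            matched[term] = kos_id
    return matched, unmatched


# Invented identifiers and labels, for illustration only.
kos_entries = {"T1": "Myocardial Infarction", "T2": "Pneumonia"}
matched, unmatched = match_terms(
    ["pneumonia", "  myocardial   infarction", "Gout"], kos_entries
)
print(matched)
print(unmatched)
```

The unmatched list is the useful output for planning: it shows how much of the collection would need manual review or a richer matching method.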
If the KOS is being used as an intermediate file to bridge between
the digital library's resource and another resource, it is also important
to understand the data and the process whereby the search is performed
and information returned from the target resource. If the KOS must
return a value to the original digital library resource, the data
and process must be evaluated in a bidirectional sense.
Choosing the linking mechanism is equally important. The link may
be fixed or "on-the-fly." In the case of a fixed link,
a specific URL is embedded at the link point in the digital library
material. However, as stated before, problems of persistence are
inherent in this approach. Alternatively, a URN can be used. The
URN requires the creation of a namespace on the part of the target
file, and the search is to this namespace rather than to a specific
URL. Persistent URLs (PURLs) and digital object identifiers (DOIs)
can also solve this problem. These schemes are sufficient if the
material is an HTML document.
Content in databases is more difficult to retrieve. The National
Library of Medicine now supports the searching of a variety of its
databases through its Internet Grateful Med (IGM) URL function. IGM
users can create URLs that will actually perform searches against
the databases. For example, the following URL would perform a
search for "pneumonia" in the HealthSTAR file: http://igm-02.nlm.nih.gov/cgi-bin/IGM_robot.pl?datafile=HealthSTAR&search=Subject=pneumonia.
Information on the syntax for creating such a URL is provided on
the NLM Web site. While the intent is that the search URL will be
bookmarked by an individual user, the same concept can be used for
creating an active link at the anchor point. With additional
scripting, each occurrence of the term pneumonia can be automatically
replaced with an active link that picks up the term at the point
where the link is made.
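The additional scripting mentioned above might look like the following sketch. The search URL pattern is copied from the example in the text; the HTML anchor markup, the function name `link_term`, and the assumption that the input is plain text are illustrative choices, and nothing here confirms that the service still accepts such URLs.

```python
import re
from html import escape
from urllib.parse import quote

# Base URL taken from the example in the text.
IGM_BASE = "http://igm-02.nlm.nih.gov/cgi-bin/IGM_robot.pl"


def link_term(text: str, term: str, datafile: str = "HealthSTAR") -> str:
    """Replace each occurrence of term in plain text with an active search link."""
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)

    def to_anchor(match: re.Match) -> str:
        found = match.group(0)
        url = (
            f"{IGM_BASE}?datafile={quote(datafile)}"
            f"&search=Subject={quote(found.lower())}"
        )
        # Escape the URL for use inside an HTML attribute.
        return f'<a href="{escape(url)}">{escape(found)}</a>'

    return pattern.sub(to_anchor, text)


linked = link_term("Pneumonia is common.", "pneumonia")
print(linked)
```

Generating the link when the page is served, rather than embedding it in the stored document, means a change in the URL syntax requires a change in one function instead of in every document.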
Summary
The framework for developing an infrastructure to support the use
of KOSs in digital libraries requires an analysis of user needs,
the identification and location of the appropriate KOSs, and the
development of the hardware, software, and network architecture to
support their integration and maintenance. The digital librarian must
make decisions concerning the degree to which the KOSs will be presented
to the user, acquisition and intellectual property issues, and maintenance
and update procedures. There are several technical ways to make the
link between the digital library and the KOS. As knowledge organization
systems are increasingly available on the Web, requirements are beginning
to be defined to improve the interoperability and general use of
these resources through the development of knowledge organization
services on the Web.