Lee L. Zia¹
(Lee L. Zia is Lead Program Director of the National Science Digital Library Program at the National Science Foundation).
Approximately a decade and a half has passed since the phrase “World Wide Web” and its enabling technology burst onto the scene. During that time, stunning increases in communication and computational capabilities, coupled with equally dramatic decreases in cost, have produced networked information technology devices that have fundamentally changed, and continue to change, the relationship between people and knowledge.
This same period has also witnessed the evolution of the Internet from a pure research and development environment into what pundits call the “commoditization of the Internet.” Closely associated with the growth of the commercial Internet has been the emergence of participatory capabilities for individuals that find their most recent expression in the rise of social networking trends, services, and community formation. This democratization of access to data and information has altered not just the “where” and “when” of learning but, increasingly, the “how” and “by whom” of obtaining or granting authority and certification of expertise.
These changes challenge many concepts and traditions: the idea of the original, authoritative source; the fate of books; the role of libraries; the place of formal institutions of learning; the nature of discourse; and, of course, “old” business models. All are subject to various manifestations of the tension between atoms and bits, as Negroponte termed it in “Being Digital.” Do libraries need survival skills? Yes, but society and culture need survival skills even more, and libraries will survive if they are relevant to this larger task. Successfully navigating the circumstances produced by the amazing explosion of access to unfiltered data, and by the changing relationship of people to knowledge, will require many partners. The library, with its rich traditions of stewardship, preservation, and attention to quality, and of providing at least a proxy for the certification of authority, will play an important role in collaboration with its constituencies: end users and content providers.
The next section offers examples of the way in which libraries have participated in interesting collaborations to grapple with the changes brought by the digital era. The particular perspective taken is from the science, technology, engineering, and mathematics (STEM) educational enterprise, with all examples drawn from projects funded under the National Science Foundation’s (NSF) National Science Digital Library Program (NSDL). Two “meta-themes” reflected in this collective set of projects are the integration of research and education missions, and the blurring of formal and informal learning opportunities.
Examples From the NSDL Program
During the mid- to late 1990s, NSF provided leadership and primary funding for the Digital Libraries Initiative (Phases 1 and 2), a multiagency digital library research program. Building on that early work, the NSDL program began (and continues) to support the establishment of a national digital library for science education: an online network of learning environments and resources for STEM education at all levels, in both formal and informal settings. A key assumption of the program from its inception was that the effort should take a distributed-development approach, reflecting the underlying distributed nature of the Web. From a practical perspective, the decision to adopt a distributed approach also reflected the fact that the underlying Web technology was constantly changing and improving; the effort should therefore be as open and flexible as possible, avoiding a single centralized investment that might prematurely lock the overall development into a narrow path.
This approach also enabled learners and other end users to bring their needs more explicitly to the table, since one of the advantages of the digital era has been much greater participation by end users of technology in its actual design and deployment. In fact, the theme of distributed development has extended naturally to the project level, in that many NSDL projects have featured collaboration among partners from several broad areas: (1) academic disciplinary expertise, typically in the form of faculty leaders of educational innovations; (2) computer science/digital library researchers and information science researchers; (3) traditional library personnel or media specialists (a term increasingly used in the K-12 sector); and (4) more recently, the informal learning sector (e.g., museums and science centers).
The examples that follow illustrate several common ingredients. Foremost is an interesting problem or challenge that, in the context of educational digital libraries, takes an applied form. There is also mutual self-interest on the part of collaborators, a sense that they are engaged in shared problem solving. All parties bring expertise to contribute, and they find value or benefit in what they learn and take away from the effort. The successful collaborations have also developed a genuine collegiality that grows from a collective sense of purpose. In many ways this is a “meta-feature” that characterizes the way in which the various NSDL projects have worked with one another. Finally, one cannot ignore the role that external funding plays in catalyzing project work that crosses administrative and disciplinary boundaries; while not sufficient, it is often necessary. Challenges remain, of course, and the final section of this essay comments on a number of them.
The examples below focus on three themes: (1) metadata standards development, with particular application to aligning educational resources with national and state science and mathematics standards; (2) integration of digital library resources and frameworks with the infrastructure and processes of the traditional (physical) library; and (3) development and deployment of services. These themes reflect not just areas of interest but also, in some sense, an evolutionary record of how the digital library field has matured, a natural progression as both underlying technologies and standards have developed. (In the NSF award numbers cited, the first two digits indicate the fiscal year of the award, giving a rough chronology of the projects.)
Before turning to the examples, it is important to note that none of the NSDL effort has taken place in a vacuum. The larger arena in which all the projects have operated has benefited from, and been informed enormously by, the advocacy and leadership of the Council on Library and Information Resources (CLIR), the Coalition for Networked Information (CNI), and the Digital Library Federation (DLF), to name but a few organizations. Additionally, the field has received much support and leadership through projects funded by the Institute of Museum and Library Services (IMLS) and The Andrew W. Mellon Foundation.
Metadata Standards Development and Assignment
Many early NSDL projects focused on promoting metadata standards for describing educational resources. As many a wag has noted, “The great thing about standards is that there are so many to choose from!” Humor aside, early NSDL projects did in fact grow from the work of the Dublin Core effort (an early collaboration of individuals and institutions that married library expertise with computer science expertise) and other standards efforts such as the Learning Object Metadata work of the Institute of Electrical and Electronics Engineers (IEEE). Acknowledging the importance of interoperability among different digital library approaches, NSDL projects collectively promoted the adoption of at least minimal metadata standards and crosswalking methods. Toward this end, the NSDL program’s early calls for proposals strongly urged projects to adhere, at a minimum, to the Dublin Core metadata standard so as to promote metadata sharing and the federation of collections. This step was seen as a minimally necessary condition for leveraging the results of many diverse cataloging efforts to enable search and discovery over a much larger universe of resources than any single collection identifies. Without such sharing, an individual collection would risk painting itself into an electronic corner of the Web. The introduction of the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) also helped raise the standards bar.
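To make the harvesting step concrete, the following Python sketch parses one page of an OAI-PMH `ListRecords` response carrying unqualified Dublin Core (`oai_dc`) records and follows resumption tokens to later pages. The verb, parameter names, and XML namespaces are those defined by OAI-PMH 2.0; the base URL, the sample record, and the injected `fetch` function are hypothetical placeholders, and a production harvester would also need to handle OAI-PMH error responses and flow control.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, token=None):
    """Build a ListRecords request; a resumption token replaces metadataPrefix."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return (records, resumption_token) for one ListRecords response page."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        record = {
            "header_id": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "dc": {},
        }
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        if dc is not None:  # deleted records carry a header but no metadata
            for element in dc:
                name = element.tag.split("}", 1)[1]  # "{uri}title" -> "title"
                record["dc"].setdefault(name, []).append((element.text or "").strip())
        records.append(record)
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, (token or None)  # an empty token marks the last page


def harvest(base_url, fetch):
    """Yield every record, following resumption tokens; fetch(url) returns response text."""
    token = None
    while True:
        records, token = parse_list_records(fetch(list_records_url(base_url, token)))
        yield from records
        if not token:
            break


# A hypothetical one-record response page.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:42</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Exploring Plate Tectonics</dc:title>
          <dc:subject>Earth science</dc:subject>
          <dc:subject>Geology</dc:subject>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""
```

Injecting `fetch` keeps the parsing logic testable offline; passing a function that performs real HTTP requests (for example, one built on `urllib.request`) would turn `harvest` into a minimal working harvester.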
Against this broad backdrop of attention to the importance of metadata and in recognition of the labor-intensive nature of human cataloging, a collaboration headed by researchers at the University of Washington’s Information School and university library colleagues (NSF-0121717) began to investigate automated processes to complement human effort. The team also involved the Syracuse University Center for Natural Language Processing and practitioners from Mid-continent Research for Education and Learning (McREL), a nonprofit organization with roots as a U.S. Department of Education regional education laboratory. To automatically assign content standards and other benchmarks to educational resources in the collections of NSDL, the project has developed a natural language processing tool (StandardConnection). The standards and benchmarks come from the McREL Compendium of Standards and Benchmarks and represent both state and national science education standards. Supplementing general descriptive metadata, the content standards metadata make it possible for a teacher in any state to use the NSDL to locate teaching resources for helping students achieve a particular competency set by the state. The overall process involves training the tool on a set of educational resources, cultivating a deep understanding of human cognitive processes involved in manual assignment of content standard metadata tags, iteratively adjusting the tool until reliable tagging is produced, and employing teacher-experts to analyze the quality of the tool’s mappings of resources to standards and benchmarks during an evaluative phase.
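The essay does not describe StandardConnection’s internals, so the sketch below swaps in a deliberately simple technique, bag-of-words cosine similarity, to illustrate what ranking candidate standards against a resource description involves. The standard identifiers and texts are invented for illustration and are not taken from the McREL Compendium; a real tool would add the training, cognitive modeling, and expert evaluation steps described above.

```python
import math
import re
from collections import Counter

# A tiny illustrative stopword list; real systems use far richer NLP.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "how", "that", "for", "is", "are", "with"}


def bag_of_words(text):
    """Lowercase, split into alphabetic tokens, and drop stopwords."""
    return Counter(w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def suggest_standards(resource_text, standards, k=5):
    """Return up to k (standard_id, score) pairs, best first, omitting zero scores."""
    doc = bag_of_words(resource_text)
    scored = [(sid, cosine(doc, bag_of_words(text))) for sid, text in standards.items()]
    scored = [pair for pair in scored if pair[1] > 0]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical standards, not quoted from any real compendium.
STANDARDS = {
    "SCI-ES-1": "Understands the structure of the Earth and plate tectonics",
    "SCI-LS-2": "Understands how cells carry out life functions",
    "MATH-ST-3": "Uses data analysis and statistics to describe data",
}
```

With `k=5` the output mirrors the “one to five suggestions” shape of a recommender; anything this crude would only ever propose candidates for a human cataloger to vet.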
Building on this research effort, an implementation project led by Diekema and others at Syracuse (NSF-0435339) has focused on improving the ability of teachers to locate science and mathematics resources that support their standards-based instruction, no matter what state they are in or where a resource was developed. Two services are currently available for NSDL collection providers. The first is a Computer-Assisted Standard Assignment recommender tool that suggests to a human cataloger one to five of the most relevant national content standards for a learning resource. The cataloger accepts, edits, or rejects these suggestions, and the tool adds the accepted standards to the resource’s metadata record. The system learns from vetted assignments in order to make future recommendations more accurate. The second service is a methodology and tool that crosswalks between state math and science standards and their national counterparts. The resulting automated mapping between state and national standards allows the national standards to function as an “exchange” standard. NSDL’s search capabilities incorporate this mapping so that teachers can search for resources using either their home-state standards or the national standards. Furthermore, because any state standard can be translated through the national standards into another state’s standards, educational resources can be shared easily across the country.
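The “exchange” standard idea can be sketched as a two-step lookup: map a state standard to its national counterparts, then map those back to another state’s standards or to resources tagged with national standards. All state codes, standard identifiers, and resource names below are hypothetical, and the hand-written dictionary stands in for the automated mapping the project produced.

```python
from collections import defaultdict

# Hypothetical crosswalk: (state, state standard id) -> national standard ids.
STATE_TO_NATIONAL = {
    ("CO", "CO-SCI-3.1"): {"NAT-ES-1"},
    ("NY", "NY-ES-2.4"): {"NAT-ES-1"},
    ("NY", "NY-LS-1.2"): {"NAT-LS-2"},
}


def invert(crosswalk):
    """Build the reverse index: national id -> set of (state, state standard id)."""
    national_to_state = defaultdict(set)
    for state_std, nationals in crosswalk.items():
        for nat in nationals:
            national_to_state[nat].add(state_std)
    return dict(national_to_state)


NATIONAL_TO_STATE = invert(STATE_TO_NATIONAL)


def translate(state_std, target_state):
    """Translate one state's standard into another state's via the national exchange standard."""
    matches = set()
    for nat in STATE_TO_NATIONAL.get(state_std, set()):
        for state, sid in NATIONAL_TO_STATE.get(nat, set()):
            if state == target_state:
                matches.add(sid)
    return matches


def find_resources(query_std, resource_tags):
    """Return resources aligned to a state standard ((state, id) tuple) or a national id.

    resource_tags maps each resource name to the national standard ids it is tagged with.
    """
    if isinstance(query_std, tuple):
        wanted = STATE_TO_NATIONAL.get(query_std, set())
    else:
        wanted = {query_std}
    return sorted(name for name, tags in resource_tags.items() if tags & wanted)
```

Because resources only need national tags, a teacher searching by a Colorado code and another searching by a New York code reach the same resource through the shared national identifier; for example, `translate(("CO", "CO-SCI-3.1"), "NY")` yields `{"NY-ES-2.4"}`.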
A third example in this set involves a collaboration led by library staff at Cornell University (Hillman et al., see NSF-0532854). The team is developing and deploying a metadata registry service to complement the NSDL Data Repository. The registry is based on the open-source Dublin Core Metadata Initiative (DCMI) Registry application and enables multiple diverse collection providers and other NSDL projects to identify, declare, and publish their metadata schemas (element/property sets) and schemes (controlled vocabularies). The project provides support for registration of schemes and schemas for use by human and machine agents, as well as support for the machine mapping of relationships among terms and concepts in those schemes (semantic mappings) and schemas (crosswalks). Generalization of registry software enables implementations beyond centrally controlled metadata schemas, thus placing the distribution of appropriate control and management in the hands of vocabulary creators and maintainers. In turn this offers the potential to overcome economic and legal barriers that have prevented the anticipated growth of registries and distributed registry networks.
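As a rough illustration of what a registry records, the toy class below stores schemas (element sets), schemes (controlled vocabularies), and crosswalks between schemas, and refuses crosswalks that mention undeclared elements. It is a hypothetical in-memory model, not the DCMI Registry application or its API; only the four Dublin Core element names in the usage lines (`title`, `subject`, `description`, `identifier`) come from a real standard, and the local schema and grade-level vocabulary are invented.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRegistry:
    """Toy in-memory registry of schemas, schemes, and crosswalks (hypothetical)."""
    schemas: dict = field(default_factory=dict)     # schema name -> set of element names
    schemes: dict = field(default_factory=dict)     # vocabulary name -> set of terms
    crosswalks: dict = field(default_factory=dict)  # (source, target) -> {src_el: tgt_el}

    def register_schema(self, name, elements):
        self.schemas[name] = set(elements)

    def register_scheme(self, name, terms):
        self.schemes[name] = set(terms)

    def register_crosswalk(self, source, target, mapping):
        # Reject mappings that reference elements neither schema has declared.
        unknown = (set(mapping) - self.schemas[source]) | (set(mapping.values()) - self.schemas[target])
        if unknown:
            raise ValueError(f"undeclared elements: {sorted(unknown)}")
        self.crosswalks[(source, target)] = dict(mapping)

    def translate(self, record, source, target):
        """Apply a registered crosswalk; elements without a mapping are dropped."""
        mapping = self.crosswalks[(source, target)]
        out = {}
        for element, values in record.items():
            if element in mapping:
                out.setdefault(mapping[element], []).extend(values)
        return out

    def in_vocabulary(self, scheme, term):
        return term in self.schemes.get(scheme, set())


# Usage with a hypothetical local schema crosswalked to four Dublin Core elements.
reg = MetadataRegistry()
reg.register_schema("dc", ["title", "subject", "description", "identifier"])
reg.register_schema("local", ["lessonName", "topic", "summary", "url"])
reg.register_crosswalk("local", "dc", {"lessonName": "title", "topic": "subject",
                                       "summary": "description", "url": "identifier"})
reg.register_scheme("grade-levels", ["K-2", "3-5", "6-8", "9-12"])
```

Translating `{"lessonName": ["Volcano Lab"], "topic": ["Geology"], "gradeBand": ["6-8"]}` yields only `title` and `subject`: the unmapped `gradeBand` is dropped, a reminder that a crosswalk can carry over only what both schemas can express.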
Integrating Physical and Digital Traditions
A second area of exploration for NSDL projects has been how to connect the digital and physical worlds. Here collaboration plays an important role not so much in individual implementations, which must necessarily reflect local circumstances, as in the sharing of experiences that allows common principles and best practices to be identified.
In the project “Adding Value to the NSDL by Integrating it into Academic Libraries: A Business Proposition and a Service Enhancement” (NSF-0333710), Greenstein and others working across the University of California (UC) system conducted market research to determine what content and services the NSDL would need to offer to attract subscriptions from academic libraries, and thus to support itself at least in part with that revenue. A second strand of activity developed a prototype service that integrates NSDL into the foundational science collections managed by various libraries within the UC system. The service includes tools that enable libraries to create views of their integrated science collections customized to the needs of different patrons. Although mainly a proof-of-concept effort, this aspect of the project promises to inform the modifications that the NSDL and its collection providers may need to make to their technical architectures to better support integration into academic library collections. The libraries within the larger UC system exhibit highly diverse technical environments and thus have offered an excellent testbed for service deployment and evaluation, representative of the heterogeneous technical environments that characterize academic libraries in general.
A second project has considered this integration challenge at a single institutional or local level. In “Integrating Digital Libraries and Traditional Libraries: A Model for Sustaining NSDL Collections” (NSF-0333628), Ward led a team at the University of North Carolina at Wilmington (UNC-Wilmington) to investigate the issues involved in integrating an existing NSDL collection, the iLumina digital repository, with a traditional research library, the Randall Library at UNC-Wilmington. Lessons from this project offer guidance for sustaining the many digital collections that reside at institutions of higher education. As part of this effort, the project sought to automate the conversion of Instructional Management Systems (IMS) metadata to MARC records, implementing XML harvester software that transforms the IMS metadata compiled in the iLumina digital collection directly into the MARC records used in the Randall Library catalog. Once iLumina resources are listed in the Randall Library catalog, they become shareable with the OCLC WorldCat database, substantially increasing the accessibility of digital resources originally known only to the local digital repository.
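The IMS-to-MARC conversion can be pictured with the following sketch, which reads a deliberately simplified, namespace-free IMS-style record and emits MARC 21 fields in a one-line-per-field text layout, with blank indicators shown as backslashes as in common MARC mnemonic displays. The tags used (245 title, 520 summary, 653 uncontrolled index term, 856 electronic location) are real MARC 21 fields, but the input structure, the field choices, and the sample URL are assumptions; this is not the project’s harvester, and real IMS metadata is namespaced and nests values more deeply.

```python
import xml.etree.ElementTree as ET


def ims_to_marc_fields(xml_text):
    """Map a simplified IMS-style record to (tag, indicators, [(code, value), ...]) tuples."""
    root = ET.fromstring(xml_text)
    fields = []
    title = root.findtext("general/title")
    if title:
        # 245 Title Statement: ind1 0 = no added entry, ind2 0 = no nonfiling characters.
        fields.append(("245", "00", [("a", title.strip())]))
    summary = root.findtext("general/description")
    if summary:
        fields.append(("520", "  ", [("a", summary.strip())]))  # 520 Summary, blank indicators
    for keyword in root.findall("general/keyword"):
        if keyword.text:
            # 653 Index Term-Uncontrolled: keywords are not from a controlled vocabulary.
            fields.append(("653", "  ", [("a", keyword.text.strip())]))
    url = root.findtext("technical/location")
    if url:
        # 856 Electronic Location and Access: ind1 4 = HTTP, ind2 0 = the resource itself.
        fields.append(("856", "40", [("u", url.strip())]))
    return fields


def to_mnemonic(fields):
    """Render one field per line: =TAG, two spaces, indicators (blank as backslash), subfields."""
    lines = []
    for tag, indicators, subfields in fields:
        ind = indicators.replace(" ", "\\")
        body = "".join(f"${code}{value}" for code, value in subfields)
        lines.append(f"={tag}  {ind}{body}")
    return "\n".join(lines)


# Hypothetical, simplified input record.
SAMPLE_IMS = """<lom>
  <general>
    <title>Cell Division Animation</title>
    <description>Animated overview of mitosis.</description>
    <keyword>Biology</keyword>
    <keyword>Mitosis</keyword>
  </general>
  <technical><location>http://example.org/resources/123</location></technical>
</lom>"""
```

Printing `to_mnemonic(ims_to_marc_fields(SAMPLE_IMS))` gives one line per field, starting with `=245  00$aCell Division Animation`; a real pipeline would then serialize the fields to MARC for loading into the catalog.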
Digital Library Service Frameworks
A third area where NSDL projects have made inroads is the development of frameworks for service creation and deployment. This area of effort complements the second set of projects described above. For example, in the OCKHAM project (see NSF-0333497), Frumkin at Oregon State, along with collaborators in the University Library at Emory and computer scientists at Virginia Polytechnic Institute, has focused on developing networked middleware to facilitate and expand access to the content and services of the NSDL through the existing national infrastructure of traditional libraries and their service programs. Additionally, the team has created a reference model for integrating the NSDL into traditional library services; evaluated the utility, usage, and impacts of the services tested at the local libraries on the participating campus communities through Web log analysis, focus groups, and usability studies; and disseminated results and facilitated growth of the network among an expanding group of institutional partners. By fostering an extensible framework for networked peer-to-peer interoperation between the NSDL and traditional libraries, this project is also advancing the dialog between librarians and researchers.
Mischo and others at the University of Illinois head a second, more recent project of this type (see NSF-0734992). This team is developing and implementing a set of metasearch gateway services for the distributed NSDL community that use broadcast search technologies to provide access to selected scientific and engineering publishers’ full-text repositories, abstracting and indexing services, university institutional repositories, open-access full-text journal and report sites, and the collections developed by the NSDL Pathways projects. As a component of the NSDL core integration services, the gateways provide custom federated search access to critical distributed information resources that support the instructional and research needs of middle school, high school, undergraduate, and graduate students, as well as faculty. The gateways use standards-based frameworks such as the NISO MXG (Metasearch XML Gateway) framework, the OpenSearch 1.1 standard, and the Open Archives Initiative protocols for metadata harvesting (OAI-PMH) and for object reuse and exchange (OAI-ORE). Furthermore, the project features a collaboration of information science researchers with personnel from the DLF Aquifer project, as well as an international component involving two Joint Information Systems Committee (JISC)-funded initiatives in the United Kingdom: the PerX project at Heriot-Watt University and the CREE project headquartered at the University of Hull. These international connections speak to the project’s broader impact on the global educational digital library environment.
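Broadcast (federated) search can be sketched in a few functions: fill each target’s OpenSearch-style URL template, query all targets in parallel, and merge the result lists. The template syntax (`{searchTerms}`, with optional parameters marked by `?`) follows OpenSearch 1.1; the target names, the `fetch` callables, the round-robin merge, and duplicate removal by URL are illustrative assumptions, not a description of the Illinois gateways or of NISO MXG.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import quote

TEMPLATE_PARAM = re.compile(r"\{([A-Za-z:]+)(\??)\}")


def fill_template(template, terms, **params):
    """Fill an OpenSearch 1.1-style URL template.

    Unsupplied optional parameters ({name?}) become empty strings; an unsupplied
    required parameter raises KeyError.
    """
    values = {"searchTerms": quote(terms), **{k: str(v) for k, v in params.items()}}

    def substitute(match):
        name, optional = match.group(1), match.group(2) == "?"
        if name in values:
            return values[name]
        if optional:
            return ""
        raise KeyError(f"required template parameter missing: {name}")

    return TEMPLATE_PARAM.sub(substitute, template)


def broadcast_search(terms, targets, count=10):
    """Query all targets in parallel and merge results round-robin, dropping duplicate URLs.

    targets maps a (hypothetical) target name to (url_template, fetch), where
    fetch(url) returns a list of {"title": ..., "url": ...} dicts.
    """
    def query(target):
        template, fetch = target
        try:
            return fetch(fill_template(template, terms, count=count))
        except Exception:
            return []  # a failing target is skipped rather than sinking the whole search

    with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
        result_lists = list(pool.map(query, targets.values()))

    merged, seen = [], set()
    depth = max((len(results) for results in result_lists), default=0)
    for rank in range(depth):
        for results in result_lists:
            if rank < len(results) and results[rank]["url"] not in seen:
                seen.add(results[rank]["url"])
                merged.append(results[rank])
    return merged
```

Because each target is a `(url_template, fetch)` pair, the merging logic can be exercised with stub functions before any real repository is wired in. Round-robin interleaving is only one simple merging policy, chosen here because it needs no comparable relevance scores across targets.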
A final example illustrates the emergence of utility-like application services. In late summer 2007, NSDL initiated a collaboration with the Colorado Alliance of Research Libraries (CARL), through which CARL has adopted the NSDL Data Repository and its Fedora-based technology platform to provide distributed collection management for its 11 member institutions. This work is just now under way and is beginning with the creation, storage, management, and delivery of very large image collections from the member libraries. A key public benefit of the project is that it will make these resources accessible to all school districts across Colorado and Wyoming. This initial effort points the way toward more extensive repository services for the alliance’s text, image, and video resources. As more libraries and cultural heritage institutions begin to consider digital repositories, this collaboration presents a model for new NSDL partnerships. This example, like the previous two in this section, illustrates how libraries can, and perhaps ultimately must, participate in efforts that are beyond what each can take on individually.
As the previous examples show, research libraries have played an important, and often leading, role in projects that have charted new directions in managing data and information in the digital age, and pointed the way toward the development of new digital library services. Such work could not have been undertaken without the involvement of a diverse set of principal investigators, and it has been gratifying to witness the collaboration among units on campuses that previously did not interact much. Indeed, just bringing such groups together has been a notable achievement.
Many challenges remain if the library and scholarly communities are to exercise leadership in determining how to leverage the advantages of digital technologies for the benefit of culture and society. Chief among them are the following:
- Engaging the broader library community in implementing leading-edge advances such as those described above, and others resulting from programs offered by IMLS and private funders such as the Mellon Foundation. More than a matter of disseminating information about these advances, this broader engagement will require systematic and systemic effort to help different audiences learn a new language, with faculty needing to understand issues of librarianship and librarians growing to appreciate faculty roles. Of particular interest is the challenge these changing roles will pose for the future structure and content of graduate and professional school programs.
- Supporting continued educational-content development and innovation that can be made available through locally maintained digital repositories and shared through a broad network of contributing providers. While producers have come primarily from the higher education community, rapidly evolving capabilities for reusing and re-forming content are quickly broadening participation in this activity. Key issues will continue to revolve around questions such as authenticity, certification of expertise, and mechanisms and practices for attributing authorship.
- Evaluating the educational impact of the increased access to resources and data that digital libraries make possible; developing metrics to capture the degree of reuse, repurposing, or repackaging of digital material; and assessing the value of such activities.
- Supporting continued research efforts in the management, manipulation, and storage of large heterogeneous data sets; and the development of new tools, methodologies, processes, and services to meet the educational and other scholarly needs of learners.
- Developing a deeper understanding of end-user needs, and better ways of meeting them, as those needs move beyond pure searching for factual “data” to more nuanced, semantically imbued sense making. Here the push toward increased customization must be balanced against privacy concerns.
Perhaps the greatest need is to create and sustain the capacity to address the multiple challenges identified above. One possibility is to give a nonprofit organization responsibility for providing leadership to the science education and scholarly community in meeting these challenges. In this vision, the library would provide a natural voice through which to express an institution’s priorities. Its assets, not only its institutional repository and services but, more importantly, its human resources, would in effect serve as a currency to contribute to the larger national (if not international) organization. Within such a larger organization, the preservation of institutional branding and the companion issue of ownership would present an ongoing challenge. However, the NSDL projects have shown that mutual self-interest and a sense of shared problem solving can lead to significant collaborations among different units on campuses. Furthermore, as the examples illustrate, interinstitutional collaborations have formed naturally, transcending institutional identity. Indeed, the community of NSDL projects has self-organized into multiple standing committees and workgroups to tackle collectively numerous tasks, including policy development in areas such as collection development, privacy, copyright, metadata standards and guidelines, and metadata sharing. In addition, the community has built collaboratively developed services such as those described above.²
How might such a virtual organization come into being? One model finds its inspiration in the creation of NSFnet, which is currently celebrating its 20th anniversary. Specifically, it is fitting to envision an analog to the NSFnet “connectivity” program in which educational institutions would receive initial support to join a (virtual) organization as a member institution for several years. However, grants would not be for physical connectivity, but rather
- to build capacity to make locally developed educational resources and services (institutional repositories) available to a wider audience via the NSDL, gaining access in return to the larger collective body of resources and services, and
- to support local teacher and faculty development activities that engage educators in making use of the new capabilities of NSDL and the resources to which it provides access.
The cost of continued membership would then fall on the institution. A relatively modest annual fee, multiplied across interested institutions of higher education and local school districts, would generate a significant source of self-sustaining revenue.³ As the network effect took hold (the value of the network increasing as more members join), such a strategy would enable NSF and other funders to transition support for this facility to a community-based mechanism.
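The revenue arithmetic in footnote 3 can be checked directly: roughly 4,000 higher education institutions plus 16,000 school districts is about 20,000 members, and 20,000 × $10,000 is $200 million a year. The `scaled_fees` helper is a hypothetical illustration of the sliding scale the footnote mentions, charging fees in proportion to an arbitrary size weight while keeping the same average fee; the weights are invented.

```python
def operating_budget(member_count, average_fee):
    """Annual revenue if members pay the given average fee."""
    return member_count * average_fee


def scaled_fees(weights, average_fee):
    """Hypothetical sliding scale: fees proportional to weights, same average as a flat fee."""
    total = average_fee * len(weights)
    weight_sum = sum(weights)
    return [total * w / weight_sum for w in weights]


HIGHER_ED_INSTITUTIONS = 4_000  # figure from footnote 3
SCHOOL_DISTRICTS = 16_000       # figure from footnote 3
AVERAGE_FEE = 10_000            # dollars per year, from footnote 3

budget = operating_budget(HIGHER_ED_INSTITUTIONS + SCHOOL_DISTRICTS, AVERAGE_FEE)  # $200M/year
```

For example, four members with hypothetical size weights 1, 1, 2, and 4 would pay $5,000, $5,000, $10,000, and $20,000, which still averages $10,000.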
1 The views expressed in this essay are entirely those of the author and do not represent official policy of the National Science Foundation.
2 For more details, see http://nsdl.org/resources_for/library_builders/nsdlgroups.php, and a related link at http://nsdl.org/resources_for/library_builders/tools.php?pager=tools.
3 There are about 4,000 higher education institutions and about 16,000 local school districts in the United States. An average $10,000/year fee would permit a $200M/year operating budget. The annual fee could be scaled to reflect attributes such as institution size, population, and other socioeconomic factors. The fee could be thought of as an ongoing subscription (see http://www.dlib.org/dlib/march01/zia/03zia.html and the section on Sustainability). Museums and public libraries would also be able to subscribe.