Council on Library and Information Resources

Introduction

This literature survey was created in support of the Stanford Linked Data Workshop, held 27 June–1 July 2011, at Stanford University. Details of the work and products of that event can be seen in the workshop’s final report.

Style

This survey was planned and built as a snapshot drawn from email lists, blog posts, conference proceedings, project proposals and reports, and similarly informal but timely grey publication sources. The number of citations for formal publications can be counted on one hand. Stylistically, it mirrors the resources it summarizes.

To help the reader distinguish between the material I quote and my own writing, I have used two typefaces:

  • Quoted material is denoted by a serif font that looks like this.
  • My notes appear in the sans-serif font you see in this introduction.

     

    Definitions

    A wide range of definitions can be found for linked data and related aspects of web technology, e.g., semantic web, RDF, ontologies, triples, graphs, microdata, URIs, and OWL. One view of this cluster of terminologies and technologies can be had in the technical treatments found in relevant Wikipedia articles. Taking RDF as an example, we find this definition:

    The RDF data model is similar to classic conceptual modeling approaches such as Entity-Relationship or Class diagrams, as it is based upon the idea of making statements about resources (in particular Web resources) in the form of subject-predicate-object expressions. These expressions are known as triples in RDF terminology. The subject denotes the resource, and the predicate denotes traits or aspects of the resource and expresses a relationship between the subject and the object. For example, one way to represent the notion "The sky has the color blue" in RDF is as the triple: a subject denoting "the sky", a predicate denoting "has the color", and an object denoting "blue." RDF is an abstract model with several serialization formats (i.e., file formats), and so the particular way in which a resource or triple is encoded varies from format to format.
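As a rough illustration of the triple described in the quotation above, here is a minimal Python sketch. It is not drawn from the Wikipedia article or from any RDF library: the `Triple` type, the `to_ntriples` helper, and the `http://example.org/` namespace are all assumptions made for the example. N-Triples is used only because it is one of the simpler RDF serialization formats; other formats would encode the same statement differently.

```python
from typing import NamedTuple

# Hypothetical namespace, used only for this illustration.
EX = "http://example.org/"


class Triple(NamedTuple):
    """One subject-predicate-object statement (illustrative type, not a library API)."""
    subject: str    # IRI naming the resource being described
    predicate: str  # IRI naming the trait or relationship
    obj: str        # plain literal value in this simplified sketch


def to_ntriples(t: Triple) -> str:
    """Serialize a triple with a plain-literal object as one N-Triples line.

    Only backslashes, double quotes, and line breaks are escaped here;
    a real serializer would handle more cases.
    """
    escaped = (
        t.obj.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'<{t.subject}> <{t.predicate}> "{escaped}" .'


# "The sky has the color blue" as a single statement.
sky = Triple(EX + "sky", EX + "hasColor", "blue")
print(to_ntriples(sky))
# prints: <http://example.org/sky> <http://example.org/hasColor> "blue" .
```

The subject and predicate are written as IRIs in angle brackets, and the object as a quoted literal. In practice the object could also be an IRI, which this sketch leaves out.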

    Other views (or more accurately, other viewpoints) offer very different pictures of what’s going on in this field. These include a disparate range of passionately held ideas about what is and is not acceptable practice. Scan the Landscape section of the survey for a quick introduction to the breadth of capabilities, tools, expectations, and approaches encompassed by the seemingly innocuous term linked data. Among the thinking found there, Dan Brickley (of SKOS fame) offers a refreshingly practical set of observations:

    RDF enthusiasts share 99.9% of their geek DNA with the microformats community, with XML experts, with OWL people, ... but time and again end up nitpicking on embarrassing details. Someone "isn't really" publishing Linked Data because their RDF doesn't have enough URIs in it, or they use unfashionable URI schemes. Or their Apache Web server isn't sending 303 redirects. Or they've used a plain XML language or other standard instead. This kind of partisan hectoring can shrink a community passionate about sharing data in the Web, just at a time when this effort should be growing more inclusive and taking a broader view of what we're trying to achieve.

    The formats and protocols are a detail. They'll evolve over time. If people do stuff that doesn't work, they'll find out and do other things instead. The thing that keeps me involved is the common passion for sharing information in the Web.

    A recent addition to the landscape of definitions for linked data and related components is the W3C’s [draft] report from its Library Linked Data Incubator Group:

    Linked Data. "Linked Data" refers to data published in accordance with principles designed to facilitate linkages among datasets, element sets, and value vocabularies. Linked Data uses Uniform Resource Identifiers (URIs) as globally unique identifiers for any kind of resource—analogously to how identifiers are used for authority control in traditional librarianship. In Linked Data, URIs may be Internationalized Resource Identifiers (IRIs) -- Web addresses that use the extended set of natural-language scripts supported by Unicode. Linked Data is expressed using standards such as Resource Description Framework (RDF), which specifies relationships between things—relationships that can be used for navigating between, or integrating, information from multiple sources.

    Open Data. While "Linked Data" refers to the technical interoperability of data, "Open Data" focuses on its legal interoperability. According to the definition for Open Bibliographic Data, Open Data is in essence freely usable, reusable, and redistributable—subject, at most, to the requirements to attribute and share alike. Note that Linked Data technology per se does not require data to be Open, though the potential of the technology is best realized when data is published as Linked Open Data.
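The Incubator Group's point about integrating information from multiple sources can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the group's method or any library's API: the two "library" datasets, the `merge` and `describe` helpers, and the `example.org` identifiers are invented for illustration. The Dublin Core and FOAF property URIs are real vocabulary terms. Because both sources use the same URI for the same book, combining them amounts to a set union.

```python
from collections import defaultdict

# Hypothetical identifiers, invented for this illustration.
BOOK = "http://example.org/book/1"
AUTHOR = "http://example.org/person/melville"

# Real vocabulary terms (Dublin Core Terms and FOAF).
DCT_TITLE = "http://purl.org/dc/terms/title"
DCT_CREATOR = "http://purl.org/dc/terms/creator"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

# Two independently published datasets, each a set of (subject, predicate, object).
library_a = {
    (BOOK, DCT_TITLE, "Moby-Dick"),
}
library_b = {
    (BOOK, DCT_CREATOR, AUTHOR),
    (AUTHOR, FOAF_NAME, "Herman Melville"),
}


def merge(*datasets):
    """Combine datasets by set union; shared URIs line the statements up."""
    merged = set()
    for dataset in datasets:
        merged |= dataset
    return merged


def describe(graph, subject):
    """Collect every predicate and its objects stated about one subject."""
    description = defaultdict(set)
    for s, p, o in graph:
        if s == subject:
            description[p].add(o)
    return dict(description)


merged = merge(library_a, library_b)
about_book = describe(merged, BOOK)

# Navigate from the book to its creator, then to the creator's name.
creator = next(iter(about_book[DCT_CREATOR]))
print(describe(merged, creator)[FOAF_NAME])
```

Neither source alone states both the title and the author's name. After the union, both are reachable from the book's URI.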

    For those who prefer definitions that rely on examples, the British Library (BL) published its linked data model this summer. For the Talis announcement and access to a PDF representation of the model, consult Richard Wallis’ blog post Significant bibliographic linked data release from the British Library. Tim Hodson provides an introduction to the model here, and information about later updates here.

    Beyond that, one might take advantage of the in-depth scan of the linked-data community’s efforts that was presented this summer at the BL-hosted Linked data and libraries 2011. Among the presentations found on the website is British Library Chief Executive Lynne Brindley’s take on linked data. (It is the first presentation on the morning’s video, beginning at 4:45 minutes [no slides].) For an up-to-date, wide-ranging scan of linked data as it relates to libraries, take in Richard Wallis’ Linked data applicable to libraries (on the morning’s video, beginning at 23:00 minutes).

    And finally, for the full-on, more or less “official” treatment of linked data and its related technologies, see Linked data, evolving the web into a global data space (Tom Heath and Christian Bizer, 2011).

     

    Scope

    The survey was shaped by the objectives of the Stanford Linked Data Workshop; it therefore focuses on practical aspects of understanding and applying linked data practices and technologies to the metadata and content of libraries, museums, and archives. A fuller statement of those objectives is found here.

    For more detail about the boundaries for coverage of various aspects of linked data by the survey, see its in-scope and out-of-scope sections.

     

    Author’s note

    The survey is offered as a mid-2011 snapshot of the environs, players, and projects associated with linked data. It was created to support the efforts of those attending the Stanford Workshop. Both the workshop and the survey were made possible by a grant from The Andrew W. Mellon Foundation. The survey was carried out under contract with CLIR (Council on Library and Information Resources), while making use of the Council’s exceptional and always helpful editorial and technical staff.

    The survey is published here in concert with the workshop’s final report in the hope that others will find both useful in helping them wend their way through the complexities and vagaries associated with the rapid evolution of linked data and its associated technologies.

    Jerry Persons

    Knowledge Motifs LLC

    jpersons@kmotifs.com

    www.kmotifs.com
