Introduction • CLIR


This literature survey was created in support of the Stanford Linked Data Workshop, held 27 June–1 July 2011, at Stanford University. Details of the work and products of that event can be seen in the workshop’s final report.

Style

This survey was planned and built as a snapshot drawn from email lists, blog posts, conference proceedings, project proposals and reports, and similarly informal but timely grey publication sources. The number of citations for formal publications can be counted on one hand. Stylistically, it mirrors the resources it summarizes.

To help the reader distinguish between the material I quote and my own writing, I have used two typefaces:

Quoted material is denoted by a serif font that looks like this.
My notes appear in the sans-serif font you see in this introduction.

Definitions

A wide range of definitions can be found for linked data and related aspects of web technology, e.g., semantic web, RDF, ontologies, triples, graphs, microdata, URIs, and OWL. One view of this cluster of terminologies and technologies can be had in the technical treatments found in relevant Wikipedia articles. Taking RDF as an example, we find this definition:

The RDF data model is similar to classic conceptual modeling approaches such as Entity-Relationship or Class diagrams, as it is based upon the idea of making statements about resources (in particular Web resources) in the form of subject-predicate-object expressions. These expressions are known as triples in RDF terminology. The subject denotes the resource, and the predicate denotes traits or aspects of the resource and expresses a relationship between the subject and the object. For example, one way to represent the notion “The sky has the color blue” in RDF is as the triple: a subject denoting “the sky”, a predicate denoting “has the color”, and an object denoting “blue.” RDF is an abstract model with several serialization formats (i.e., file formats), and so the particular way in which a resource or triple is encoded varies from format to format.

Other views (or more accurately, other viewpoints) offer very different pictures of what’s going on in this field. These include a disparate range of passionately held ideas about what is and is not acceptable practice. Scan the Landscape section of the survey for a quick introduction to the breadth of capabilities, tools, expectations, and approaches encompassed by the seemingly innocuous term linked data. Among the thinking found there, Dan Brickley (of SKOS fame) offers a refreshingly practical set of observations:

RDF enthusiasts share 99.9% of their geek DNA with the microformats community, with XML experts, with OWL people, … but time and again end up nitpicking on embarrassing details. Someone “isn’t really” publishing Linked Data because their RDF doesn’t have enough URIs in it, or they use unfashionable URI schemes. Or their Apache Web server isn’t sending 303 redirects. Or they’ve used a plain XML language or other standard instead. This kind of partisan hectoring can shrink a community passionate about sharing data in the Web, just at a time when this effort should be growing more inclusive and taking a broader view of what we’re trying to achieve.

The formats and protocols are a detail. They’ll evolve over time. If people do stuff that doesn’t work, they’ll find out and do other things instead. The thing that keeps me involved is the common passion for sharing information in the Web.

A recent addition to the landscape of definitions for linked data and related components is the W3C’s [draft] report from its Library Linked Data Incubator Group:

Linked Data. “Linked Data” refers to data published in accordance with principles designed to facilitate linkages among datasets, element sets, and value vocabularies. Linked Data uses Uniform Resource Identifiers (URIs) as globally unique identifiers for any kind of resource – analogously to how identifiers are used for authority control in traditional librarianship. In Linked Data, URIs may be Internationalized Resource Identifiers (IRIs) – Web addresses that use the extended set of natural-language scripts supported by Unicode. Linked Data is expressed using standards such as Resource Description Framework (RDF), which specifies relationships between things – relationships that can be used for navigating between, or integrating, information from multiple sources.

Open Data. While “Linked Data” refers to the technical interoperability of data, “Open Data” focuses on its legal interoperability. According to the definition for Open Bibliographic Data, Open Data is in essence freely usable, reusable, and redistributable – subject, at most, to the requirements to attribute and share alike. Note that Linked Data technology per se does not require data to be Open, though the potential of the technology is best realized when data is published as Linked Open Data.
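The Linked Data definition rests on two ideas: shared URIs that work much like authority-controlled headings, and RDF relationships that let software navigate between, or integrate, data from separate sources. A minimal Python sketch of both follows. Triples are plain tuples; the catalog and authority records, the example.org IRIs, and the helper names `objects` and `creator_names` are hypothetical, while the Dublin Core and FOAF property IRIs are real vocabulary terms used here only for illustration. Treating literals and IRIs alike as plain strings is a simplification that a real RDF toolkit would not make.

```python
# Real vocabulary terms (Dublin Core, FOAF); everything under example.org is hypothetical.
DCT_TITLE = "http://purl.org/dc/terms/title"
DCT_CREATOR = "http://purl.org/dc/terms/creator"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

BOOK = "http://example.org/catalog/book/1"
PERSON = "http://example.org/authority/person/42"

# Source 1: a library catalog. It refers to the author only by IRI.
catalog = {
    (BOOK, DCT_TITLE, "An Example Title"),
    (BOOK, DCT_CREATOR, PERSON),
}

# Source 2: an authority file describing the same person under the same IRI.
authority = {
    (PERSON, FOAF_NAME, "Example Author"),
}


def objects(graph, subject, predicate):
    """Return the objects of all statements matching (subject, predicate, ?)."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def creator_names(graph, book):
    """Follow the creator link from a book to any name recorded for that person."""
    names = []
    for person in objects(graph, book, DCT_CREATOR):
        names.extend(objects(graph, person, FOAF_NAME))
    return names


# Because both sources use the same IRI for the person, merging them is a set
# union (valid here because there are no blank nodes to keep apart).
merged = catalog | authority

print(creator_names(catalog, BOOK))  # → []  (the catalog alone holds no name)
print(creator_names(merged, BOOK))   # → ['Example Author']
```

The link works only because both sources chose the same identifier for the person, which is the role the definition compares to authority control.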

For those who prefer definitions that rely on examples, the British Library (BL) published its linked data model this summer. For the Talis announcement and access to a PDF representation of the model, consult Richard Wallis’ blog post Significant bibliographic linked data release from the British Library. Tim Hodson provides an introduction to the model here, and information about later updates here.

Beyond that, one might take advantage of the in-depth scan of the linked-data community’s efforts that was presented this summer at the BL-hosted Linked data and libraries 2011. Among the presentations found on the website is British Library Chief Executive Lynne Brindley’s take on linked data. (It is the first presentation on the morning’s video, beginning at 4:45 minutes [no slides].) For an up-to-date, wide-ranging scan of linked data as it relates to libraries, take in Richard Wallis’ Linked data applicable to libraries (on the morning’s video beginning at 23:00 minutes).

And finally, for the full-on, more or less “official” treatment of linked data and its related technologies, see Linked Data: Evolving the Web into a Global Data Space (Tom Heath and Christian Bizer, 2011).

Scope

The survey was shaped by the objectives of the Stanford Linked Data Workshop; thus, it focuses on practical aspects of understanding and applying linked data practices and technologies to the metadata and content of libraries, museums, and archives. A fuller statement of those objectives is found here.

For more detail about the boundaries for coverage of various aspects of linked data by the survey, see its in-scope and out-of-scope sections.

Author’s note

The survey is offered as a mid-2011 snapshot of the environs, players, and projects associated with linked data. It was created to support the efforts of those attending the Stanford Workshop. Both the workshop and the survey were made possible by a grant from The Andrew W. Mellon Foundation. The survey was carried out under contract with CLIR (Council on Library and Information Resources) and benefited from the Council’s exceptional and always helpful editorial and technical staff.

The survey is published here in concert with the workshop’s final report in the hope that others will find both useful in helping them wend their way through the complexities and vagaries associated with the rapid evolution of linked data and its associated technologies.

Jerry Persons

Knowledge Motifs LLC

jpersons@kmotifs.com