What ... in scope • CLIR

… what’s viable, and what can stay on the shelf?

Included here are references to thought pieces that outline an environmental framework for the Stanford Workshop. Far from being cast in concrete, these musings and papers highlight those facets of the linked-data/semantic-web ecology that lie both inside and outside the intended focus of the week’s work in June and July.

In-scope for the workshop

1. Be part of the web, not just on it

a. Andy Powell is a long-time advocate of leveraging the web’s body of practice and technologies as the most effective means of making academic content and its associated metadata widely available and useful. In his Repositories thru the looking glass (2008) post, he writes:

It strikes me that repositories are of interest not just to those librarians in the academic sector who have direct responsibility for the development and delivery of repository services. Rather they represent a microcosm of the wider library landscape - a useful case study in the way the Web is evolving, particularly as manifest through Web 2.0 and social networking, and what impact those changes have on the future of libraries, their spaces and their services.

… the ‘service oriented’ approaches that we have tended to adopt in standards like the OAI-PMH, SRW/SRU, and OpenURL sit uncomfortably with the ‘resource oriented’ approach of the Web architecture and the Semantic Web. We need to recognize the importance of REST as an architectural style and adopt a ‘resource oriented’ approach at the technical level when building services.

b. Ed Summers provides a 2011 summary of thinking and developments related to linked data being of the web (REST as an architectural style, resource oriented approaches, etc.). His comments refer to an eFoundations post (Andy Powell and Pete Johnson) about metadata guidelines for the UK Resource Discovery Task Force (more about the RDTF here and here):

As I’ve heard you argue persuasively in the past, the success of the WWW as a platform for delivery of information is hard to argue with. One of the things that the WWW did right (from the beginning) was focus the technology on people actually doing stuff…in their browsers. It seems really important to make sure whatever this metadata is, that users of the Web will see it (somehow) and will be able to use it. Ian Davis’ points in Is the Semantic Web Destined to be a Shadow? are still very relevant today I think.

Aligning with the web is a good goal to have. Relatively recent service offerings from Google and Facebook indicate their increased awareness of the utility of metadata to their users. And publishers are recognizing how important they are for getting their stuff before more eyes. It’s a kind of virtuous cycle I hope.

This must feel like it has been a long time in coming for you and Pete. Google’s approach encourages a few different mechanisms: RDFa, Microdata and Microformats. Similarly, Google Scholar parses a handful of metadata vocabularies present in the HTML head element. The web is a big place to align with I guess.

I imagine there will be hurdles to get over, but I wonder if your task force could tap into this virtuous cycle. For example, it would be great if cultural heritage data could be aggregated using techniques that big search companies also use: e.g. RDFa, microformats and microdata; and sitemaps and Atom for updates. This would assume a couple things: publishers could allow (and support) crawling, and that it would be possible to build aggregator services to do the crawling. An important step would be releasing the aggregated content in an open way too. This seems to be an approach that is very similar to what I’ve heard Europeana is doing…which may be something else to align with.
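To make the idea of reading metadata straight out of the HTML head a little more concrete, here is a minimal Python sketch using only the standard library. It is an illustration, not a description of how Google Scholar or any aggregator actually works: the sample page, the function name extract_head_metadata, and the choice to collect every named meta tag are all assumptions made for this example. The citation_title and citation_author names follow the style of tags Google Scholar documents, but the values are invented.

```python
from html.parser import HTMLParser


class HeadMetaParser(HTMLParser):
    """Collect <meta name=... content=...> pairs that appear inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.meta = {}  # name -> list of content values (names may repeat)

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head:
            attributes = dict(attrs)
            name = attributes.get("name")
            content = attributes.get("content")
            if name and content is not None:
                self.meta.setdefault(name, []).append(content)

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def extract_head_metadata(html):
    """Hypothetical helper: return the named meta tags found in the head."""
    parser = HeadMetaParser()
    parser.feed(html)
    return parser.meta


# Invented sample page, for illustration only.
sample_page = """
<html>
  <head>
    <title>A Hypothetical Finding Aid</title>
    <meta name="citation_title" content="A Hypothetical Finding Aid">
    <meta name="citation_author" content="Doe, Jane">
    <meta name="citation_author" content="Roe, Richard">
  </head>
  <body>
    <meta name="ignored" content="not in the head">
    <p>Body text.</p>
  </body>
</html>
"""

meta = extract_head_metadata(sample_page)
for name, values in meta.items():
    print(name, values)
```

A crawler built along the lines Summers describes would pair something like this with sitemaps or Atom feeds to discover which pages to fetch; that part is left out here.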

c. Peter Murray adds this note about constraints embedded in the mental model that evolved alongside library metadata practice and technologies:

What trips up our [library] community even more, I think, is that we have a tendency to equate this communications format [MARC] with a mental model of how we describe things from a bibliographic point of view. We think of discrete records that describe these things rather than a network (or, more accurately, a graph) of interrelated nodes. This forces us to focus on the textual content of fields and not on the relationships between things. And in doing so, we are not making the best use of our limited efforts to describe the things in our curatorial care.
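The contrast between discrete records and a graph of interrelated nodes can be sketched in a few lines of Python. Everything below is hypothetical: the ex: identifiers, the property names, and the works_by helper are made up for illustration and do not come from MARC, RDF tooling, or Murray's note.

```python
# Record view: one self-contained description, with the author as a text field.
record = {
    "title": "Moby-Dick",
    "author": "Melville, Herman",
    "subject": "Whaling",
}

# Graph view: (subject, predicate, object) statements about identified things.
# The author is a node in its own right, so other works can point at it.
triples = [
    ("ex:book1", "ex:title", "Moby-Dick"),
    ("ex:book1", "ex:author", "ex:melville"),
    ("ex:melville", "ex:name", "Herman Melville"),
    ("ex:book2", "ex:author", "ex:melville"),
    ("ex:book2", "ex:title", "Typee"),
]


def works_by(author_id, graph):
    """Follow ex:author links to a node, then return the titles of those works."""
    works = [s for s, p, o in graph if p == "ex:author" and o == author_id]
    return [o for s, p, o in graph if p == "ex:title" and s in works]


print(works_by("ex:melville", triples))  # ['Moby-Dick', 'Typee']
```

In the record view, finding other works by the same author means matching text strings across records; in the graph view, the relationship is stated directly and can be followed.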

d. Brian O’Leary posts about an interview with Toby Green, head of publishing for the Organisation for Economic Co-operation and Development (OECD), a worldwide publisher in economics and public policy. Green’s take on access to information includes this:

At heart, most people don’t care if it’s a book or a periodical, whether it’s online or offline. As Toby puts it, “People are looking for answers, not books or data or papers. That’s why we bundle all OECD knowledge, in book form, journals, data sets, tables, working papers, you name it, into a single, seamless, online platform”.

e. Kevin Kelly provided another take on being of the web back in 2008 at O’Reilly’s Web 2.0 Summit. In a 15-minute session, he reviewed the history of the “web” (some 6,000+ days old at that point) with an eye to projecting what characteristics the WWW might exhibit at an equal distance into the future. The future he postulated is one based on an all-pervasive, all-consuming web of data that is a worldwide … a web in which:

If you’re producing some information and it’s not webized, not in some way online, and related and shared to everything else, it doesn’t count.

What we’ve learned from the first web is that we have … to believe in the impossible. It was impossible what’s happened in only 6,000 days. If what I’m talking about [for the next 6,000 days] sounds impossible, you have to believe it because that’s what we’ve learned.

f. Jon Udell offers his definition of what being of the web is all about in his Seven ways to think about the web (January 2011):

Back in 2000, the patterns, principles, and best practices for building web information systems were mostly anecdotal and folkloric. Roy Fielding’s dissertation on the web’s deep architecture provided a formal definition that we’ve been digesting ever since. In his introduction he wrote that the web is “an Internet-scale distributed hypermedia system” that aims to “interconnect information networks across organizational boundaries.” His thesis helped us recognize and apply such principles as universal naming, linking, loose coupling, and disciplined resource design. These are not only engineering concerns. Nowadays they matter to everyone. Why? Because the web is a hybrid information system co-created by people and machines. Sometimes computers publish our data for us, and sometimes we publish it directly. Sometimes machines subscribe to what machines and people publish, sometimes people do.

Given the web’s hybrid nature, how can we teach people to make best use of this distributed hypermedia system? That’s what I’ve been trying to do, in one way or another, for many years. It’s been a challenge to label and describe the principles I want people to learn and apply. I’ve used the terms computational thinking, Fourth R principles, and, most recently, Mark Surman’s evocative thinking like the web.

Back in October, at the Traction Software users’ conference, I led a discussion on the theme of observable work in which we brainstormed a list of some principles that people apply when they work well together online. It’s the same list that emerges when I talk about computational thinking, or Fourth R principles, or thinking like the web. Here’s an edited version of the list we put up on the easel that day:

Be the authoritative source for your own data
Pass by reference not by value
Know the difference between structured and unstructured data
Create and adopt disciplined naming conventions
Push your data to the widest appropriate scope
Participate in pub/sub networks as both a publisher and a subscriber
Reuse components and services

He goes on to define each of these points and comments briefly on how applying each principle might affect interactions with the web.

2. Meaning and doing what we say
… or… what, after all, do we mean by linked data

a. Mike Bergman (UMBEL, Structured Dynamics, Sweet Tools) posted his thoughts on the just-completed SemTech 2010 conference in a post entitled I Have Yet to Metadata I Didn’t Like:

At the SemTech conference earlier this summer there was a kind of vuvuzela-like buzzing in the background. And, like the World Cup games on television, in play at the same time as the conference, I found the droning to be just as irritating.

That droning was a combination of the sense of righteousness in the superiority of linked data matched with a reprise of the chicken-and-egg argument that plagued the early years of Semantic Web advocacy[1]. I think both of these premises are misplaced. So, while I have been a fan and explicator of linked data for some time, I do not worship at its altar[2]. And, for those that do, this post argues for a greater sense of ecumenism.

My main points are not against linked data. I think it a very useful technique and good (if not best) practice in many circumstances. But my main points get at whether linked data is an objective in itself. By making it such, I argue our eye misses the ball. And, in so doing, we miss making the connection with meaningful, interoperable information, which should be our true objective. We need to look elsewhere than linked data for root causes.

His commentary continues with an ordered set of observations:

What problem are we solving
The problem is not a set of consumable data
An interoperable data model does not require a single transmittal format
A technique [linked data] cannot carry the burden of usefulness or interoperability
50% of linked data is missing…that is the linking part [i.e., context and coherence]
Pluralism is a reality; embrace it

And he sums up with:

Parochialism and root cause analysis

Linked data is a good thing, but not an ultimate thing. By making linked data an objective in itself we unduly raise publishing thresholds; we set our sights below the real problem to be solved; and we risk diluting the understanding of RDF from its natural role as a flexible and adaptive data model. Paradoxically, too much parochial insistence on linked data may undercut its adoption and the realization of the overall semantic objective.

Root cause analysis for what it takes to achieve meaningful, interoperable information suggests that describing source content in terms of what it is about is the pivotal factor. Moreover, those contexts should be shared to aid interoperability. Whichever organizations do an excellent job of providing context and coherent linkages will be the go-to ones for data consumers. As we have seen to date, merely publishing linked data triples does not meet this test.

I have heard some state that first you celebrate linked data and its growing quantity, and then hope that the quality improves. This sentiment holds if indeed the community moves on to the questions of quality and relevance. The time for that transition is now. And, oh, by the way, as long as we are broadening our horizons, let’s also celebrate properly characterized structured data no matter what its form. Pluralism is part of the tao to the meaning of information.

b. Mike Ellis (Managing and growing a cultural heritage web presence) offers his take on the state of affairs in his post Linked Data: my challenge:

I’ve been sitting on the sidelines sniping gently at Linked Data since it apparently replaced the Semantic Web as The Next Big Thing. I remained cynical about the SW all the way through, and as of right now I remain cynical about Linked Data as well.

This might seem odd from someone obsessed with - and a clear advocate of - opening up data. I’ve blogged about, talked about and written papers about what I’ve come to call MRD (Machine Readable Data). I’ve gone so far as to believe that if it doesn’t have an API, it doesn’t - or shouldn’t - exist.

So what is my problem with Linked Data? Surely what Linked Data offers is the holy grail of MRD? Shouldn’t I be embracing it as everyone else appears to be?

Yes. I probably should.

But…Linked Data runs headlong into one of the things I also blog about all the time here, and the thing I believe in probably more than anything else: simplicity.

If there is one thing I think we should all have learned from RSS, simple API’s, YQL, Yahoo Pipes, Google Docs, etc., it is this: for a technology to gain traction it has to be not only accessible, but simple and usable, too.

Here’s how I see Linked Data as of right now:

It is completely entrenched in a community who are deeply technically focused. They’re nice people, but I’ve had a good bunch of conversations and never once has anyone been able to articulate for me the why or the how of Linked Data, and why it is better than focusing on simple MRD approaches, and in that lack of understanding we have a problem. I’m not the sharpest tool, but I’m not stupid either, and I’ve been trying to understand for a fair amount of time…

There are very few (read: almost zero) compelling use-cases for Linked Data. And I don’t mean the TBL “hey, imagine if you could do X” scenario, I mean real use-cases. Things that people have actually built. And no, Twine doesn’t cut it.

The entry cost is high - deeply arcane and overly technical, whilst the value remains low. Find me something you can do with Linked Data that you can’t do with an API. If the value was way higher, the cost wouldn’t matter so much. But right now, what do you get if you publish Linked Data? And what do you get if you consume it?

c. Dan Chudnov adds his insightful, long-time perspective on changes and advances in library information services in a note introducing his Better living through linking presentation (2009).

This was the first time I spoke about this in a room not entirely filled with hackers, though, so I couldn’t just start talking about conneg and RDF models. It needed more context. As far as I can tell, the context that matters most is that we’ve been building a web for fifteen years, now, and we’ve continually changed how we build the web as we’ve changed how we use the web. So I spent most of the talk stressing how adhering to the four rules of Linked Data can help us make our libraries’ stuff more relevant, more connected, and more likely to be found and used by improving how we link things together.

Do flip through his slides. If products of the Stanford Workshop can in fact foster the capabilities he proposes here, it will be counted a thorough success.
