Challenges & Opportunities


A few of the linked-data “hot topics” were addressed in the section on what’s in-scope and what’s out-of-scope for the workshop.

Also, remember that the current bible of how-to guidance for linked data is Heath and Bizer’s Linked Data: Evolving the Web into a Global Data Space.

1. Promises, promises

a. Mike Bergman: Linked data needs to live up to the hype, and soon
from an interview by Jennifer Zaino (Semanticweb.com)

“I think the concern right now is that the Linked Data that is out there is not being used, except in isolated pockets that are curated,” says Mike Bergman, CEO of Structured Dynamics. “With all the hype that is coming down, the quality is poor, context is lacking, and use is not evident. So do we risk a backlash by hyping something that no one is really using? I think we’re getting close to that point.”
[snip]

“The reference vocabularies provide fixed reference points, to give you a sense of orientation, of context. They are the fixed points by which you navigate”, Bergman says. Structured Dynamics and Ontotext have approached the Dublin Core Metadata Initiative to take the lead in driving the discussion about what constitutes a good reference vocabulary for linking purposes, and to play an active role in putting up public repositories of such vocabularies.
[snip]

Equally problematic, however, is the relative dearth of off-the-shelf predicates to use for making connections between the nodes, when the relationships between two different data sets are approximate rather than exact.

“So we have a semantic gap – we need reference vocabularies to keep people oriented in the right direction, and we need linking predicates or verbs or relationships between these data sets that are more approximate vs. exact,” Bergman says. “Exact would be great if that were true. But most real relationships are not exact, and when you say they are and they aren’t, you introduce errors into the whole Linked Data structure.”
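[Workshop note: a minimal Python sketch of the error Bergman describes, under stated assumptions. The node names, the `type` property and the `merged_properties` helper are hypothetical. owl:sameAs and skos:closeMatch are real predicates, used here only as stand-ins for an "exact" and an "approximate" link.]

```python
EXACT = "owl:sameAs"         # claims both names denote the very same thing
APPROX = "skos:closeMatch"   # claims only that the two things are similar

# Hypothetical facts held by two different datasets.
FACTS = {
    "a:London": {"type": "city"},
    "b:GreaterLondon": {"type": "region"},
}

def merged_properties(node, links, facts):
    """Collect properties for node, folding in only nodes linked as exact."""
    result = {key: {value} for key, value in facts.get(node, {}).items()}
    for s, p, o in links:
        if p == EXACT and node in (s, o):
            other = o if s == node else s
            for key, value in facts.get(other, {}).items():
                result.setdefault(key, set()).add(value)
    return result

# The same pair of nodes, linked once as exact and once as approximate.
claimed_exact = merged_properties(
    "a:London", [("a:London", EXACT, "b:GreaterLondon")], FACTS)
claimed_approx = merged_properties(
    "a:London", [("a:London", APPROX, "b:GreaterLondon")], FACTS)
```

[With the exact link, the city inherits the region's type and ends up with two conflicting values. With the approximate link, the two descriptions stay separate.]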

2. Opportunities

a. Leigh Dodds: Challenges and opportunities for linked data (Talis, Nodalities blog)

Yesterday I gave a short talk at Online Information 2010 titled “Challenges and Opportunities for Linked Data” ( abstract). The presentation highlighted what I saw as the main challenges that face us as we grow the web of data, and highlighted some opportunities for organisations that want to get involved.
[excerpts, as follow]

Craft
The first of these relates to what I’d call “the craft” of Linked Data. To date the growth of the Linked Data cloud has largely been driven by skilled artisans – from academia and a small number of commercial organisations – who know how to work with the technology, how to use and manipulate the data that is already available, and how to get things online and linked together in a way that achieves the 5 star approach.

Fuelling applications
Linked Data isn’t being used as much as it could or should be. Why is this?

I think there are two reasons. The first relates to my previous point about enabling the “journeyman” developer. Right now it takes a certain amount of skill to get the most from Linked Data and SPARQL. This presents a road-block for developers who may be interested in using some of the available data. It may even stop them looking at all.

A potentially larger issue is that much of the data available as Linked Data is either static, irregularly updated, or already available in other more accessible formats and APIs. This isn’t true across the cloud as a whole, but timeliness is an issue in many areas. It’s a consequence of the early boot-strapping process which emphasised conversions of available data dumps, and the wrapping of existing APIs and services. As a boot-strapping process that has been fantastic. But it’s not driving engagement: why use data if you can get it somewhere else easier, and in a more up to date form, using tools that you’re already familiar with?

Sustainability
The third challenge I highlighted was sustainability. It’s easy to look at the Linked Data diagram and think: “Well, those bits are done, all we need to do is look [at] how to grow the diagram. We just need to add more data.” I think that’s a natural but unfortunately misleading viewpoint: we need to look carefully at our foundations.

Not all of these sources are on infrastructure that could support real, high volume usage. And few of the datasets are clearly licensed. I’ve personally encountered a number of occasions where some significant datasets are offline or unavailable. So we need to be realistic about whether people can build a stable, commercial application against the web of data as it exists today.

Become a hub
One of the interesting properties of the Linked Data cloud diagram is how it clearly illustrates the emergence of a number of hubs – like dbpedia – that form the focal points for links from a number of different datasets. If you look closely you can also see that there are emerging hubs within specific subject domains.

I wonder whether the hubs that we see today will continue to play such a key role as the web of data evolves? My feeling is that in a few years time the picture and connectivity is going to be quite different. Particularly if we continue to see engagement from government and other sectors.

There is clearly an opportunity here for organisations who are already key enablers within a particular sector to become a linking hub on the web of data.

Turn identifiers into channels
Linked Data requires you to assign URLs to identify things: people, places, events, whatever. Generally we tend to focus on how that is an important step to publishing data: concentrating on the mechanics of what makes a good, stable identifier and highlighting how this becomes a key way for other people to find your data.

What this misses is that those identifiers can also become channels, or hooks, for your organisation to find other people’s data. Once you have published Linked Data and it becomes linked to by other datasets, all of that external data annotates and enriches your own, providing valuable and useful context. Linking data creates network effects, and everyone in the network benefits. That includes you.

The external data is easily accessible through link discovery so it becomes much easier to find, aggregate and analyse for a variety of purposes. That might be to drive new product features, or to simply power business intelligence and analysis within the enterprise.

I tend to think of it as being able to fish the web of data for useful context. Your URIs are the hooks. Your data is the bait.
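[Workshop note: a minimal Python sketch of "fishing" with your own identifiers, assuming you have already harvested some external triples. The URIs, the `MY_PREFIX` value and the `inbound_links` helper are hypothetical.]

```python
# Hypothetical namespace under which your organisation mints its URIs.
MY_PREFIX = "http://example.org/id/"

# Hypothetical statements harvested from other people's datasets.
EXTERNAL = [
    ("http://other.example/review/1",
     "http://other.example/vocab/about",
     "http://example.org/id/telemann"),
    ("http://other.example/review/2",
     "http://other.example/vocab/about",
     "http://elsewhere.example/id/bach"),
]

def inbound_links(triples, prefix=MY_PREFIX):
    """Keep external statements whose object is one of your own URIs."""
    return [
        (s, p, o)
        for s, p, o in triples
        if o.startswith(prefix) and not s.startswith(prefix)
    ]

context = inbound_links(EXTERNAL)
```

[Only the first review points at a URI in your namespace, so it is the only statement that comes back as context for your own data.]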

Data as a service
It’s been said before but it’s worth repeating: Linked Data isn’t necessarily Open Data. The technology is not at odds with exploring business models around data services or access.

The “Data as a Service” (DaaS) idea is gaining momentum in a number of different areas with an increasing number of commercial APIs coming online. We should also soon be seeing commercially available services directly powered by open data sources or through mining those sources.

3. What’s hard about linked data

a. Rob Styles: What people find hard about linked data (Talis)

Following on from the post I put up last talking about Linked Data training, I got asked what people find hard when learning about Linked Data for the first time. Delivering our training has given us a unique insight into that, across different roles, backgrounds and organisations – in several countries. We’ve taught hundreds of people in all.

It’s definitely true that people find Linked Data hard, but the learning curve is not really steep compared with other technologies. The main problem is there are a few steps along the way, certain things you have to grasp to be successful with this stuff.
[excerpts, as follow]

CONCEPTUAL
Graph thinking
The biggest conceptual problem learners seem to have is with what we call graph thinking. What I mean by graph thinking is the ability to think about data as a graph, a web, a network. We talk about it in the training material in terms of graphs, and start by explaining what a graph is (and that it’s not a chart!).

Non-programmers seem to struggle with this, not with understanding the concept, but with putting themselves above the data. It seems to me that most non-programmers we train find it very easy to think about the data from one point of view or another, but find it hard to think about the data in less specific use-cases.
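[Workshop note: a minimal Python sketch of graph thinking, assuming data held as (subject, predicate, object) statements. The `ex:` names and the `build_graph` helper are hypothetical.]

```python
from collections import defaultdict

# Hypothetical example data: (subject, predicate, object) statements.
TRIPLES = [
    ("ex:Telemann", "ex:bornIn", "ex:Magdeburg"),
    ("ex:Telemann", "ex:composed", "ex:Tafelmusik"),
    ("ex:Magdeburg", "ex:locatedIn", "ex:Germany"),
]

def build_graph(triples):
    """Index every statement from both ends, so no node is the 'root'."""
    outgoing = defaultdict(list)
    incoming = defaultdict(list)
    for s, p, o in triples:
        outgoing[s].append((p, o))
        incoming[o].append((p, s))
    return outgoing, incoming

outgoing, incoming = build_graph(TRIPLES)

# The same statement read from the person's side and from the place's side.
from_person = outgoing["ex:Telemann"]
from_place = incoming["ex:Magdeburg"]
```

[Nothing in the data privileges the composer over the city. Which one you "start from" is a choice made by the use-case, not by the graph.]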

Using URIs to name real things
In Linked Data we use URIs to name things, not just address documents, but as names to identify things that aren’t on the web, like people, places, concepts. When coming across Linked Data, knowing how to do this is another step people have to climb.

Non-Constraining Nature (Open World Assumption)
Linked Data follows the open-world assumption – that something you don’t know may be said elsewhere. This is a sea-change for all developers and for most people working with data.
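[Workshop note: a minimal Python sketch contrasting the two assumptions. The `ex:` statements and both helper functions are hypothetical, and `None` is used here to stand for "unknown".]

```python
# Hypothetical local data: the only statement we happen to hold.
KNOWN = {("ex:Telemann", "ex:bornIn", "ex:Magdeburg")}

def closed_world(triples, statement):
    """Closed world: anything not stated locally is taken to be false."""
    return statement in triples

def open_world(triples, statement):
    """Open world: anything not stated locally is merely unknown (None)."""
    return True if statement in triples else None

question = ("ex:Telemann", "ex:diedIn", "ex:Hamburg")
closed_answer = closed_world(KNOWN, question)
open_answer = open_world(KNOWN, question)
```

[A database-style check answers "no" to the question. Under the open-world reading, the absence of a statement only means that nobody has said it here yet.]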

PRACTICAL
HTTP, 303s and Supporting Custom URIs
Certainly for most data owners, curators, admins this stuff is an entirely different world, and a world one could argue they shouldn’t need to know about. With Linked Data, URI design comes into the domain of the data manager where historically it’s always been the domain of the web developer.
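[Workshop note: a minimal Python sketch of the widely used 303 pattern, in which a URI naming a real-world thing redirects to a document about it. The `/id/`, `/doc/` and `/data/` layout and the `respond` helper are hypothetical, and the Accept-header check is deliberately simplified: it ignores q-values.]

```python
RDF_TYPES = ("text/turtle", "application/rdf+xml")

def respond(path, accept="text/html"):
    """Return (status, headers) for a request, without running a server."""
    if path.startswith("/id/"):
        # The URI names a thing, not a document, so answer 303 See Other
        # and point the client at a document describing that thing.
        name = path[len("/id/"):]
        wants_rdf = any(media_type in accept for media_type in RDF_TYPES)
        target = ("/data/" if wants_rdf else "/doc/") + name
        return 303, {"Location": target}
    if path.startswith("/doc/") or path.startswith("/data/"):
        # These URIs address documents, so they can be served directly.
        return 200, {}
    return 404, {}

browser = respond("/id/telemann")
crawler = respond("/id/telemann", accept="text/turtle")
```

[A browser is redirected to the HTML page and an RDF-aware client to the data document, while the `/id/` URI itself stays a name for the thing.]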

Syntax
This is a tricky one. I nearly put this into the conceptual issues as part of the learning curve is grasping that RDF has multiple syntaxes and that they are equal. However, most people get that quite quickly, even if they do have problems with the implications of that.

Practically, though, people have quite a step with our two most prominent syntaxes, RDF/XML and Turtle. The specifics are slightly different for each, but the essence is common: identifying the statements.
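[Workshop note: a minimal Python sketch showing that the statements stay the same while the syntax changes. It writes the same two hypothetical statements as N-Triples (one statement per line) and as Turtle (grouped by subject). The URIs under example.org and both helper functions are assumptions, and objects are given already in term form, either `<iri>` or `"literal"`.]

```python
from itertools import groupby

# Two hypothetical statements about one subject.
TRIPLES = [
    ("http://example.org/id/telemann",
     "http://www.w3.org/2000/01/rdf-schema#label",
     '"Telemann"'),
    ("http://example.org/id/telemann",
     "http://example.org/vocab/bornIn",
     "<http://example.org/id/magdeburg>"),
]

def to_ntriples(triples):
    """One complete statement per line, each ending in ' .'."""
    return "\n".join(f"<{s}> <{p}> {o} ." for s, p, o in triples)

def to_turtle(triples):
    """Group statements by subject; ';' separates predicate-object pairs."""
    blocks = []
    for s, group in groupby(sorted(triples), key=lambda t: t[0]):
        pairs = " ;\n    ".join(f"<{p}> {o}" for _, p, o in group)
        blocks.append(f"<{s}>\n    {pairs} .")
    return "\n\n".join(blocks)

ntriples_text = to_ntriples(TRIPLES)
turtle_text = to_turtle(TRIPLES)
```

[Counting the statements in either output gives two. In Turtle the subject is written once and the second statement hides behind the `;`, which is the kind of thing learners have to train their eye to spot.]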

SUMMARY
None of the steps above are actually hard; taken individually they are all easy to understand and work through – especially with the help of someone who already knows what they’re doing. But, taken together, they add up to a perception that Linked Data is complex, esoteric and different to simply building a website, and it is that (false) perception that we need to do more to address.

b. Hugh Glaser: Can we lower the LD entry cost please? (2009 email on the public-lod list)
My proposal:

*We should not permit any site to be a member of the Linked Data cloud if it does not provide a simple way of finding URIs from natural language identifiers.*

Rationale:
One aspect of our Linking Data (not to mention our Linking Open Data) world is that we want people to link to our data – that is, I have published some stuff about something, with a URI, and I want people to be able to use that URI.

So my question to you, the publisher, is: “How easy is it for me to find the URI your users want?”

My experience suggests it is not always very easy. What is required at the minimum, I suggest, is a text search, so that if I have a (boring string version of a) name that refers in my mind to something, I can hope to find an (exciting Linked Data) URI of that thing. I call this a projection from the Web to the Semantic Web.  rdfs:label or equivalent usually provides the other one. [i.e., a second way to find desired URIs]
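[Workshop note: a minimal Python sketch of the kind of text search Glaser asks for, assuming a publisher already holds a label (for example an rdfs:label) for each URI. The URIs, the labels and the `find_uris` helper are hypothetical.]

```python
# Hypothetical URI-to-label table, as a publisher might extract it
# from the rdfs:label statements in its own data.
LABELS = {
    "http://example.org/id/telemann-composer": "Georg Philipp Telemann",
    "http://example.org/id/telemann-asteroid": "Telemann (asteroid)",
    "http://example.org/id/tim": "Tim Berners-Lee",
}

def find_uris(query, labels=LABELS):
    """Case-insensitive substring search from a plain name to URIs."""
    needle = query.casefold()
    return sorted(
        uri for uri, label in labels.items()
        if needle in label.casefold()
    )

telemann_hits = find_uris("telemann")
tim_hits = find_uris("Tim")
```

[A search for "telemann" returns both the composer and the asteroid, so the user can pick the right one. That is more than a URI-hacking guess can offer.]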

At the risk of being seen as critical of the amazing efforts of all my colleagues (if not also myself), this [i.e., finding desired URIs] is rarely an easy thing to do.

Some recent experiences:

  • OpenCalais: as in my previous message on this list, I tried hard to find a URI for Tim, but failed.
  • dbtune: Saw a Twine message about dbtune, trundled over there, and tried to find a URI for a Telemann, but failed.
  • dbpedia: wanted Tim again. After clicking on a few web pages, none of which seemed to provide a search facility, I resorted to my usual method: look it up in wikipedia and then hack the URI and hope it works in dbpedia.
  • wordnet: [2] below

(Sorry to name specific sites, guys, but I needed a few examples. And I am only asking for a little more, so that the fruits of your amazing labours can be more widely appreciated!)

So I have access to Linked Data sites that I know (or at least strongly suspect) have URIs I might want, but I can’t find them.

How on earth do we expect your average punter to join this world?

What have I missed?

Searching, such as Sindice: Well yes, but should I really have to go off to a search engine to find a dbpedia URI? And when I look up “Telemann dbtune” I don’t get any results. And I wanted the dbtune link, not some other link. Did I miss some links on web pages? Quite probably, but the basic problem still stands.

SPARQL: Well, yes. But we cannot seriously expect our users to formulate a SPARQL query simply to find out the dbpedia URI for Tim. What is the regexp I need to put in? (see below [1])

A foaf file: Well Tim’s dbpedia URI is probably in his foaf file (although possibly there are none of Tim’s URIs in his foaf file), if I can actually find the file; but for some reason I can’t seem to find Telemann’s foaf file.

If you are still doubting me, try finding a URI for Telemann in dbpedia without using an external link, just by following stuff from the home page. I managed to get a Telemann by using SPARQL without a regexp (it times out on any regexp), but unfortunately I get the asteroid.

Again, my proposal:
*We should not permit any site to be a member of the Linked Data cloud if it does not provide a simple way of finding URIs from natural language identifiers.*

Otherwise we end up in a silo, and the world passes us by.

Very best

[And since we have to take our own medicine, I have added a “Just search” box right at the top level of all the rkbexplorer.com domains, such as http://wordnet.rkbexplorer.com/ ]

[1] Dbtune finding of Telemann:
SELECT * WHERE {?s ?p ?name .
FILTER regex(?name, "Telemann$") }

I tried
SELECT * WHERE {?s ?p ?name .
FILTER regex(?name, "telemann$", "i") }
first, but got no results – not sure why.

[2] <rant>

I cannot believe just how frustrating this stuff can be when you really try to use it.  Because I looked at Sindice for telemann, I know that it is a word in wordnet ( http://sindice.com/search?q=Telemann reports loads of http://wordnet.rkbexplorer.com/ links).

Great, he thinks, I can get a wordnet link from a “proper” wordnet publisher (ie not me).

Goes to http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData to find wordnet.

The link there is dead.

Strips off the last bit, to get to the home princeton wordnet page, and clicks on the browser link I find – also dead.

Go back and look on the http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/DataSets page, and find the link to http://esw.w3.org/topic/WordNet , but that doesn’t help.

So finally, I do the obvious – google “wordnet rdf”.

Of course I get lots of pages saying how available it is, and how exciting it is that we have it, and how it was produced; and somewhere in there I find a link: “Wordnet-RDF/RDDL Browser” at  www.openhealth.org/RDDL/wnbrowse.

Almost unable to contain myself with excitement, I click on the link to find a text box, and with trembling hands I type “Telemann” and click submit.

If I show you what I got, you can come some way to imagining my devastation:

“Using org.apache.xerces.parsers.SAXParser
Exception net.sf.saxon.trans.DynamicError: org.xml.sax.SAXParseException: White spaces are required between publicId and systemId.
org.xml.sax.SAXParseException: White spaces are required between publicId and systemId.”

Does the emperor have any clothes at all?
</rant>

4. Where does OWL fit with linked data

Danny Ayers

Sandro Hawke

Egon Willighagen

why OWL ain’t bad

what is OWL good for?

CKAN & RDF … why ontologies matter

5. Related venues

a. LOD-LAM: The International Linked Open Data in Libraries, Archives, and Museums Summit

June 2-3, 2011, San Francisco

Convened leaders in their respective areas of expertise from the humanities and sciences to catalyze practical, actionable approaches to publishing Linked Open Data, specifically:

  • Identify the tools and techniques for publishing and working with Linked Open Data
  • Draft precedents and policy for licensing and copyright considerations regarding the publishing of library, archive, and museum metadata
  • Publish definitions and promote use cases that will give LAM staff the tools they need to advocate for Linked Open Data in their institutions

A summary of the meeting was presented at Talis’ Linked Data and Libraries 2011:

slides, live-blog, and video (starting at 1:03:30 in the morning’s video stream from July 14th).
