What ... out of scope • CLIR

… staying on this side of the looking glass

I can’t believe that! said Alice.
Can’t you? the Queen said in a pitying tone.
Try again: draw a long breath, and shut your eyes.
Alice laughed. There’s no use trying, she said: one can’t believe impossible things.
I daresay you haven’t had much practice, said the Queen.
When I was your age, I always did it for half-an-hour a day.
Why, sometimes I’ve believed as many as six impossible things before breakfast.

1. httpRange-14

A recent iteration of this longstanding, ongoing discussion was launched by Ian Davis of Talis in his blog post Is 303 really necessary? (and in an email to the LOD mailing list):

For those new to this debate the current practice in the Linked Data community is to divide the world into two classes: things that are definitely “Information Resources” and things that might be information resources or might be something else entirely, like a planet or a toucan.

[snip]

Why, you might ask, is all this emphasis placed on information resources? The answer is that the overwhelming use of the web is to serve up electronic documents, predominantly html. The Linked Data people want to use the web’s infrastructure to store information about other things (planets and toucans) and use HTTP URIs to denote those things. Because toucans aren’t electronic documents it has been assumed that we need to distinguish the toucan itself from the document containing data about the toucan. One of the central dictums of Linked Data is that URIs can only denote one thing at a time: that means the URI for the toucan needs to be different from the URI for the document about the toucan. We connect the two in two ways:

When someone issues an HTTP GET to the toucan’s URI, the server responds with a 303 status code redirecting the user to the document about the toucan.
When someone issues an HTTP GET to the document’s URI, the server responds with a 200 status code and an RDF document containing triples that refer to the toucan’s URI.

That is the current state of affairs for situations where people want to use HTTP URIs to denote real-world things. (There is another approach that uses URIs with a fragment, e.g., http://example.com/doc#foo, which avoids this 303 redirect, but it has its own problems as I point out here and here).
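The two-step exchange Davis describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from his post: the `/id/` and `/doc/` path prefixes, the `respond` and `serve` helpers, and the single `rdfs:label` triple are all hypothetical choices made for the example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical URI layout (an assumption for this sketch):
#   /id/<name>   denotes the thing itself (e.g. the toucan)
#   /doc/<name>  is the RDF document describing that thing
THING_PREFIX = "/id/"
DOC_PREFIX = "/doc/"

# One Turtle triple whose subject is the thing's URI, not the document's URI.
TURTLE_TEMPLATE = (
    "<http://example.com/id/{name}> "
    '<http://www.w3.org/2000/01/rdf-schema#label> "{name}" .\n'
)


def respond(path):
    """Return (status, headers, body) for an HTTP GET on `path`."""
    if path.startswith(THING_PREFIX):
        # The thing is not a document, so redirect to the document about it.
        name = path[len(THING_PREFIX):]
        return 303, {"Location": DOC_PREFIX + name}, b""
    if path.startswith(DOC_PREFIX):
        # The document is an information resource, so serve it directly.
        name = path[len(DOC_PREFIX):]
        body = TURTLE_TEMPLATE.format(name=name).encode("utf-8")
        return 200, {"Content-Type": "text/turtle"}, body
    return 404, {}, b""


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, headers, body = respond(self.path)
        self.send_response(status)
        for key, value in headers.items():
            self.send_header(key, value)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Run the sketch locally; blocks until interrupted."""
    HTTPServer(("localhost", port), Handler).serve_forever()
```

Calling `serve()` and requesting `/id/toucan` should produce the 303 with a `Location` header pointing at `/doc/toucan`, whose 200 response carries the triple about the toucan. Keeping the decision in a plain `respond` function makes the redirect rule easy to inspect without starting a server.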

For contrasting views on the topic, follow the LOD email thread beginning here and have a look at these few posts:

And for thoughts by Jeni Tennison, see her What do URIs mean anyway?

The summary of my thinking is:

We should learn to cope with ambiguity in URIs.
We should not constrain how applications manage that ambiguity, though duck typing seems the most promising approach to me.
We should define some specific properties that can be used to disambiguate URIs, describe their defaults with 303s and hash URIs and provide an easy upgrade path as publishers choose to add more specificity.

The key will be how we find practical ways to cope with the real, imperfect, fuzzy web of data while providing an evolutionary path to greater clarity and specificity that publishers can take when they see the benefit of doing so.

2. LinkedData = RDF + SPARQL + …

Andy Powell at eFoundations summarizes and evaluates a few aspects of this ongoing discussion here, and includes what might well serve the workshop as a rough and ready definition for (lower case) linked data. Powell is quoting from Dan Brickley’s comment in response to Michael Hausenblas’ What else? post:

I have no problem whatsoever with non-RDF forms of data in “the data Web.” This is natural, normal, and healthy. Statistical information, geographic information, data-annotated SVG images, audio samples, JSON feeds, Atom, whatever.

We don’t need all this to be in RDF. Often it’ll be nice to have extracts and summaries in RDF, and we can get that via GRDDL or other methods. And we’ll also have metadata about that data, again in RDF, using SKOS for indicating subject areas, FOAF++ for provenance, etc.

The non-RDF bits of the data Web are, roughly, going to be the leaves on the tree. The bit that links it all together will be, as you say, the typed links, loose structuring and so on that come with RDF. This is also roughly analogous to the HTML Web: you find JPEGs, WAVs, flash files, and so on linked in from the HTML Web, but the thing that hangs it all together isn’t flash or audio files, it’s the linky extensible format: HTML. For data, we’ll see more RDF than HTML (or RDFa bridging the two). But we needn’t panic if people put non-RDF data up online…. it’s still better than nothing. And as the LOD scene has shown, it can often easily be processed and republished by others. People worry too much! 🙂

3. RDA (Resource Description and Access) … new cataloging rules

Here are links for the project’s website, a summary about it in Wikipedia and from UKOLN (2010), plus commentary by Jonathan Rochkind, Diane Hillmann, and Kelley McGrath.

a. Karen Coyle recently posted this in her report from Norway’s Knowledge Organization 2011 meeting:

I was asked to do a short wrap-up of the first day, and as I usually do I turned to the audience for their ideas. Since we realized we are short on answers and long on questions, we decided to gather some of the burning questions. Here are the ones I wrote down:

If not RDA, what else is there?
Are things on hold waiting for RDA? Are people and vendors waiting to see what will happen?
Why wasn’t RDA simplified?
How long will we pay for it?
Will communities other than those in the JSC use it?
Can others join JSC to make this a truly international code?
Should we just forget about this library-specific stuff and use Dublin Core?

I suspect that there are many others wondering these same things.

b. Jonathan Rochkind published this (April 2011) in response to a comment on his blog:

I do think that the fact that the narrative text of the RDA rules is behind a paywall is a huge huge problem if you want it to be successful as an actual interoperable standard. (It is sadly not the only huge problem, either. The intentions of RDA are noble…)

4. Semantic web as “reasoning-focused” applications

a. Chris Bizer posted this contrasting pair of definitions on the LOD email list in 2008:

Looking back at the developments over the last years, I think there are two general types of use cases:

Sophisticated, reasoning-focused applications which use an expressive ontology language and which require sound formal semantics and consistent ontologies in order to deliver their benefits to the user. In order to keep things consistent, these applications usually only work with data from a small set of data sources. In order to be able to apply sophisticated reasoning mechanisms, these applications usually only work with small datasets.
The general open Web use case where many information providers use Semantic Web technologies to publish and interlink structured data on the Web. Within this use case, the benefits for the user mainly come from the large amounts of Web-accessible data and the ability to discover related information from other data sources by following RDF links.

For each type of the use cases, there is usually a different set of technologies applied. OWL and classic heavy-weight reasoning for the first use case. HTTP, RDF, RDFS and light-weight smushing techniques for the second use case.
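As a rough illustration of what “light-weight smushing” can look like in the second use case, the sketch below merges nodes that are linked by `owl:sameAs` and rewrites every other triple to use one canonical URI per group. It reflects one common reading of the term, not Bizer’s own implementation; the `smush` function, the union-find bookkeeping, and the rule of keeping the lexicographically smallest URI are assumptions made for the example.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def smush(triples):
    """Merge nodes linked by owl:sameAs; return triples rewritten to canonical URIs.

    `triples` is an iterable of (subject, predicate, object) strings.
    Assumption for this sketch: URIs and literals are both plain strings,
    so a literal that happens to equal a URI would be treated as that node.
    """
    triples = list(triples)
    parent = {}

    def find(node):
        # Locate the canonical representative of `node`, halving paths as we go.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller URI as canonical so the output is deterministic.
            keep, drop = sorted((root_a, root_b))
            parent[drop] = keep

    # First pass: group every pair of nodes declared to be the same.
    for s, p, o in triples:
        if p == OWL_SAME_AS:
            union(s, o)

    # Second pass: drop the sameAs links and rewrite everything else.
    return {(find(s), p, find(o)) for s, p, o in triples if p != OWL_SAME_AS}
```

Given two sources that describe the same toucan under different URIs, plus one `owl:sameAs` link between them, `smush` returns a single set of statements about one node. No ontology reasoning is involved beyond following the equivalence links, which is the contrast with the first use case.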

b. Glenn McDonald’s pithy note (regarding Semantic Web Summit East, November 2010):

The other half or so of the conference was devoted to the problem of having machines read human text. This is an interesting but essentially unrelated pursuit that happens to share the word “semantic,” and combining the two [LOD and semWeb] makes barely more sense than holding a Things Having Risen convocation of second-coming evangelists and sourdough bakers.

c. Stefano Mazzocchi remarks on a post by Jim Hendler in 2007:

from Hendler’s The dark side of the semantic web:

It is the realization that the REST approach to the world is a wonderful way to use RDF and it is empowered by the emerging standards of SPARQL, GRDDL, RDF/A and the like. In short, it is the Semantic Web vision of Tim, before Ora and I polluted it with all this ontology stuff, coming real! And the good news for folks like me is that some little pieces of OWL turn out to be important to making this work (OWL Ultra Lite?) …

and Mazzocchi’s response:

There are surprising statements in there:
all this ontology stuff has polluted the original vision of the Semantic Web.

Finally somebody said it out loud! Yes folks, hear hear: ‘ontology stuff pollutes the semweb’. I’m going to make t-shirts and print it. I might have found the business model for the semweb.

some little pieces of OWL

Yes, correct, only a few little pieces of OWL are required. Mostly owl:sameAs, owl:equivalentProperty and subClass and subProperty from RDFSchema, that’s it.

It would be a wonderful gift to humanity if the “logicians” stopped trying to teach machines to think on my behalf with syllogistic symbols and just helped us creating operators for such symbols that are actually useful for real work.

d. Jim Hendler’s own take during the summer of 2009 comes to us in his What is the semantic web really all about?

In short, by early 2000, there were a number of people working on approaches to Tim’s idea of a simple, URI-based Linked-data approach called the “Semantic Web.” In 2001, I was lucky enough to be the “et” in Berners-Lee et al., a Scientific American article that outlined the vision.

That technology is now maturing, and becoming much easier to use. However, there’s a lot of more advanced ideas that underline an as-yet unrealized research vision of a far more powerful Semantic Web, and I think that is where a lot of confusion has grown up. That vision is shaped to a greater degree by Artificial Intelligence researchers, by people working on advanced language-based search mechanisms, by people looking at expressive ontologies and complex rules, etc.

But that shouldn’t obscure this important fact: there is a lot of useful technology ready to be applied today in fairly simple ways based on the earlier and now-maturing vision.

[snip]

So I tend to advocate using mostly small, lightweight ontologies that are linked together and which are primarily used to describe datasets as the basis for a lot of exciting new applications that can take advantage of “mashed up data” (and eScience is a key one of these). And that is what the term Semantic Web meant back at the beginning, and being an “old fart” in this space, I think I’ll stick with that.

5. Linked-data browsers

There is certainly value in the “browsers” that technology people do and will use to code, test, and refine interfaces built over graphs of links. But technologists’ “linked data browsers” and interfaces needed by those who seek, gather, consume, and create knowledge are a distinctly different set of tools and capabilities. That distinction can be illustrated by example:

technologists

knowledge workers and information consumers

a. Glenn McDonald describes the distinction between browsers for technologists and browsers for content consumers with his usual flair:

Just to be clear, I think Kingsley is exactly right that we need a universal data browser, and quite possibly right that Virtuoso’s underlying technology is capable of being an engine for such a thing. But this thing he’s showing isn’t a data browser, it’s a data-representation browser.

It’s as if the first web-browser only did View Source.

We will no more sell the new web by showing people URIs than we sold the old web by showing them hrefs. Exactly the opposite: we sold the old web by not showing people UL and OLs and TD/TD/TD/TD and CELLPADDING=0.

And we’ll sell this new web by not showing them meta-schema and triples and reification and inverse-link entailment.
