What ... in scope
… what’s viable, and what can stay on the shelf?
Included here are references to thought pieces that outline an environmental framework for the Stanford Workshop. Far from being cast in concrete, these musings and papers highlight those facets of the linked-data/semantic-web ecology that lie both inside and outside the intended focus of the week’s work in June and July.
In-scope for the workshop
1. Be part of the web, not just on it
a. Andy Powell is a long-time advocate of leveraging the web’s body of practice and technologies as the most effective means of making academic content and its associated metadata widely available and useful. In his Repositories thru the looking glass (2008) post, he writes:
It strikes me that repositories are of interest not just to those librarians in the academic sector who have direct responsibility for the development and delivery of repository services. Rather they represent a microcosm of the wider library landscape—a useful case study in the way the Web is evolving, particularly as manifest through Web 2.0 and social networking, and what impact those changes have on the future of libraries, their spaces and their services.
… the 'service oriented' approaches that we have tended to adopt in standards like the OAI-PMH, SRW/SRU, and OpenURL sit uncomfortably with the 'resource oriented' approach of the Web architecture and the Semantic Web. We need to recognize the importance of REST as an architectural style and adopt a 'resource oriented' approach at the technical level when building services.
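As a rough illustration of the contrast Powell draws, the sketch below builds one request in each style. The host name, item path, and identifier are hypothetical; the OAI-PMH parameter names (`verb`, `identifier`, `metadataPrefix`) come from that protocol, and the resource-oriented variant simply assumes the repository gives each item its own URI and supports content negotiation.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical repository addresses, used only for illustration.
OAI_BASE = "https://repository.example.org/oai"
ITEM_URI = "https://repository.example.org/items/1234"


def service_oriented_url(identifier: str) -> str:
    """Service style: a single endpoint; the item is just a query parameter."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": "oai_dc",
    })
    return f"{OAI_BASE}?{query}"


def resource_oriented_request(uri: str) -> Request:
    """Resource style: the item has its own URI; the Accept header picks a format."""
    return Request(uri, headers={"Accept": "application/rdf+xml"}, method="GET")


oai_url = service_oriented_url("oai:repository.example.org:1234")
item_request = resource_oriented_request(ITEM_URI)

print(oai_url)
print(item_request.get_method(), item_request.full_url)
```

Nothing is sent over the network here; the point is only that in the second style the item's URI can be linked to, bookmarked, and crawled like any other web page.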
b. Ed Summers provides a 2011 summary of thinking and developments related to linked data being of the web (REST as an architectural style, resource oriented approaches, etc.). His comments refer to an eFoundations post (Andy Powell and Pete Johnston) about metadata guidelines for the UK Resource Discovery Task Force (RDTF):
As I’ve heard you argue persuasively in the past, the success of the WWW as a platform for delivery of information is hard to argue with. One of the things that the WWW did right (from the beginning) was focus the technology on people actually doing stuff…in their browsers. It seems really important to make sure whatever this metadata is, that users of the Web will see it (somehow) and will be able to use it. Ian Davis’ points in Is the Semantic Web Destined to be a Shadow? are still very relevant today I think.
Aligning with the web is a good goal to have. Relatively recent service offerings from Google and Facebook indicate their increased awareness of the utility of metadata to their users. And publishers are recognizing how important they are for getting their stuff before more eyes. It’s a kind of virtuous cycle I hope.
This must feel like it has been a long time in coming for you and Pete. Google’s approach encourages a few different mechanisms: RDFa, Microdata and Microformats. Similarly, Google Scholar parses a handful of metadata vocabularies present in the HTML head element. The web is a big place to align with I guess.
I imagine there will be hurdles to get over, but I wonder if your task force could tap into this virtuous cycle. For example, it would be great if cultural heritage data could be aggregated using techniques that big search companies also use: e.g. RDFa, microformats and microdata; and sitemaps and Atom for updates. This would assume a couple things: publishers could allow (and support) crawling, and that it would be possible to build aggregator services to do the crawling. An important step would be releasing the aggregated content in an open way too. This seems to be an approach that is very similar to what I’ve heard Europeana is doing…which may be something else to align with.
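The kind of harvesting Summers describes can be sketched with the Python standard library alone. The example below pulls metadata out of `meta` elements in an HTML head, the place where Google Scholar looks for tags such as `citation_title`. The sample page, its values, and the choice to keep only `citation_` and `DC.` names are assumptions made for illustration; a real aggregator would also need crawling, sitemap handling, and RDFa or microdata parsing, none of which is shown.

```python
from html.parser import HTMLParser


class HeadMetadataParser(HTMLParser):
    """Collect name/content pairs from <meta> elements."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        # Keep only the vocabularies this sketch cares about (an assumption).
        if name and content and name.startswith(("citation_", "DC.")):
            self.metadata.setdefault(name, []).append(content)


# Hypothetical landing page for a single item.
SAMPLE_PAGE = """<html><head>
<title>Sample item</title>
<meta name="citation_title" content="A Sample Report">
<meta name="citation_author" content="Doe, Jane">
<meta name="citation_author" content="Roe, Richard">
<meta name="DC.date" content="2011-06-27">
</head><body><p>Landing page</p></body></html>"""


def extract_metadata(html: str) -> dict:
    """Parse one page and return its metadata as {name: [values]}."""
    parser = HeadMetadataParser()
    parser.feed(html)
    parser.close()
    return parser.metadata


metadata = extract_metadata(SAMPLE_PAGE)
print(metadata)
```

Repeated names such as `citation_author` are kept as lists, so multi-author items survive the extraction.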
c. Peter Murray adds this note about constraints embedded in the mental model that evolved alongside library metadata practice and technologies:
What trips up our [library] community even more, I think, is that we have a tendency to equate this communications format [MARC] with a mental model of how we describe things from a bibliographic point of view. We think of discrete records that describe these things rather than a network (or, more accurately, a graph) of interrelated nodes. This forces us to focus on the textual content of fields and not on the relationships between things. And in doing so, we are not making the best use of our limited efforts to describe the things in our curatorial care.
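Murray's distinction between records and graphs can be made concrete with a toy example. The sketch below describes the same book first as a flat record of text fields and then as a small set of subject/predicate/object statements. The `ex:` identifiers and property names are hypothetical and stand in for real URIs and vocabularies; this is not MARC and not any particular RDF library.

```python
# Record view: one self-contained record whose fields are text strings.
record = {
    "title": "Alice's Adventures in Wonderland",
    "author": "Carroll, Lewis, 1832-1898",
    "subject": "Fantasy fiction",
}

# Graph view: statements that link identified things to each other.
# All "ex:" names are made up for this illustration.
triples = [
    ("ex:book1", "ex:title", "Alice's Adventures in Wonderland"),
    ("ex:book1", "ex:creator", "ex:person1"),
    ("ex:book2", "ex:title", "Through the Looking-Glass"),
    ("ex:book2", "ex:creator", "ex:person1"),
    ("ex:book2", "ex:sequelTo", "ex:book1"),
    ("ex:person1", "ex:name", "Lewis Carroll"),
]


def objects(subject, predicate):
    """Follow a relationship outward from a node."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects(predicate, obj):
    """Follow a relationship backward to the nodes that point at obj."""
    return [s for s, p, o in triples if p == predicate and o == obj]


works_by_person = subjects("ex:creator", "ex:person1")
sequel_target = objects("ex:book2", "ex:sequelTo")

print(works_by_person)
print(sequel_target)
```

In the record view, finding everything by the same author means matching text strings; in the graph view it is a traversal from the person node, and relationships such as "sequel to" have somewhere to live.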
d. Brian O’Leary posts about an interview with Toby Green, head of publishing for the Organisation for Economic Co-operation and Development (OECD), a worldwide publisher in economics and public policy. Green’s take on access to information includes this:
At heart, most people don’t care if it’s a book or a periodical, whether it’s online or offline. As Toby puts it, “People are looking for answers, not books or data or papers. That’s why we bundle all OECD knowledge, in book form, journals, data sets, tables, working papers, you name it, into a single, seamless, online platform”.
e. Kevin Kelly provided another take on being of the web back in 2008 at O’Reilly’s Web 2.0 Summit. In a 15-minute session, he reviewed the history of the “web” (some 6,000+ days old at that point) with an eye to projecting what characteristics the web might exhibit at an equal distance into the future. The future he postulated is one based on an all-pervasive, all-consuming web of data that is a worldwide … a web in which:
If you’re producing some information and it’s not webized, not in some way online, and related and shared to everything else, it doesn’t count.
What we’ve learned from the first web is that we have … to believe in the impossible. It was impossible what’s happened in only 6,000 days. If what I’m talking about [for the next 6,000 days] sounds impossible, you have to believe it because that’s what we’ve learned.
f. Jon Udell offers his definition of what being of the web is all about in his Seven ways to think about the web (January 2011):
Back in 2000, the patterns, principles, and best practices for building web information systems were mostly anecdotal and folkloric. Roy Fielding’s dissertation on the web’s deep architecture provided a formal definition that we’ve been digesting ever since. In his introduction he wrote that the web is “an Internet-scale distributed hypermedia system” that aims to “interconnect information networks across organizational boundaries.” His thesis helped us recognize and apply such principles as universal naming, linking, loose coupling, and disciplined resource design. These are not only engineering concerns. Nowadays they matter to everyone. Why? Because the web is a hybrid information system co-created by people and machines. Sometimes computers publish our data for us, and sometimes we publish it directly. Sometimes machines subscribe to what machines and people publish, sometimes people do.
Given the web’s hybrid nature, how can we teach people to make best use of this distributed hypermedia system? That’s what I’ve been trying to do, in one way or another, for many years. It’s been a challenge to label and describe the principles I want people to learn and apply. I’ve used the terms computational thinking, Fourth R principles, and, most recently, Mark Surman’s evocative thinking like the web.
Back in October, at the Traction Software users’ conference, I led a discussion on the theme of observable work in which we brainstormed a list of some principles that people apply when they work well together online. It’s the same list that emerges when I talk about computational thinking, or Fourth R principles, or thinking like the web. Here’s an edited version of the list we put up on the easel that day:
2. Meaning and doing what we say