Workflows • CLIR

1. Mike Bergman

Bibliographic Knowledge Network, Structured Dynamics, UMBEL

a. BKN in relation to Structured Dynamics [cited and described here]

b. Vision of web-of-data that shapes his Structured Dynamics environment outlined in:

Seeking a Semantic Web Sweet Spot

Since the first days of the Web there has been an ideal that its content could extend beyond documents and become a global, interoperating storehouse of data. This ideal has become what is known as the “semantic Web”. And within this ideal there has been a tension between two competing world views of how to achieve this vision. At the risk of being simplistic, we can describe these world views as informal v formal, sometimes expressed as “bottom up” v “top down”.

The informal view emphasizes freeform and diversity, using more open tagging and a bottom-up approach to structuring data. This group is not anarchic, but it does support the idea of open data, open standards and open contributions. This group tends to be oriented to RDF and is (paradoxically) often not very open to non-RDF structured data forms (such as microdata or microformats). Social networks and linked data are quite central to this group. RDFa, tagging, user-generated content and folksonomies are also key emphases and contributions.

The formal view tends to support more strongly the idea of shared vocabularies with more formalized semantics and design. This group uses and contributes to open standards, but is also open to proprietary data and structures. Enterprises and industry groups with standard controlled vocabularies and interchange languages (often XML-based) more typically reside in this group. OWL and rules languages are more typically the basis for this group’s formalisms. The formal view also tends to split further into two camps: one that is more top down and engineering oriented, with typically a more closed world approach to schema and ontology development; and a second that is more adaptive and incremental and relies on an open world approach.

Again, at the risk of being simplistic, the informal group tends to view many OWL and structured vocabularies, especially those that are large or complex, as over-engineered, constraining or limiting freedom. This group often correctly points to the delays and lack of adoption associated with more formal efforts. The informal group rarely speaks of ontologies, preferring the term “vocabularies”. In contrast, the formal group tends to view bottom-up efforts as chaotic, poorly structured and too heterogeneous to allow machine reasoning or interoperability. Some in the formal group advocate certification or prescribed training programs for ontologists.

Readers of this blog and customers of Structured Dynamics know that we more often focus on the formal world view and more specifically from an open world perspective. But, like human tribes or different cultures, there is no one true or correct way. Peaceful coexistence resides in the understanding of the importance and strength of different world views.

Shared communication is the way in which we, as humans, learn to understand and bridge cultural and tribal differences. These very same bases can be used to bridge the differences of world views for the semantic Web. Shared concepts and a way to communicate them (via a common language), what I call reference structures, are one potential “sweet spot” for bridging these views of the semantic Web.
[snip]

and in: In Search of ‘Gold Standards’ for the Semantic Web

… Wikipedia + UMBEL + Friends may offer one approach

So, what is the grain of sand at the core of the semantic Web that enables it to bootstrap meaning? We start with the basic semantics and “instructions” in the core RDF, RDFS and OWL languages. These are very much akin to the basic BIOS instructions for computer boot up or the instruction sets leveraged by compilers. But, where do we go from there? What is the analog to the compiler or the operating system that gives us more than these simple start up instructions? In a semantic sense, what are the vocabularies or languages that enable us to understand more things, connect more things, relate more things?

To date, the semantic Web has given us perhaps a few dozen commonly used vocabularies, most of which are quite limited and simple pidgin languages such as DC, FOAF, SKOS, SIOC, BIBO, etc. We also have an emerging catalog of “things” and concepts from Wikipedia (via DBpedia) and similar. (Recall, in this piece, we are trying to look Web-wide, so the many fine building blocks for domain purposes such as found in biology, medicine, finance, astronomy, etc., are excluded.) The purposes and scope of these vocabularies widely differ and attack quite different slices of the information space. SKOS, for example, deals with describing simple knowledge structures like taxonomies or thesauri; SIOC is for describing social media.

By virtue of adoption, each of these core languages has proved its usefulness and role. But, as skew lines in space, how do these vocabularies relate to one another? And, how can all of the specific domain vocabularies also relate to those and one another where there are points of intersection or overlap? In short, after we get beyond the starting instructions for the semantic Web, what is our language and vocabulary? How do we complete the bootstrap process?

Clearly, like human languages, we need rich enough vocabularies to describe the things in our world and a structure of the relationships amongst those things to give our communications meaning and coherence. That is precisely the role provided by reference structures.

2. Emmanuelle Bermes

Linked Data and Why We (Librarians) Should Care

her use case:

a publisher provides basic information about a book
a national library adds bibliographic and authority control
my local library adds holdings
some nice guy out there adds links from Wikipedia
my library provides a web-page view of this and related books (subject, bio, wikipedia, amazon)

results

no crosswalk / mapping
each one uses its own metadata format; all triples can be aggregated

no data redundancy
each one creates only the data he needs, and retrieves already existing information

no harvesting
the data is available directly on the web

no branding issue
the URIs make it possible to trace any data back to its original source

no software-specific developments
everything relies on open standards such as RDF, SPARQL …
no need to learn a new protocol or query language
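To make the “no crosswalk” and “no branding issue” points concrete, here is a minimal Python sketch (not taken from the presentation): each contributor publishes triples in its own vocabulary, aggregation is a plain union, and each statement keeps its source as a fourth element so it can be traced back. All URIs, including the local `heldBy` term, are hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical URIs; a real deployment would use each institution's own.
BOOK = "http://example.org/book/123"
DC_TITLE = "http://purl.org/dc/terms/title"
DC_CREATOR = "http://purl.org/dc/terms/creator"
HELD_BY = "http://local-library.example/vocab#heldBy"  # hypothetical local term
SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"

# Each contributor publishes its own triples in its own vocabulary; no crosswalk.
sources = {
    "http://publisher.example/": {(BOOK, DC_TITLE, "Example Title")},
    "http://national-library.example/": {
        (BOOK, DC_CREATOR, "http://national-library.example/authority/42")},
    "http://local-library.example/": {
        (BOOK, HELD_BY, "http://local-library.example/branch/main")},
    "http://volunteer.example/": {
        (BOOK, SEE_ALSO, "http://en.wikipedia.org/wiki/Example_Title")},
}

def aggregate(sources):
    """Union all triples, keeping the contributing source as a fourth element (a quad)."""
    return {(s, p, o, graph) for graph, triples in sources.items() for s, p, o in triples}

def describe(quads, subject):
    """Group one subject's statements as predicate -> [(object, source)]."""
    view = defaultdict(list)
    for s, p, o, graph in sorted(quads):
        if s == subject:
            view[p].append((o, graph))
    return dict(view)

if __name__ == "__main__":
    for predicate, values in describe(aggregate(sources), BOOK).items():
        for obj, graph in values:
            print(f"{predicate} -> {obj} (from {graph})")
```

Keeping the source with every statement is only one way to realize the traceability point; named graphs in a triple store serve the same purpose.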

3. CultureSampo

A system for publishing heterogeneous linked data as a service

Semantic Computing Research Group (SeCo), Aalto University, Finland

a. In general: How to Deal With Massively Heterogeneous Cultural Heritage Data

… the CultureSampo system for publishing heterogeneous linked data as a service. Discussed are the problems of converting legacy data into linked data, as well as the challenge of making the massively heterogeneous yet interlinked cultural heritage content interoperable on a semantic level. Novel user interface concepts for then utilizing the content are also presented. In the approach described, the data is published not only for human use, but also as intelligent services for other computer systems that can then provide interfaces of their own for the linked data. As a concrete use case of using CultureSampo as a service, the BookSampo system for publishing Finnish fiction literature on the semantic web is presented.
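The legacy-to-linked-data conversion problem mentioned above can be sketched roughly as follows. This is an illustration under stated assumptions, not the CultureSampo pipeline: the namespaces and the keyword-to-concept table are hypothetical, and Dublin Core terms stand in for whatever properties a real conversion would use. The idea shown is that heterogeneous legacy strings (here an English and a Finnish keyword) resolve to one shared concept URI, while unmatched values are set aside for review.

```python
# Hypothetical namespaces; not the actual CultureSampo or BookSampo URIs.
BASE = "http://example.org/booksampo/"
ONTO = "http://example.org/onto/"
DC_TITLE = "http://purl.org/dc/terms/title"
DC_SUBJECT = "http://purl.org/dc/terms/subject"

# Hypothetical lookup table from legacy keyword strings to shared ontology concepts.
# Two different legacy labels ("sea" and the Finnish "meri") resolve to one concept.
CONCEPTS = {
    "sea": ONTO + "c_sea",
    "meri": ONTO + "c_sea",
    "war": ONTO + "c_war",
}

def convert(record):
    """Convert one flat legacy record into a set of triples plus unmatched keywords."""
    work = BASE + "work/" + record["id"]
    triples = {(work, DC_TITLE, record["title"])}
    unmatched = []
    for keyword in record["keywords"]:
        concept = CONCEPTS.get(keyword.strip().lower())
        if concept is None:
            unmatched.append(keyword)  # left for manual review or ontology extension
        else:
            triples.add((work, DC_SUBJECT, concept))
    return triples, unmatched
```

Because the triples are a set, two legacy labels for the same concept collapse into a single subject statement.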

b. CultureSampo: A Collective Memory of Finnish Cultural Heritage on the Semantic Web 2.0

Vision: Semantic Web 2.0 of Cultural Heritage
Challenges: Content Complexity & Production
Solution: Semantic Web + Web 2.0
Realization: CultureSampo: Finnish Culture on the Semantic Web 2.0

4. Jody L. DeRidder

Leveraging EAD for Low-Cost Access to Digitized Content at the University of Alabama Libraries

As funding shrinks and researcher demand for online access to primary source materials grows, many institutions seek the most cost-effective method of digitization, online delivery, and long-term access. One method of reducing costs is to leverage existing Encoded Archival Description (EAD) finding aids for search and retrieval, rather than creating item-level descriptions for digitized content. This provides Web access to manuscript materials while still providing context to the user. This article describes the Septimus D. Cabaniss Papers project at the University of Alabama Libraries which seeks to recreate the patron experience in the reading room via the Web. This project tested a model for lowering the costs of Web delivery of large collections using folder level descriptions [employing LOCKSS as one of the tools].
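A rough sketch of the folder-level approach, assuming a made-up, namespace-free EAD 2002-style fragment and a hypothetical scan-naming convention (`box<B>_folder<F>_<seq>.jpg`); neither is taken from the Alabama project. Digitized images are grouped under the existing `level="file"` component descriptions rather than receiving item-level metadata.

```python
import re
import xml.etree.ElementTree as ET

# A tiny, made-up EAD 2002-style fragment (namespace omitted for brevity).
EAD = """<ead><archdesc level="collection">
  <did><unittitle>Example Family Papers</unittitle></did>
  <dsc>
    <c01 level="series">
      <did><unittitle>Correspondence</unittitle></did>
      <c02 level="file"><did>
        <container type="box">1</container><container type="folder">3</container>
        <unittitle>Letters</unittitle></did></c02>
      <c02 level="file"><did>
        <container type="box">1</container><container type="folder">4</container>
        <unittitle>Legal documents</unittitle></did></c02>
    </c01>
  </dsc>
</archdesc></ead>"""

# Hypothetical scan-naming convention: box<B>_folder<F>_<sequence>.jpg
SCAN_NAME = re.compile(r"box(\d+)_folder(\d+)_(\d+)\.jpg")

def folder_titles(ead_xml):
    """Map (box, folder) -> folder-level title for every component with level="file"."""
    titles = {}
    for component in ET.fromstring(ead_xml).iter():
        if component.get("level") != "file":
            continue
        did = component.find("did")
        containers = {c.get("type"): c.text.strip() for c in did.findall("container")}
        key = (containers.get("box"), containers.get("folder"))
        titles[key] = did.findtext("unittitle", "").strip()
    return titles

def attach_scans(titles, filenames):
    """Group scanned images under their folder description instead of describing each item."""
    pages = {key: {"title": title, "scans": []} for key, title in titles.items()}
    for name in sorted(filenames):
        match = SCAN_NAME.fullmatch(name)
        if match and (match[1], match[2]) in pages:
            pages[(match[1], match[2])]["scans"].append(name)
    return pages
```

Files that do not follow the naming convention, or that point to a folder the finding aid does not describe, are simply skipped here; a production workflow would report them.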

5. Antoine Isaac

W3C Library Linked Data Incubator Group

One slide from his presentation: A dream at the Dutch National Library (Johan Stapel)

6. KiWi … an EU part-funded research project begun in 2008

In KiWi Version 1 we offer a platform for implementing and integrating many different kinds of semantic social software services. This new kind of semantically enhanced social software platform allows users to share and integrate knowledge more easily, naturally and tightly, and allows them to adapt content and functionalities according to their personal requirements. At the heart of KiWi are 18 core functionalities that enable social software developers to easily build and adapt new services as they are required, e.g. within enterprises or on public social software sites. With KiWi Version 1 we are also launching the KiWi Community. We need the help of a larger community of users and developers to match the technology to user needs, demonstrate the power of the underlying technology, and get metadata (semantics) working for users! If you share in this vision then get on board, we’d love to hear from you.

7. Recollection … a free platform for customized views of digital collections

A pair of posts by semanticweb.com (one, two)
Documentation
An overview of the capabilities and components

The mission of the National Digital Information Infrastructure and Preservation Program (NDIIPP) is to develop a national strategy to collect, preserve and make available significant digital content, especially information that is created in digital form only, for current and future generations. In 2008, the NDIIPP partners shared content through a simple web page. In order to explore more useful tools and processes for sharing diverse content across partners’ collections, the Library began a pilot project in 2009 with Zepheira to develop an environment that can be used to collect and explore information about digital collections. The result is a software platform that we are calling Recollection.

8. Peter Sefton and colleagues … managing data in the flow of research:

9. Ed Summers and Dorothea Salo …

Notes on Retooling Libraries:

If you work in the digital preservation field and haven’t seen Dorothea Salo’s Retooling Libraries for the Data Challenge in the latest issue of Ariadne, definitely give it a read. Dorothea takes an unflinching look at the scope and characteristics of data assets currently being generated by scholarly research, and at how well equipped traditional digital library efforts are to deal with them. I haven’t seen so many of the issues I’ve had to deal with (largely unconsciously) as part of my daily work so neatly summarized before. Having them laid out in such a lucid, insightful and succinct way is really refreshing, and inspiring.

[snip]

Another part of Dorothea’s essay that stuck out a bit for me was the advice to split ingest, storage and access systems.

Salo: Ingest, storage, and end-user interfaces should be as loosely coupled as possible. Ideally, the same storage pool should be available to as many ingest mechanisms as researchers and their technology staff can dream up, and the items within should be usable within as many reuse, remix, and re-evaluation environments as the Web can produce.

This is something we (myself and other folks at LC) did as part of the tooling to support the National Digital Newspaper Program. Our initial stab at the software architecture was to use Fedora to manage the full life cycle (from ingest, to storage, to access) of the newspaper content we receive from program awardees around the US. The only trouble was that we wanted the access system to support heavy use by researchers and also by robots (Google, Yahoo, Microsoft, etc.) building their own views on the content. Unfortunately, the way we had put the pieces together, we couldn’t support that. Increasingly we found ourselves working around Fedora as much as possible to squeeze a bit more performance out of the system.

So in the end we (and by we I mean David) decided to bite the bullet and split off the inventory systems keeping track of where received content lives (what storage systems, etc.) from the access systems that deliver content on the Web. Ultimately this meant we could leverage industry-proven web development tools to deliver the newspaper content, which was a huge win. Now, that’s not saying that Fedora can’t be used to provide access to content. I think the problems we experienced may well have been the result of our use of Fedora, rather than Fedora itself. Having to do multiple, large XSLT transforms on source XML files to render a page is painful. While it’s a bit of a truism, a good software developer tries to pick the right tool for the job. Half the battle there is deciding on the right granularity for the job … the single job we were trying to solve with Fedora (preservation and access) was too big for us to do either right.
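The inventory/access split described above can be sketched in a few lines. The class names `Inventory` and `AccessSite` are hypothetical and this is not the NDNP code (the post does not show any); it only illustrates two components that share nothing but item identifiers, with the access side rendering from prepared derivatives and caching the result rather than running XSLT transforms on every request.

```python
import html
from dataclasses import dataclass, field

@dataclass
class Inventory:
    """Records where received content lives (which storage system, which path)."""
    locations: dict = field(default_factory=dict)  # item id -> storage location

    def register(self, item_id, location):
        self.locations[item_id] = location

    def locate(self, item_id):
        return self.locations[item_id]

class AccessSite:
    """Serves web pages from prepared derivatives; knows nothing about ingest or storage."""

    def __init__(self, load_derivative):
        self.load_derivative = load_derivative  # any callable: item id -> display text
        self.cache = {}

    def page(self, item_id):
        # Render once and cache, rather than transforming source XML on every request.
        if item_id not in self.cache:
            body = html.escape(self.load_derivative(item_id))
            self.cache[item_id] = f"<html><body><p>{body}</p></body></html>"
        return self.cache[item_id]
```

Because `AccessSite` accepts any callable, the storage behind it can change, or feed several access front ends, without touching ingest or the inventory; that is the loose coupling Salo recommends.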

Having a system that’s decomposable, like the approach that CDL is taking with Microservices, is essential for long-term thinking about software in the context of digital preservation. I guess you could say “there’s no-there-there” with Microservices, since there’s not really a system to download, but in a way that’s kind of the point.

10. VuDL: open source digital library administration (with VuFind front end)

(April 6, 2011) – The Technology Development team at Villanova University’s Falvey Memorial Library announces the official alpha launch of their open source digital library management software, VuDL (http://vudl.org). With VuDL, you can store, manipulate, display and make discoverable your digital collections.

Most digital library software packages are targeted at either small/specific collections or very large/very complicated collections. The former may not have the functionality to describe your objects properly; the latter, too many options and therefore needless complexity. VuDL is designed to be somewhere in the middle: flexible enough to describe different ranges of objects, while small enough to diminish technical overhead.

[snip]

VuDL’s public interface is powered by VuFind (http://vufind.org), an Open Source discovery layer developed and managed by Villanova University. VuFind is currently in use in academic and research libraries in 12 countries, including the National Library of Australia and the London School of Economics.

11. Romain Wenz Pivot project [data.bnf.fr]

[abstract from preliminary paper … post Workshop information noted below]

The Bibliothèque nationale de France has designed a new project in order to make its data more useful on the Web. This “draft.bnf.fr” project is still at an early stage. The project involves transforming existing data, enriching and interlinking the dataset with internal and external resources, and publishing HTML pages for browsing by users and search engines. The raw data would also be accessible in RDF following the principles of linked data architecture.

[post Workshop note:]

The Bibliothèque nationale de France released a first version of its “Linked Open Data” project: http://data.bnf.fr

These are simple Web pages about major French writers and works, applying FRBR principles. We gather information from our MARC library catalogue, authority files, EAD archives and manuscripts catalogue, and OAI-DC digital library (Gallica). The HTML is fully open to the Web (URLs, sitemap). Example: http://data.bnf.fr/11910267/jean_de_la_fontaine/

For each page/concept, the RDF is available in RDF/XML, N-Triples (NT) and N3:
http://data.bnf.fr/11928016/jules_verne/rdf.xml ,
http://data.bnf.fr/11928016/jules_verne/rdf.nt ,
http://data.bnf.fr/11928016/jules_verne/rdf.n3 .

We also provide content negotiation.
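With content negotiation, a client sends an `Accept` header and the server returns (or redirects to) the matching representation. The sketch below is an illustration under assumptions, not data.bnf.fr’s implementation: it reuses the `rdf.xml`/`rdf.nt`/`rdf.n3` suffixes listed above, while the media-type mapping (`application/n-triples`, `text/n3`) and the fallback to the HTML page are assumptions.

```python
# Per-format suffixes as listed above; the media-type mapping is an assumption.
FORMATS = {
    "text/html": None,  # the HTML page itself
    "application/rdf+xml": "rdf.xml",
    "application/n-triples": "rdf.nt",
    "text/n3": "rdf.n3",
}

def negotiate(accept_header):
    """Return the supported media type with the highest q-value (default: text/html)."""
    best, best_q = "text/html", 0.0
    for part in accept_header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type, q = fields[0].lower(), 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        # q=0 means "not acceptable", so it can never beat the starting value.
        if media_type in FORMATS and q > best_q:
            best, best_q = media_type, q
    return best

def resolve(page_url, accept_header):
    """Map a page URL plus Accept header to the URL of the chosen representation."""
    suffix = FORMATS[negotiate(accept_header)]
    return page_url if suffix is None else page_url.rstrip("/") + "/" + suffix
```

For example, `resolve("http://data.bnf.fr/11928016/jules_verne/", "application/rdf+xml")` yields the `rdf.xml` URL listed above. Wildcards such as `*/*` are not handled and fall back to HTML.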

This is a first version, with about 5000 pages. There are links to id.loc.gov for languages and nationalities, dewey.info for subjects, and DCMI Type for types.
This is only a first taste: we will improve it in the coming months.
