
Late breaking news: 28 September 2011

From CENL (Conference of European National Librarians) and Europeana:

Meeting at the Royal Library of Denmark, the Conference of European National Librarians (CENL) has voted overwhelmingly to support the open licensing of its data. CENL represents Europe’s 46 national libraries, which are responsible for the massive collection of publications that represent the accumulated knowledge of Europe.

What does that mean in practice?
It means that the datasets describing all the millions of books and texts ever published in Europe – the title, author, date, imprint, place of publication and so on, which exist in the vast library catalogues of Europe – will become increasingly accessible for anybody to re-use for whatever purpose they want.

The full announcement is posted here.

1. In general

a. Jenn Riley, Seeing Standards: A Visualization of the Metadata Universe:

The sheer number of metadata standards in the cultural heritage sector is overwhelming, and their inter-relationships further complicate the situation. This visual map of the metadata landscape is intended to assist planners with the selection and implementation of metadata standards.

b. John Battelle, File Under: Metaservices, The Rise Of

The rise of the app economy exacerbates the problem: most apps live in their own closed world, sharing data sparingly, if at all. And while many have suggested that Facebook’s open social graph can help untangle the problem, in fact it only makes it worse, as Fred put it in a recent post (which sparked this Thinking Out Loud session for me):

“The people I want to follow on Etsy are not the same people I want to follow on Twitter. The people I want to follow on Svpply are not my Facebook friends. I don’t want to share my Foursquare checkins with everyone on Twitter and Facebook.”

Like nearly all of us, Fred’s got a social graph instrumentation problem and a service data-sharing problem. Here’s what he suggests:

“I would like to be able to run these people through all my social graphs on other services (not just Facebook and Twitter) and also my phone contacts and my emails to help me filter them and quickly add those people if I think they would make the social experience on the specific service useful to me.”

When you break it down, what Fred is asking is this:

  • That each service he uses will make the data that he creates available to any other service with which he wishes to share.
  • That each service he uses be capable of leveraging that data.

For that to happen, every app, every site, and every service needs to be more than just an application or a content directory. It needs to be a platform, capable of negotiating ongoing relationships with other platforms on behalf of its customers in real time. This, of course, is what Facebook does already. Soon, I believe, every single service of scale will work in a similar fashion.

When you think about a world in which this idea comes true, all sorts of new services become possible: Metaservices, services which couldn’t exist unless they had the oxygen of other services’ datastreams to consume. At present, I can’t really think of any such services that are currently at scale. (I can think of some promising stuff in early stages: Memolane and Percolate come to mind.)

c. Leigh Dodds, Context Remains King: Why Linking is the Next Big Thing (Talis, Kasabi blog)

There was an interesting post published on Programmable Web yesterday about how Foursquare are developing their platform. The most important aspect of that is this goal:

to make Foursquare the missing “Rosetta Stone for location, allowing you to link information about a real-world place from one database to any other.” Now it’s not just about using Foursquare, but connecting it to other services.

This is interesting as it’s another example of a trend that I’ve been expecting for some time: that the current crop of API/data providers will recognise the power of being able to connect different databases through shared identifiers. Yahoo Geoplanet took a similar step in this direction in April last year, as Programmable Web also reported. And in October the Guardian announced that they were connecting their platform to MusicBrainz.

The Programmable Web article notes that:

…there’s no straightforward way to extract much more information from these particular partner sites beyond the link to their listing page.

It doesn’t seem likely that things will stay this way though. There are plenty of sites that deal with location, have their own APIs, and stand to benefit from having their index of places harmonized with Foursquare’s. And then, of course, there may be enterprising mash-up builders who work out ways to extract information directly from linked pages, even if they are intended for browser display and not application parsing.

Sharing identifiers and linking between datasets is useful not just because it helps any individual dataset owner become the “Rosetta Stone” for their specific domain. It’s useful because we live in a Long Tail world.

As a data provider, no matter how much energy you put into curation to make your data more comprehensive there will always be some additional external data, some additional context, that can add value. That value may be incremental to the majority of users, but it will be important to someone.

Additional context unlocks value by providing additional ways to access, interpret or navigate a dataset. Additional context allows us to ask more questions of the data.

d. Jeff Jonas, Context: A Must-Have and Thoughts on Getting Some …

I spent more than ten hours on this post; more than any other single post. And unfortunately, despite this effort, I feel this post deserves substantially more work.

Operating on a datum without first placing it into context is a risky proposition. Whether interested in mitigating risk or maximizing opportunity, no surprise, Context is King. And thus, from my point of view, Determining context is the most significant technical hurdle necessary to deliver the next generation of business intelligence.

So, if you must have context the next question is: “How do you get some?”

The construction of context primarily depends upon: A) the features available in an observation, B) the ability to extract the essential features from the observation, and C) the ability to use the extracted features to determine how the new observation relates to one’s historical observations.
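Jonas’s three ingredients can be sketched in miniature. The code below is my own toy illustration, assuming observations are free-text strings and “features” are just word tokens; neither assumption comes from his post, which leaves feature extraction and matching unspecified:

```python
from collections import defaultdict

def extract_features(observation: str) -> set[str]:
    # (B) extract essential features; here, naively, lowercased word tokens
    return {tok.strip(".,") for tok in observation.lower().split()}

class ContextAccumulator:
    """(C) relate each new observation to history via shared features."""
    def __init__(self):
        self.history = []              # all observations seen so far
        self.index = defaultdict(set)  # feature -> ids of observations containing it

    def add(self, observation: str) -> set[int]:
        feats = extract_features(observation)
        related = set().union(*(self.index[f] for f in feats)) if feats else set()
        oid = len(self.history)
        self.history.append(observation)
        for f in feats:
            self.index[f].add(oid)
        return related                 # ids of related prior observations

acc = ContextAccumulator()
acc.add("Payment from ACME Corp to J. Smith")
related = acc.add("J. Smith opened account 42")  # shares the "j"/"smith" features
```

Real context engines would of course use far richer features (names, addresses, identifiers) and fuzzier matching; the point is only the accumulate-and-relate loop.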

When next generation feature extraction engines and next generation context accumulating engines converge, these systems are going to be the underpinnings of very, very smart systems. Add real-time and relevance detection … and you have more than situational awareness … you begin to approach the cognitive domain.

e. Other resources:

2. Intellectual property

a. JISC Information Environment Team

Help with the legal issues in reusing bibliographic records

Curtis and Cartwright and Naomi Korn Copyright Consultancy undertook a study for JISC to explore legal implications for UK university libraries of providing their catalogue records for re-use in Web applications and, in the light of these, provide practical legal guidance to libraries interested in doing this.

What does open bibliographic metadata mean for academic libraries?

Recently there seems to be a surge in activity around open bibliographic metadata.

Libraries throughout Europe have been experimenting. The British Library and the CERN library are two notable examples of libraries that have decided to release their bibliographic metadata under an open licence.

You can get an idea for the amount of experimentation and interest in this area by following the lively discussions on the Open Knowledge Foundation’s (OKF) open bibliography mailing list.

JISC has funded the LOCAH and Openbib projects, which are exploring a Linked Data approach to open bibliographic data from Cambridge University and the British Library (openbib) and Copac and the Archives Hub (LOCAH).

There are interesting websites such as Open Library that engage in the collection and reuse of open bibliographic metadata.

The experimentation isn’t limited to libraries; there is exciting work happening in related sectors such as cultural heritage. Two very good examples are the Culture Grid which is a collection of UK cultural heritage content produced by the Collections Trust and Digital New Zealand which focuses on enabling the reuse of content from New Zealand’s cultural heritage institutions.

Open Bibliographic Data Guide

Why are libraries around the world devoting time and resources to releasing their bibliographic data under an open licence? What’s in it for them and what are the costs and practical issues involved? JISC’s purpose for this guide is to try and provide some answers to these questions and to help academic librarians think about the potential implications for their own library.

b. Jonathan Rochkind, Implications of CC-BY on data

[excerpts follow]

… even if the stuff at hand is copyrightable, the person offering the license needs to own or otherwise be authorized to offer a license for it, for a CC license to actually be binding or enforceable or mean anything at all. In many cases I see people offering a CC-BY license for data, that data wasn’t really “created” by them at all, if it is copyrightable it’s not at all clear to me that the person trying to bind you by a CC re-use license is the one who would own that copyright, or has been licensed by whoever would own that copyright to re-license it to you under a CC license.

But let’s put that aside for now, just for the sake of discussion, and assume that, okay, here’s this collection of data, or database, or pieces of data, and I have a CC-BY license to use it, and it really is valid, I have permission to use the stuff, but only under the terms of the CC-BY license, let’s assume that.

So I’m mixing this stuff all together, not only are maybe some records from one source and some from another, but even within a ‘record’, some elements are from one store and some from another. I’m using a variety of possible methods, both algorithmic, crowd-sourced, and expert-edited, to make my database as good as possible. You know, maybe I have geographic data from a bunch of sources, and I combine it all together, and I have algorithms (maybe based on usage, and constantly evolving) to take the ‘best’ piece of info when different sources conflict, and I let my users improve it themselves when they find errors, etc.

The really exciting thing about open data is that it can continue to be re-mixed and re-used by more generations, mashing it together with other data to create more stuff. Data, unlike narrative human language, is inherently composed of a bunch of individual pieces just begging to be mixed and matched … and then someone else wants to take that new database you created and do it AGAIN, mixing with other sources, and so on down the road.

Can they take my aggregated database (including several sources of data, some CC-BY from a variety of licensors), and take out individual pieces, and put it in their own new database mixed together with a bunch MORE sources?


But it’s not up to ME, right?

If you really want the data to be re-useable, just CC0/public-domain/no-rights-claimed with it.
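Rochkind’s aggregation worry becomes easier to see with a concrete sketch: merge records field by field while every kept value remembers which source and licence it came from, since that provenance is what a CC-BY attribution obligation would attach to. The sources and fields below are invented for illustration:

```python
def merge_records(records):
    """Merge field dicts from several sources, keeping per-field provenance.

    `records` is a list of (source_name, licence, fields) tuples; later
    sources win on conflicts, but every kept value remembers its origin.
    """
    merged = {}
    for source, licence, fields in records:
        for field, value in fields.items():
            merged[field] = {"value": value, "source": source, "licence": licence}
    return merged

rec = merge_records([
    ("library_a", "CC-BY", {"title": "Moby Dick", "author": "Melville, H."}),
    ("library_b", "CC0",   {"title": "Moby-Dick; or, The Whale"}),
])
# The title's provenance now points at library_b, the author's at library_a;
# any CC-BY attribution duty would follow the individual fields around.
```

Even this toy version shows why downstream re-mixing gets awkward: each extracted field drags its own licence along.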

c. Regarding explicit licenses for metadata

Owen Stephens has encountered many points of view regarding licences for metadata.  Here he offers the following:

In the course of this work I’ve co-authored the JISC Guide to Open Bibliographic Data and become a signatory of the Discovery Open Metadata Principles. Recently I’ve been discussing some of the issues around licensing with Ed Chamberlain and others (see Ed’s thoughts on licensing on the CUL-COMET blog), and over coffee this morning I was trying to untangle the issues and reasons for specific approaches to licensing – for some reason they formed in my head as a set of Q&A so I’ve jotted them down in this form… at the moment this is really to help me with my thinking but I thought I’d share just in case.

d. A proposed 4-star scheme for linked open cultural metadata

This from the LOD-LAM Summit (International Linked Open Data in Libraries Archives and Museums Summit June 2-3, 2011 San Francisco):

One of the outcomes of last week’s LOD-LAM Summit was a draft document proposing a new way to assess the openness/usefulness of linked data for the LAM community. This is a work in progress, but is already provoking interesting debate on our options as we try to create a shared strategy. Here’s what the document looks like today, and we welcome your comments, questions and feedback as we work towards version 1.0.

e. Other resources:

3. Books

a. Google Books project … metadata

(1) Leonid Taycher, Books of the World, Stand Up and Be Counted!

We collect metadata from many providers (more than 150 and counting) that include libraries, WorldCat, national union catalogs and commercial providers. At the moment we have close to a billion unique raw records.
[how many of these are original cataloguing by national and other libraries?]

We then further analyze these records to reduce the level of duplication within each provider, bringing us down to close to 600 million records.
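Google doesn’t describe its matching algorithm here, but any duplicate reduction of this kind needs some record-matching key. The following is a deliberately naive sketch; the normalisation rule is entirely my assumption, not Google’s:

```python
import re

def match_key(record: dict) -> tuple:
    """Collapse a raw record to a crude duplicate-detection key."""
    def norm(s: str) -> str:
        return re.sub(r"[^a-z0-9]", "", s.lower())
    return (norm(record.get("title", "")),
            norm(record.get("author", "")),
            record.get("year"))

def dedupe(records):
    seen = {}
    for rec in records:
        seen.setdefault(match_key(rec), rec)   # keep the first record per key
    return list(seen.values())

raw = [
    {"title": "Moby Dick", "author": "Melville, Herman", "year": 1851},
    {"title": "MOBY-DICK",  "author": "melville herman",  "year": 1851},
]
unique = dedupe(raw)   # the two raw records collapse to one
```

Real-world matching is far messier (editions, translations, bad years), which is exactly why a billion raw records only shrink to 600 million rather than to a tidy list of works.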

(2) Eric Hellman, Google Exposes Book Metadata Privates at ALA Forum

Kurt Groetsch reported on Google’s metadata processing. They have over 100 bibliographic data sources, including libraries, publishers, retailers and aggregators of review and jacket covers. The library data includes MARC records, anonymized circulation data and authority files. The publisher and retailer data is mostly ONIX formatted XML data. They have amassed over 800 million bibliographic records containing over a trillion fields of data.

(3) Jon Orwant, Creating a Trillion-Field Catalog: Metadata in Google Books

This video captures his presentation to the Charleston Conference, November 2010.

b. Library catalogs and sub-collections of records published as linked data

(1) British Library shares around 14 million bibliographic records [access paths]

The British Library is to make its extensive collections of bibliographic records available for free to researchers and other libraries.

The UK national library has around 14 million catalogue records comprising a wealth of bibliographic data. The initiative announced today will help expose this vast dataset to users worldwide, allowing researchers and other libraries to access and retrieve bibliographic records for publications dating back centuries and relating to every conceivable subject area.

The new free service will operate in parallel to the British Library’s priced bulk MARC data supply activity which is used extensively by large commercial customers.

“By making the British Library’s bibliographic data available in new ways for wider, non-commercial use we want to encourage users beyond the traditional library world to explore and use this important international resource,” said Neil Wilson, the British Library’s Head of Metadata Services.

(2) British Library JISC OpenBibliography: British Library data release

The JISC OpenBibliography project is excited to announce that the British Library is providing a set of bibliographic data under CC0 Public Domain Dedication Licence.

We have initially received a dataset consisting of approximately 3 million records, which is now available as a CKAN package. This dataset consists of the entire British National Bibliography, describing new books published in the UK since 1950; this represents about 20% of the total BL catalogue, and we are working to add further releases. In addition, we are developing sample access methods onto the data, which we will post about later this week.

Usage guide from BL: This usage guide is based on goodwill. It is not a legal contract. We ask that you respect it.

Use of Data: This data is being made available under a Creative Commons CC0 1.0 Universal Public Domain Dedication licence. This means that the British Library Board makes no copyright, related or neighbouring rights claims to the data and does not apply any restrictions on subsequent use and reuse of the data. The British Library accepts no liability for damages from any use of the supplied data. For more detail please see the terms of the licence.

Support: The British Library is committed to providing high quality services and accurate data. If you have any queries or identify any problems with the data please contact

Share knowledge: We are also very interested to hear the ways in which you have used this data so we can understand more fully the benefits of sharing it and improve our services. Please contact if you wish to share your experiences with us and those that are using this service.

Give Credit Where Credit is Due: The British Library has a responsibility to maintain its bibliographic data on the nation’s behalf. Please credit all use of this data to the British Library and link back to in order that this information can be shared and developed with today’s Internet users as well as future generations.

from the OKF blog

The data has been loaded into a Virtuoso store that is queryable through the SPARQL endpoint, and the URIs we have assigned each record use the ORDF software to make them dereferenceable, supporting content auto-negotiation as well as embedding RDFa in the HTML representation.

The data contains some 3 million individual records and some 173 million triples. Indexing the data was a very CPU intensive process taking approximately three days. Transforming and loading the source data took about five hours.
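For readers unfamiliar with SPARQL endpoints, a query against a store like this might look like the sketch below, which just builds the GET request URL. The endpoint address is a placeholder and the Dublin Core `dc:issued` predicate is my guess at a plausible vocabulary, not the project’s actual schema:

```python
from urllib.parse import urlencode

# Placeholder endpoint; the real project exposes its own SPARQL URL.
ENDPOINT = "http://example.org/sparql"

# Count records per publication year (predicate assumed, not confirmed).
QUERY = """
PREFIX dc: <http://purl.org/dc/terms/>
SELECT ?year (COUNT(?book) AS ?n)
WHERE { ?book dc:issued ?year . }
GROUP BY ?year
ORDER BY DESC(?n)
LIMIT 10
"""

def sparql_request_url(endpoint: str, query: str) -> str:
    """Build a GET request URL asking for SPARQL JSON results."""
    params = urlencode({"query": query,
                        "format": "application/sparql-results+json"})
    return f"{endpoint}?{params}"

url = sparql_request_url(ENDPOINT, QUERY)
```

Fetching `url` with any HTTP client would then return JSON result bindings, which is all “queryable through the SPARQL endpoint” means in practice.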

(3) CERN Library book data and their video announcement

On this page you can find the bibliographic data for books in the CERN Library catalogue (CDS), available for download.

The current data export is from 6 December 2009. We are working on providing regular updates. The data is provided in zipped MARCXML. We are working on providing other formats, especially RDF.
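Consuming a zipped MARCXML dump like this could look like the sketch below, which pulls titles out of MARC field 245 subfield a using only the Python standard library. The sample record and the file name in the comment are illustrative, not CERN’s actual data:

```python
import zipfile  # for the real dump: zipfile.ZipFile("books.zip")
import xml.etree.ElementTree as ET

MARCXML_NS = "{http://www.loc.gov/MARC21/slim}"

def titles_from_marcxml(xml_bytes: bytes):
    """Yield the 245$a title of each record in a MARCXML collection."""
    root = ET.fromstring(xml_bytes)
    for record in root.iter(MARCXML_NS + "record"):
        for field in record.iter(MARCXML_NS + "datafield"):
            if field.get("tag") == "245":
                for sub in field.iter(MARCXML_NS + "subfield"):
                    if sub.get("code") == "a":
                        yield sub.text

# A one-record inline sample in lieu of the real zipped dump:
SAMPLE = b"""<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <datafield tag="245" ind1="0" ind2="0">
      <subfield code="a">Introduction to particle physics</subfield>
    </datafield>
  </record>
</collection>"""

titles = list(titles_from_marcxml(SAMPLE))
```

With the real download you would open each member of the zip and feed its bytes to the same function.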

(4) Deutsche Nationalbibliothek Linked Data Services

In the long term the DNB is planning to offer a linked data service which will permit the semantic web community to use the entire stock of its national bibliographic data, including all authority data. A suitable data service needs to be created to distribute the new data format alongside the already established access channels (OAI, SRU etc.).

One of the aims of the service will be to attract new target groups and accordingly it is important for the project to analyse their requirements in detail, making contact with them in order to identify their precise needs. The proposal, therefore, is to launch a beta service which is based primarily on past experience and users’ requirements. The beta service described in this documentation is aimed at establishing an initial partnership with this new clientele as a means of sounding out each other’s views. In the medium term, the target groups will be expanded to include commercial service providers such as operators of search engines and knowledge management systems alongside research institutions and non-profit organisations. The DNB is endeavouring to make a significant contribution to the global information infrastructure with its new data service by laying the foundations for modern commercial and non-commercial web services.

from Conditions of use for the German National Library’s data services

These state that:

  • Data which the German National Library provides free of charge or for a provision fee may be duplicated, distributed and made publicly accessible.
  • The data may be added to and processed at will. In return for providing the data free of charge, the German National Library reserves the right to obtain any enhancements which have been created, and to integrate these into its own data collection.
  • A condition for the free downloading and possible further processing and forwarding of the data is that any labels of origin, or labelling as data of the German National Library are retained in a form specified by the library.
  • The commercial re-use of the data is at variance with the basic principle of transmission under the same conditions. This possibility is not completely excluded, however; each case must be agreed separately with the German National Library.

(5) Hungary OPAC and Digital Library as Linked Data [an LOD email thread]

The national library of Hungary, officially named the National Széchényi Library (NSZL), proudly announces that its entire OPAC and Digital Library and the corresponding authority data have been published as Linked Data. The vocabularies used are:

  • RDFDC for bibliographic data,
  • FOAF for name authority entries, and
  • SKOS for subject terms and geographical names.

NSZL uses Cool URIs. Every resource has both RDF and HTML representations.

Our RDFDC, FOAF and SKOS statements are linked together. Our name authority is matched with the DBPedia name files and URI aliases are handled as owl:sameAs statements.
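To make the vocabulary mix concrete, here is a hypothetical pair of statements of the kind described, generated as N-Triples. The NSZL resource URIs are invented for illustration; only the vocabulary namespaces (Dublin Core terms, OWL) and the DBpedia linking pattern are from the announcement:

```python
def triple(s, p, o):
    """Serialise one N-Triples statement (object assumed to be a URI)."""
    return f"<{s}> <{p}> <{o}> ."

# Invented NSZL URIs; real vocabulary namespaces.
book   = "http://example.nszl.hu/bib/123"
person = "http://example.nszl.hu/auth/petofi-sandor"

statements = [
    # bibliographic record linked to its name-authority entry
    triple(book, "http://purl.org/dc/terms/creator", person),
    # name authority matched against DBpedia via owl:sameAs
    triple(person, "http://www.w3.org/2002/07/owl#sameAs",
           "http://dbpedia.org/resource/S%C3%A1ndor_Pet%C5%91fi"),
]
ntriples = "\n".join(statements)
```

The `owl:sameAs` link is what lets a consumer walk from the library’s authority record into the wider Linked Data web.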

(6) lobid (Linked Open Bibliographic Data)

North Rhine-Westphalian Library Service Center’s (hbz) Linked Open Data service.

Our target is the conversion of existing bibliographic data and associated data to Linked Open Data. So far, two services have emerged in this context:

  1. lobid-resources … dcat-based catalog of library bibliographic data sets
  2. lobid-organisations … linked-data based index of library institution IDs & info

The hbz wiki on exports from the hbz union catalog is located here.

The list includes these linked-data projects

  1. Linked-Open-hbz-Data … RDF triples of all published records
  2. Dortmund University library catalog
  3. German National Library of Medicine
  4. Cologne University library catalog, and Applied Sciences library catalog

(7) Sweden Making a Library Catalogue Part of the Semantic Web

In this paper we describe the tools and techniques used to make the Swedish Union Catalogue (LIBRIS) part of the Semantic Web and Linked Data. The focus is on links to and between resources and the mechanisms used to make data available, rather than perfect description of the individual resources. We also present a method of creating links between records of the same work.

c. Other pools, sources, and projects related to book metadata

(1) CKAN Bibliographic data group (some resources not public domain)

Among the resources not specifically associated with other linked data work in libraries:

  • (LibLime’s store of bib records)
  • Short biographical dictionary of the English language
  • Citeseer metadata
  • DBLP computer science bibliography containing 1.5m records
  • project database
  • JISC Open Bib BNB
  • JISC Open Bib Cambridge
  • LoC as of 2007 via Scriblio project
  • Talis MARC records … 5.5 million
  • University of Michigan MARC records (original cataloging)

(2) CKAN Library Linked Data group (some resources not public domain)

Among the resources not specifically associated with other linked data work in libraries:

  • facilitates the dissemination of scientific publications over the Internet
  • The Thesaurus for the Social Sciences (Thesaurus Sozialwissenschaften)
  • SWT thesaurus for economics
  • Press archives covering the entire 20th century, now held at the German National Library of Economics (ZBW)

(3) National Union Catalog Pre-56 Imprints (Mansell) Titles in OCLC Worldcat

  • 754 volumes

(4) ONIX standards for Books, Serials, and licensing terms [EDitEUR]

from Peter Murray’s summary of ALA’s January 2011 ALCTS (Association for Library Collections and Technical Services) forum: Mix and match: mashups of bibliographic data

This year the ALCTS Forum at ALA Midwinter brought together three perspectives on massaging bibliographic data of various sorts in ways that use MARC, but where MARC is not the end goal. What do you get when you swirl MARC, ONIX, and various other formats of metadata in a big pot? Three projects: ONIX Enrichment at OCLC, the Open Library Project, and Google Book Search metadata.

Below is a summary of how these three projects are messin’ with metadata, as told by the Forum panelists. I also recommend reading Eric Hellman’s Google Exposes Book Metadata Privates at ALA Forum for his recollection and views of the same meeting.

[Renee Register] looked at a new and evolving product at OCLC on the enhancement of ONIX records with WorldCat records, and vice versa.

As libraries, Renee said “our instincts are collaborative” but “our data and workflow silos encourage redundancy and inhibit interoperability.” Beyond the obvious differences in metadata formats, the workflows of libraries differ dramatically from other metadata providers and consumers. In libraries (with the exception of CIP and brief on-order records) the major work of bibliographic production is performed at the end of the publication cycle and ends with the receipt of the published item. In the publisher supply chain, bibliographic data evolves over time, usually beginning months before publication and continuing to grow for months and years (sales information, etc.) after publication. Renee had a graphic showing the current flow of metadata around the broader bibliographic universe that highlighted the isolation of library activity relative to publisher, wholesaler, and retailer activity.

Renee [went] on to describe a “next generation cataloging data flow” where OCLC facilitates the inclusion of publisher data into WorldCat and enhances publisher data with information extracted from WorldCat. To the right is a version of the graphic she used at Midwinter taken from an earlier presentation on the same topic. It shows ONIX-formatted metadata coming into WorldCat, being cross-walked and matched with existing MARC data in WorldCat, and finally extracted and cross-walked back to ONIX resulting in enhanced ONIX metadata for publishers to use in their supply chain.
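Stripped to its essentials, a crosswalk of this kind is a set of field mappings plus matching logic. The tiny mapping table below is an invented illustration of the idea, not OCLC’s actual crosswalk; the ONIX element names are from ONIX 2.1 but the MARC assignments are simplified:

```python
# Hypothetical subset of an ONIX -> MARC crosswalk; real mapping tables
# are far larger and context-sensitive.
ONIX_TO_MARC = {
    "TitleText":     ("245", "a"),   # title proper
    "PersonName":    ("100", "a"),   # primary author
    "PublisherName": ("260", "b"),   # publisher
}

def crosswalk(onix_fields: dict) -> dict:
    """Map flat ONIX element values onto MARC (tag, subfield) slots."""
    marc = {}
    for element, value in onix_fields.items():
        if element in ONIX_TO_MARC:
            marc[ONIX_TO_MARC[element]] = value
    return marc

marc = crosswalk({"TitleText": "The Whale", "PublisherName": "Harper"})
```

The hard part of the real workflow is not this table but matching an incoming ONIX record to the right existing WorldCat MARC record before any fields are exchanged.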

In concluding her remarks, she offered several resources to explore for further information: the OCLC/NISO study on Streamlining Book Metadata Workflow, the U.K. Research Information Network report on Creating Catalogues: Bibliographic Records in a Networked World, the Library of Congress Study of the North American MARC Records Marketplace, the Library of Congress CIP/ONIX Pilot Project, and the OCLC Publisher Supply Chain Website.

(5) RDTF (Resource Discovery Taskforce) RLUK (Research Libraries UK) [RDTF schema]

What we’ve been up to … as of March, 2011
[excerpts follow]

… the work between now and December 2012 has been split into 3 iterative phases. We are currently in the 1st phase that runs from January to July.

Phase 1

We have funded 8 projects to investigate new approaches to making metadata about the collections of libraries, museums and archives available in a way that enables the metadata to be reused to enrich the original collection and make it more visible. [described here]

We have commissioned a few reports to address specific challenges involved in the vision. There have been 3 of these so far:

The [management framework] project has a website which describes the deliverables in more detail. It is a very collaborative project, as can be seen from the vision and approach. Specifically, the framework project is supported by 2 advisory groups: one focused on technical issues and one on management issues.

The management framework project works extremely closely with a dedicated communications and relationship management project. The purpose of this project is to work with the vast range of stakeholders involved in this work (see the table in the implementation plan) and to ensure that we keep on top of their needs, issues and use cases.

(6) Nick Ruffilo, Five Degrees of Metadata: Small Changes Can Mean Big Sales

As CIO of, a Netflix-like book rental service, I am a massive consumer of book metadata. What I’ve noticed so far is that the data I receive about bestselling books is usually great—descriptions are clean, short, and describe the content well, and the books are tagged with three or four well-targeted categories. But for every bestseller that has quality data, there are hundreds of other books—perhaps thousands—that have too little or too much data.

(7) Tim Spalding, LibraryThing

  • 54 M books
  • 5.5 M works

4. Journals

a. arXiv Archive for Electronic Scientific Preprints

The arXiv is an archive for electronic preprints of scientific papers in the fields of mathematics, physics, astronomy, computer science, quantitative biology and statistics, which can be accessed via the World Wide Web. In many fields of mathematics and physics, almost all scientific papers are placed on the arXiv. On 3 October 2008, arXiv passed the half-million article milestone, with roughly five thousand new e-prints added every month.

b. EZB Elektronische Zeitschriftenbibliothek

The Elektronische Zeitschriftenbibliothek EZB (Electronic Journals Library) offers effective access to scientific and academic journals publishing full-text articles on the internet.

This service has been developed at the Universitätsbibliothek Regensburg (University Library of Regensburg) in cooperation with the Universitätsbibliothek der Technischen Universität München (University Library of the Technical University of Munich).

At the moment, it contains 53,892 titles, among them 7,486 online-only journals, covering all subjects.

c. Google Scholar

Google Scholar provides a simple way to broadly search for scholarly literature. From one place, you can search across many disciplines and sources: articles, theses, books, abstracts and court opinions, from academic publishers, professional societies, online repositories, universities and other web sites. Google Scholar helps you find relevant work across the world of scholarly research.

Searching within citing articles lets you examine an article’s influence via the citations to it.

Resource Shelf’s take on this capability.

d. Highwire Press

As the leading ePublishing platform, HighWire Press partners with independent scholarly publishers, societies, associations, and university presses to facilitate the digital dissemination of 1,464 journals, reference works, books, and proceedings. HighWire also offers a complete manuscript submission, tracking, peer review, and publishing system for journal editors, Bench>Press. HighWire provides outstanding technology and support services, and fosters a dynamic and innovative community, enhancing the strengths of each of its members.


e. JSTOR

JSTOR offers high-quality, interdisciplinary content to support scholarship and teaching. It includes over one thousand leading academic journals across the humanities, social sciences, and sciences, as well as select monographs and other materials valuable for academic work. Journals are always included from volume 1, issue 1, and include previous and related titles. Beginning in 2011, current issues for more than 150 journals will be available on JSTOR as part of the Current Scholarship Program.

[The JSTOR Archive includes 1,482 journals as of 15 August 2011.]

Foresite toolkit: construct, parse, manipulate and serialize OAI-ORE Resource Maps

Foresite is a JISC funded project which aims to produce a demonstrator and test of the OAI-ORE standard by creating resource maps of journals and their contents held in JSTOR.

Libraries support parsing and serialising in: ATOM, RDF/XML, N3, N-Triples, Turtle and RDFa
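As a rough sketch of what such a resource map asserts, the snippet below emits a minimal OAI-ORE map in N-Triples, relating the map to its aggregation and the aggregation to two journal articles. The URIs are invented, and this does not use the Foresite libraries themselves; only the ORE vocabulary terms are real:

```python
ORE = "http://www.openarchives.org/ore/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def resource_map_ntriples(rem_uri, agg_uri, resources):
    """Minimal ORE map: the ReM describes an Aggregation of resources."""
    lines = [
        f"<{rem_uri}> <{ORE}describes> <{agg_uri}> .",
        f"<{agg_uri}> <{RDF_TYPE}> <{ORE}Aggregation> .",
    ]
    lines += [f"<{agg_uri}> <{ORE}aggregates> <{r}> ." for r in resources]
    return "\n".join(lines)

rem = resource_map_ntriples(
    "http://example.org/journal/vol1/rem",
    "http://example.org/journal/vol1",
    ["http://example.org/journal/vol1/article1",
     "http://example.org/journal/vol1/article2"],
)
```

A journal issue then becomes an Aggregation whose aggregated resources are its articles, which is exactly the structure the project builds over JSTOR.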

f. ticTOCs JISC Journal Table of Contents Service (end date March 2009)

The aim of the ticTOCs project is to develop a service which will transform journal current awareness by making it easy for academics and researchers to find, display, store, combine, and reuse tables of contents from multiple publishers in a personalisable web based environment.

[from the final report]

This website fronts a database built on open source technology (Apache 2, MySQL 4) that consists of over 12,000 journal TOC RSS feeds provided by over 430 publishers and offers access to the full text of over 300,000 articles.

g. Jonathan Rochkind More on Aggregating Article Metadata

We probably don’t need to create a cooperative metadata creation initiative for article-level metadata, because that metadata (of varying quality, but my hypothesis is “good enough”) is ALREADY out there in the digital world. It’s already been created; pretty much every publisher these days has electronic metadata for the articles they publish. We just need to _collect_ it. And in many cases, we don’t even need a special business relationship or license to collect it, as the metadata is already being shared open access, which doesn’t mean that collecting and aggregating it in a useful way is cheap or easy. It is a non-trivial project that could benefit from some cooperative economies-of-scale action, but it’s not a ‘cataloging’ or metadata _generation_ project exactly.

Consider the JournalTOCs service. Many, many publishers these days provide RSS feeds with metadata of their recent publications. By consuming these feeds, and storing what you get over time, JournalTOCs is building a giant database of article metadata that only goes back as far as when they started collecting it, but that’s still pretty good. My impression is that JournalTOCs is looking for a way to monetize this at a profit, however, rather than provide it on a cooperative cost-sharing basis.
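The harvest-and-accumulate approach described above can be sketched in a few lines: parse a publisher's RSS table-of-contents feed and merge new items into a store keyed by link, so that repeated harvests accumulate metadata over time. This is a hedged illustration of the general technique, not JournalTOCs' actual implementation; the feed content is invented.

```python
# Sketch of accumulating article metadata from publisher RSS TOC feeds.
import xml.etree.ElementTree as ET

def harvest_toc(feed_xml, store):
    """Add each <item> in an RSS 2.0 feed to `store`, deduplicating by link."""
    root = ET.fromstring(feed_xml)
    for item in root.iter("item"):
        link = item.findtext("link")
        if link and link not in store:
            store[link] = {
                "title": item.findtext("title"),
                "date": item.findtext("pubDate"),
            }
    return store

SAMPLE = """<rss version="2.0"><channel><title>Journal of Examples</title>
<item><title>On Metadata</title><link>http://example.org/a1</link>
<pubDate>Mon, 01 Aug 2011 00:00:00 GMT</pubDate></item>
</channel></rss>"""

store = {}
harvest_toc(SAMPLE, store)
harvest_toc(SAMPLE, store)   # a second harvest of the same feed adds nothing new
print(len(store))
```

Because the store is keyed by link, polling the same feeds daily builds a growing archive rather than a snapshot, which is exactly why the resulting database "only goes back as far as when they started collecting it."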

h. Wiki-research-l  RE: Wikicite (aka UPEI’s proposal for a “universal citation index”)

I have been working with Sam and others for some time now on brainstorming a proposal for the Foundation to create a centralized wiki of citations, a WikiCite so to speak, if that is not the eventual name. My plan is to continue to discuss with folks who are knowledgeable and interested in such a project and to have the feedback I receive go into the proposal which I hope to write this summer. The proposal white paper will then be sent around to interested parties for corrections and feedback, including on-wiki and mailing lists, before eventually landing at the Foundation officially. As we know WMF [Wikimedia Foundation] has not started a new project in some years, so there is no official process. Thus I find it important to get it right.

The basic idea is a centralized wiki that contains citation information that other MediaWikis and WMF projects can then reference using something like a {{cite}} template or a simple link. The community can document the citation, the author, the book etc. and, in one idealization, all citations across all wikis would point to the same article on WikiCite. Users can use this wiki as their personal bibliography as well, as collections of citations can be exported in arbitrary citation formats. This general plan would allow community aggregation of metadata and community documentation of sources along arbitrary dimensions (quality, trust, reliability, etc.). The hope is that such a resource would then expand on that wiki and across the projects into summarizations of collections of sources (lit reviews) that make navigating entire fields of literature easier and more reliable, getting you out of the trap of not being aware of the global context that a particular source sits in.

[via this pointer from Code4Lib list]

i. Zetoc

Zetoc provides access to the British Library’s Electronic Table of Contents of around 20,000 current journals and around 16,000 conference proceedings published per year. The database covers 1993 to date, and is updated on a daily basis. It includes an email alerting service, to enable you to keep up-to-date with relevant new articles and papers.

Zetoc is free to use for members of JISC-sponsored UK higher and further education institutions and research councils. It is also available to all of NHS England, Scotland and Northern Ireland. A number of other institutions are eligible to subscribe to Zetoc.

5. Grey literature

a. OAIster Union Catalog of 25+ Million Records for Open Access Resources

OAIster was a project of the Digital Library Production Service of the University of Michigan University Library. Its goal is to create a collection of freely available, previously difficult-to-access, academically oriented digital resources that are easily searchable by anyone. OAIster harvests from Open Archives Initiative (OAI)-compliant Digital Libraries, Institutional Repositories, and Online Journals using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) protocol.
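A minimal sketch of what harvesting via OAI-PMH looks like: issue a ListRecords request and pull Dublin Core titles, plus the resumptionToken used to page through large result sets, out of the response. The endpoint URL is hypothetical, and the XML is a trimmed example of the standard response shape, not a real OAIster exchange.

```python
# Sketch of an OAI-PMH ListRecords harvest (stdlib only, no network).
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

def list_records_url(base_url, metadata_prefix="oai_dc", token=None):
    """Build a ListRecords request; a token continues a partial result set."""
    params = {"verb": "ListRecords"}
    if token:
        params["resumptionToken"] = token
    else:
        params["metadataPrefix"] = metadata_prefix
    return base_url + "?" + urlencode(params)

def parse_titles(response_xml):
    """Extract dc:title values and any resumptionToken from a response."""
    root = ET.fromstring(response_xml)
    titles = [t.text for t in root.iter(DC + "title")]
    token = root.findtext(".//" + OAI + "resumptionToken")
    return titles, token

SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
 <ListRecords>
  <record><metadata>
   <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
              xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>An Open Access Paper</dc:title>
   </oai_dc:dc>
  </metadata></record>
  <resumptionToken>page2</resumptionToken>
 </ListRecords>
</OAI-PMH>"""

print(list_records_url("http://example.org/oai"))
print(parse_titles(SAMPLE))
```

A harvester like OAIster repeats the request with each returned resumptionToken until the token is empty, then re-harvests incrementally using the protocol's `from`/`until` date arguments.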

In early 2009, OCLC formed a partnership with the University of Michigan in order to provide continued access to open-archive collections through OAIster.

6. Special collections via EADs

a. Cataloging Hidden Special Collections and Archives

About the program:

Libraries, archives, and cultural institutions hold millions of items that have never been adequately described. This represents a staggering volume of items of potentially substantive intellectual value that are unknown and inaccessible to scholars. This program seeks to address this problem by awarding grants for supporting innovative, efficient description of large volumes of material of high value to scholars.

The Council on Library and Information Resources administers this national effort with the support of generous funding from The Andrew W. Mellon Foundation. The first fifteen projects were selected for funding in 2008, and fourteen projects followed in 2009. In 2010, an additional seventeen projects were awarded funding.

b. LOCAH (Linked Open Copac Archives Hub) as related to EADs

Pete Johnston’s take on his part in working on the EAD aspects of the project:

So far, I’ve mostly been working with the EAD data, with Jane Stevenson and Bethan Ruddock from the Archives Hub. I’ve posted a few pieces on the LOCAH blog, on the high-level architecture/workflow, on the model for the archival description data (also here), and most recently on the URI patterns we’re using for the archival data.

c. SNAC (Social Networks in Archival Context)

Via Ed Summers’ report on Moving Forward With Authority, Society of American Archivists Pre-conference:

I don’t think I was the only one in the audience to immediately see the utility of this. In fact it is territory well trodden by OCLC and the other libraries involved in the VIAF project which essentially creates web pages for authority records for people like John von Neumann who have written books. It’s also similar to what People Australia, BibApp and VIVO are doing to establish richly linked public pages for people. As Daniel pointed out: archives, libraries and museums do a lot of things differently; but ultimately they all have a deep and abiding interest in the intellectual output and artifacts created by people. So maybe this is an area where we can see more collaboration across the cultural divides between cultural heritage institutions. The activity of putting EAD documents and their transformed HTML cousins on the web is important. But for them to be more useful they need to be contextualized in the web itself using applications like this SNAC prototype.

7. Museums and archives

a. Art Open Data, Rob Myers

Art Open Data is open data that concerns art institutions, art history, the art market, or artworks. Using this data, we can examine art history and contemporary art in powerful new ways.

There are many potential sources of such data … the problem, as so often, lies in accessibility. Older sources are often in formats that are not machine readable, while newer sources may have restrictive usage terms.

b. Campus Case Studies University Archivists: Working Solutions for Born-digital Records

Through this SAA (Society of American Archivists) portal, quick and broad dissemination of completed projects or work-in-progress is possible. Archivists are encouraged to report on projects that proved successful and/or problematic through a submission system intended for ease of use with a minimum of editorial delay.

c. CKAN group related to cultural heritage

  • archeology
  • art
  • dictionaries
  • economics
  • history
  • linguistics

d. Eschenfelder and Caswell Digital Cultural Collections in an Age of Reuse and Remixes

ABSTRACT: This paper explores the circumstances under which cultural institutions (CI) should seek to control non–commercial reuse of digital cultural works. It describes the results of a 2008 survey of CI professionals at U.S. archives, libraries and museums which gathered data on motivations to control access to and use of digital collections, factors discouraging control, and levels of concern associated with different types of unauthorized reuse. The analysis presents three general themes that explain many of the CI motivations for control: “controlling descriptions and representations”; “legal risks and complexities”; and, “getting credit: fiscal and social costs and revenue.” This paper argues that CI should develop a multiplicity of access and use regulations that acknowledge the varying sensitivity of collections and the varying level of risk associated with different types of reuses. It concludes by offering a set of examples of collections employing varying levels of reuse control (from none to complete) to serve as heuristics.

e. Europeana [cited with commentary here]

resources the project draws on:

f. LOCAH Linked Open Copac and Archives Hub as related to archive collections in general

The Locah project is making records from the Archives Hub service and Copac service available as Linked Data. The Archives Hub is an aggregation of archival metadata from repositories across the UK; Copac provides access to the merged library catalogues of libraries throughout the UK, including all national libraries. In each case the aim is to provide Linked Data according to the principles set out by Tim Berners-Lee, so that we make our data interconnected with other data and contribute to the growth of the Semantic Web.

g. National Archives of Great Britain

Press release, Ontotext Contracted by The National Archives of Great Britain (July, 2010):

The National Archives has invested in an ‘intelligent discovery tool’ to improve searches of archived UK Government websites. The contract for development of the Government Web Archive Semantic Knowledge Base was granted to a consortium of semantic technology professionals led by Ontotext.

The consortium includes experts from the GATE team at the University of Sheffield, creators of the most comprehensive open-source text mining ecosystem in the world, as well as System Simulation, a major UK integrator of images, digital assets, collection and content management systems.


The project started in mid June and is expected to finish before the end of March 2011. It will aim to bring new methods of search, navigation and information modeling to The National Archives and in doing so make the web archive a more valuable and popular resource. The Government Web Archive Semantic Knowledge Base will also bring together publicly available linked data and open-source text mining technology in a system which is easy to understand and can be managed and extended in a predictable and cost-efficient manner.

h. ResearchSpace

ResearchSpace is an Andrew W. Mellon Foundation funded project aimed at supporting collaborative internet research, information sharing and web applications for the cultural heritage scholarly community. The ResearchSpace environment intends to provide the following integrated elements:

  • Data and digital analysis tools
  • Collaboration tools
  • Semantic RDF data sources
  • Data and digital management tools
  • Internet design and authoring tools
  • Web publication

i. Other resources:

Seb Chan

K. Eschenfelder, …

Mellon and OCLC

Mike Ellis [Amazon]

MME (Museum Metadata Exchange), Australia

Controlling access to cultural heritage resources

Tools for metadata sharing in museum community

Managing and Growing a Cultural Heritage Web Presence

8. Theses and dissertations

a. NDLTD Networked Digital Library of Theses and Dissertations Hub

An international organization that promotes the adoption, creation, use, dissemination and preservation of electronic theses and dissertations. The NDLTD encourages and supports the efforts of institutes of higher education and their communities to develop electronic publishing and digital libraries (including repositories), thus enabling them to share knowledge more effectively in order to unlock the potential benefits worldwide.

9. eLearning


Both have considerable history studying the provision and use of e-learning materials under the OER (Open Educational Resources) rubric.  Some recent efforts include:

b. Personal Learning Environment Cengage Launches MindTap

Interestingly, and whether intended or not, the company seems purposeful in drawing a distinction between their platform, which they say is ‘agnostic’, and LMS platforms such as Blackboard. Whether this is a skirmish or prelude to war is hard to tell; however, given a recent profile of Blackboard and their development plans, publishers may have some concern that Blackboard is looking to play on a much larger playing field.

MindApps create learning paths that integrate content and learning activity applications that map directly to an instructor’s syllabus or curriculum. Unlike other products which are affiliated with a single Learning Management System (LMS), MindTap is LMS agnostic and designed to work with any supported LMS the instructor chooses to use. Students can navigate through a customized dashboard of readings, assignments, and other course information. This powerful combination of personalized content and on-the-go access encourages interactivity, increases student engagement and improves learning outcomes.

c. Libraries as OER, OCW Shops

From Pieter Kleymeer and Molly Kleinman’s  paper at Open Ed 2010 [video]

A case study from the University of Michigan

University libraries are well positioned to run OER (Open Educational Resources) production and publication operations, but so far most institutions developing OER or OCW (Open CourseWare) have little or no integration with their respective libraries.

d. Talis Aspire product described in posts part 1 part 2 part 3 part 4

10. Geodata, earth sciences, etc.

a. ESIP Federation (Federation of Earth Science Information Partners)

Semantic Web Cluster

The cluster provides a forum for dissemination of best practices, technical infusion experience and lessons learned and continuing education for emerging semantic technologies. The cluster also plays a governance role in the development of community vocabularies and ontologies relevant to ESIP members.

b. ESRI (Environmental Systems Research Institute, Inc.)

GIS bibliographic database (100,000+ entries).

  • search: “semantic web” – 239 entries
  • search: “linked data” – 23 entries
  • search: RDF – 31 entries
  • search: ontology/ontologies – 800+ entries

11. Research data

Dorothea Salo provided an informative sketch of the issues and problem sets associated with managing research data in her Escaping datageddon presentation.

In April 2011, UKOLN hosted a seminar entitled Data Management: International Challenges, National Infrastructure and Institutional Responses – an Australian Perspective on Data Management, which was given by Dr. Andrew Treloar, director of technology for the Australian National Data Service (ANDS).

a. D-Lib issue on research data

The management of research data in a digital networked world is increasingly recognized as a significant challenge, a significant opportunity, and absolutely essential to the conduct of scientific research in the 21st century.

One group that has risen to this challenge in recent years is DataCite. Founded in 2009, it is focused on the reliable identification and citation of scientific datasets. I attended their first large public meeting last summer and, impressed by the program, asked Jan Brase, the Managing Agent of DataCite, if he would consider putting together a special issue of D-Lib devoted to datasets. Jan agreed and he and Adam Farquhar, the President of DataCite, served as Guest Editors of this special issue.

The first piece is a brief introduction to DataCite written by the Guest Editors. This is followed by nine articles on various data related topics, eight of which are derived from that DataCite meeting last summer and one of which (Waaijers) happened to come in unsolicited at the time we were putting the issue together and was too good to leave out. The articles cover a wide variety of topics, including the acquisition and management of scientific data, the quality and trustworthiness of that data, the connections between data and traditional scholarly publishing, metadata for datasets, and last but not least a peer reviewed journal devoted to the publication of datasets.

b. Google Refine version 2.0 of Freebase’s Gridworks

Our acquisition of Metaweb back in July also brought along Freebase Gridworks, an open source software project for cleaning and enhancing entire data sets. Today we’re announcing that the project has been renamed to Google Refine and version 2.0 is now available.

Google Refine is a power tool for working with messy data sets, including cleaning up inconsistencies, transforming them from one format into another, and extending them with new data from external web services or other databases. Version 2.0 introduces a new extensions architecture, a reconciliation framework for linking records to other databases (like Freebase), and a ton of new transformation commands and expressions.
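To illustrate the kind of cleanup Refine automates, here is a rough re-implementation of the "fingerprint" key-collision idea it uses for clustering variant spellings: values that normalise to the same key are likely the same entity. This is a simplified sketch of the technique, not Refine's exact algorithm.

```python
# Simplified fingerprint clustering for messy string values.
import re
from collections import defaultdict

def fingerprint(value):
    """Normalise a value to a key: lowercase, strip punctuation,
    then sort and deduplicate its tokens."""
    value = re.sub(r"[^\w\s]", "", value.strip().lower())
    return " ".join(sorted(set(value.split())))

def cluster(values):
    """Group values whose fingerprints collide; singletons are dropped."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [g for g in groups.values() if len(g) > 1]

messy = ["Smith, John", "  john smith ", "John  Smith", "Jane Doe"]
print(cluster(messy))
```

In Refine the analogous operation is interactive: it proposes the clusters and lets the user pick a canonical value to merge each one into.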

The Project home provides access to:

  • YouTube screencasts: one, two, three
  • Wiki
  • Blogs
  • Groups:  email and developers list
  • Issues

DERI has produced an RDF extension.

c. Other resources:






Open access, open data


Brian Westra, …

Data-Preservation Alliance for the Social Sciences

Promoting use of persistent IDs for datasets (D-Lib article)

A catalyser for the web of data

An interview with Hjalmar Gislason, founder

Data quality cleaning service and [use cases]

A new way to share open scientific data

Paradigm shifts in scholarly communication scenario

Via CKAN, a metadata search engine for data resources

Metadata in curation of science/technology resources

12. Government data

a. Daniel Kaplan Open Public Data: Then What? (Fing)

We tend to assume that the opening up of public data will only produce positive outcomes for individuals, for society and the economy. But the opposite may be true. We should start thinking further ahead on the possible consequences of releasing public data, and how we can make sure they are mostly positive.

All of us who advocate the opening up of public (and other) data for reuse by citizens, researchers or entrepreneurs, hope that something good will come out of it. What “something” we have in mind probably differs. The same goes for what each of us considers “good”: I may believe that creating commercial value out of free-to-use public data is good, while others may not. I may hate crime maps because they stigmatize without solving anything, while others may think they save lives. That’s fine. In fact, that’s even the basic reason why we should support open data: because it provides the common grounds upon which different agents, with different motivations, will create different things, with God, or Darwin, eventually knowing their own.

Topics addressed:

  • Drawing the consequences
  • A radiant future
  • Peering into the dark side
  • Data fatigue

Part 2 continues:

  • What triggers what?
  • Another take on some challenges posed by Open Government Data
  • Actors and levers

b. A research agenda for open government data, James Hendler, from his post:

Interest in Open Government Data systems is growing like crazy. More and more governments are releasing data, and a number of projects are underway to harness and use this data in interesting ways.

Thanks in large part to this growing movement, there has been an increased interest in identifying the research issues associated with the work. The most exciting development so far was the initiation of a Workshop hosted by the US National IT R&D agency which is bringing together government leaders and researchers to explore the needs of this community. I’ve been lucky enough to be asked to join the final panel which will try to explore the technical, social, and legal agendas for researchers in this area. A big thanks to Beth Noveck who was one of the prime movers on getting this meeting to happen, and to Aneesh Chopra’s office, which approved and largely organized the meeting.

Following this meeting there will be one in Albany in April that will be co-chaired by Beth, Theresa Pardo, Andrew Hoppin, and me. The mission of that one is to build on the March meeting. The workshop will take an interdisciplinary approach to creating a multi-year open government research agenda focused on identifying critical needs, mapping needs to potential solutions, identifying legal and policy barriers, exploring critical evaluative approaches, and laying out strategies for attaining future research funding.

We’ve also begun discussions of more workshops that follow on these, and we’re talking with the US National Science Foundation about some funding for these meetings.

So lots is happening, [watch] this space for more information.

c. Other resources:

Phil Archer

Chris Faraone

Bernadette Hyland

Nigel Shadbolt

On cleaning up government data part 1 part 2 part 3

Infopocalypse: The Cost of Too Much Data

Report on Open Government Conference, Talis’ Nodalities

Open for Business

13.  Academic infrastructure data

a. Christopher Gutteridge Opening up University Infrastructure Data (Southampton)

Around five years ago we (The School of Electronics and Computer Science, University of Southampton) had a project to create open data of our infrastructure data. This included staff, teaching modules, research groups, seminars and projects. This year we have been overhauling the site based on what we’ve learned in the interim. We made plenty of mistakes, but that’s fine and what being a university is all about. We’ll continue to blog about what we’ve learned.

b. JISC The Activity Data Programme

We have recently funded a new programme to focus on the ways that universities can benefit from collecting, analysing and reusing the data about the way that their staff and students interact with institutional systems.

There are obvious examples of this type of innovation paying dividends in the commercial world such as Tesco’s use of clubcard data and Amazon’s recommendation services but there are fewer examples of this in the Higher Education sector.  JISC has funded this programme to assess the opportunities available to universities and to build on any promising opportunities that emerge.

We have funded 8 projects to explore activity data in institutions and a synthesis project to ensure that the information produced by each project is collected and communicated to ensure that as many people as possible can benefit from the projects.

c. Other resources:

Lorcan Dempsey

Sheila MacNeill CETIS

Jennifer Zaino

Information management on campus (OCLC)

Use of corporate systems for educational purposes

Open data services at Southampton

14. Implicit sources of metadata

a. Lorcan Dempsey Reading Lists, Citation Management and Bibliographic Tissue

We will see much more activity connecting user environments and bibliographic resources. I am thinking of citation managers, reading lists, social bookmarking sites (see CiteULike and unalog) and RSS feeds. Some of these may be specifically supported by the library (e.g., a citation manager service), some may be developed within an academic or scholarly context (e.g., Zotero, CiteULike, …), and some may be general network services. People have multiple ways of creating personal and shared collections of data and links.

They are also an example of an increasingly important aspect of our bibliographic apparatus: we have discovery or ‘rendezvous’ experiences outside the library resource, where it would be good to be able to link back into a library service for fulfillment, or indeed into other services.

I have written several times of such resources as a type of bibliographic tissue… We can expect such flows to improve: look for example at the list of databases the Mendeley Web Importer feature works with.

Now, although libraries offer such services, in many cases students, teachers, or researchers may find their own way to support such activity and they may use services outside the library orbit.

In this context I was interested recently to learn about two initiatives.

First was the Talis Aspire product [cited with commentary here], a description of which I encountered at the Emtacl10 conference. It is a shared resource list management system. I have no experience of it, but was interested in the description. It allows the list creator to pull metadata from several resources; it claims to integrate with learning management systems, the ILS, and other relevant campus environments; it is provided in the cloud, and claims to have social features on the way, building on top of the aggregate resource list data. Clearly, resource lists and the works included in them are interesting social objects around which some community might develop. So, it is providing a management framework which potentially strengthens the connective bibliographic tissue of the reading list.

However, the most interesting aspect of the description to me were the suggestions made around improving the quality of the educational experience for the student, increasing the productivity of the educator, and leveraging systems investment. Reading lists are central to the educational experience and improving their utility or making it easier to manage them are big gains.

Second was MyReferences, produced by the Telstar project at the Open University, which aims to provide a framework for embedding references/citations in learning management systems. Specifically, the OU team worked with RefWorks and Moodle, but are also providing a range of interesting materials about use cases, experiences, technologies, and so on.

Owen Stephens, TELSTAR Project Manager, said: “These new tools are invaluable to the 21st century educational institution and student. There is an ever increasing wealth of resources available and hence a real need for students, course and programme teams to be able to create, manipulate, organise and store a range of citations and bibliographic references for easy use.

There are already a number of general referencing tools available to students, but MyReferences takes the usability of these tools a step further by integrating them into online courses so the materials students commonly need to reference are already available in the format they need. Students simply select the sources they need to reference, the referencing style their institution requires and then copy and paste the result into their assignment.” [Press Release: Referencing Made Easy]

In each case, there is an attempt to add value to the simple supply of bibliographic data by providing the ability to make that data work in student and teacher workflows, and by conveniently connecting workflow to useful resources.

b. Other resources:


Dan Cohen

Credo Reference

Library of Congress



Jonathan Rochkind

Christina Smart



OER metaphors (Open Educational Resources)

A Million Syllabi via Syllabus Finder

Reference works matching academic, govt., corp. needs

Research guides, examples: one two

Manage, share, and discover research content/contacts

Management of records from web 2.0/social media platforms

Problems using class numbers in MARC data

Virtual learning environments … 2011 assessment

On their Aspire product  [cited with commentary here]

About, support, and roadmap

15. Other sources of well-curated metadata [intentionally sparse sample]

a. ANC (American National Corpus) from a post at the OKF blog by Nancy Ide

The American National Corpus (ANC) project is creating a collection of texts produced by native speakers of American English since 1990. Its goal is to provide at least 100 million words of contemporary language data covering a broad and representative range of genres, including but not limited to fiction, non-fiction, technical writing, newspaper, spoken transcripts of various verbal communications, as well as new genres (blogs, tweets, etc.).

b. Ancient World Open Bibliographies

This blog is for discussion and development of a project to collect and solicit annotated bibliographies about subjects relevant to studies of the ancient world (as NYU defines it: ‘from the Pillars of Hercules to the Pacific, from the beginnings of human habitation to the late antique / early Islamic period.’)  The bibliographies will be collected at a dedicated wiki site, which will be open access.

c. Digital humanities: NINES, 18thConnect, Alexander Street Press

In an ongoing effort to improve scholarly access to digital humanities resources, Alexander Street Press has partnered with The Networked Infrastructure for Nineteenth-Century Electronic Scholarship (NINES) and its sister organization, 18thConnect: Eighteenth-Century Scholarship Online, to enable cross-search access to all relevant eighteenth- and nineteenth-century content from Alexander Street online collections, including, most recently, The Romantic Era Redefined.

NINES currently aggregates 736,696 peer reviewed digital objects from 88 federated sites. In addition to The Romantic Era Redefined, other Alexander Street Press collections indexed in the NINES and 18thConnect networks include

A steady advocate for the cross-linking of scholarly resources, Alexander Street is partnering with a wide range of publishers and discovery platform services to improve scholarly access to digital materials in the humanities.

d. Genealogy Family search … 2010 update via Resource Shelf

FamilySearch announced the addition of over 200 million new searchable historic records representing 18 countries to its online database. The new records were added to the hundreds of millions FamilySearch published earlier this year at a similar event in Salt Lake City, Utah. The number of records on the pilot site totals 700 million.

e. Genealogy Washington DC Semantic Web Meetup … via Ed Summers

Last week’s Washington DC Semantic Web Meetup focused on History and Genealogy Semantics. It was a pretty small, friendly crowd (about 15-20) that met for the first time at the Library of Congress. The group included folks from PBS, the National Archives, the Library of Congress, and the Center for History and New Media – as well as some regulars from the Washington DC SGML/XML Users Group.

Brian Eubanks gave a presentation on what the Semantic Web, Linked Data and specifically RDF and Named Graphs have to offer genealogical research. He took us on a tour through a variety of websites, such as the Land Records Database at the Bureau of Land Management, Footnote and Google Books and made a strong case for using RDF to link these sorts of documents with a family tree.

f. WESS (Western European Studies Section)

The Western European Studies Section (WESS) represents librarians and others who specialize or are otherwise professionally involved in the acquisition, organization, and use of information sources originating in or related to Western European countries. Our aim is to promote the improvement of library services supporting study and research in Western European affairs from ancient times to the present.

16. Other topics


(1) John Ockerbloom What do You Read, My Lord?… Works, Works, Works

There’s been a lot of talk lately in the library world about the coming age of FRBR-ized library catalogs (prompted in part by development of RDA, a cataloging standard that uses FRBR). Exactly what such catalogs will look like, and whether they will actually help readers use the library more effectively, are matters of ongoing debate.  One of the key differences between FRBR and older catalog models is that books and other resources that share common properties can be grouped together at various levels of abstraction.

FRBR, highly abridged

Such grouping can be helpful, for instance, for people looking for a suitable copy of Shakespeare’s play Hamlet. In a traditional library catalog, a title search for Hamlet might yield a long list of hundreds of hits, in which it is difficult to select a particular copy, or to find other appropriate search results (like William Faulkner’s book The Hamlet) among all the hits for various versions of Shakespeare’s creation. In a FRBR-ized catalog, though, the various editions (or “Manifestations,” in FRBR-speak) of Hamlet can be grouped into various “Expressions,” denoting particular texts of Hamlet, and those Expressions can be grouped into a single “Work” denoting Shakespeare’s dramatic creation.
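The Work → Expression → Manifestation hierarchy described above can be sketched as a small data model. This is a rough illustration of the grouping idea only, not a conformant FRBR implementation; the sample editions and labels are invented for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Manifestation:
    """A concrete published edition (imprint and date)."""
    title: str
    imprint: str
    year: int

@dataclass
class Expression:
    """A particular text of the work, e.g. one edited text of Hamlet."""
    label: str
    manifestations: list = field(default_factory=list)

@dataclass
class Work:
    """The abstract creation, e.g. Shakespeare's dramatic work."""
    title: str
    creator: str
    expressions: list = field(default_factory=list)

# Group two (invented) editions under one Expression of one Work.
hamlet = Work("Hamlet", "William Shakespeare")
text = Expression("Edited text A")
text.manifestations.append(Manifestation("Hamlet", "Publisher X", 2006))
text.manifestations.append(Manifestation("Hamlet", "Publisher Y", 1992))
hamlet.expressions.append(text)

# A title search can now show one Work entry instead of hundreds of hits,
# expanding to editions only on demand.
editions = [m for e in hamlet.expressions for m in e.manifestations]
print(len(editions))  # 2
```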


All of the catalogs above, then, are somewhat “FRBR-like”, but they don’t fully implement the FRBR functional or data model. I’m not sure, though, how closely they need to conform to those models. I can see room for improvement in each catalog, but they all seem to work well enough to have gained notable user communities.

How should a “FRBR”-like catalog treat Hamlet? Personally, I like the rich work-level data (both formal and informal) that I find on pages like LibraryThing. I also like the easy access to online copies provided by Open Library, the faceting and wide-angle “work” aggregation of, and the scale and full-text searchability of Google Books. On the other hand, I’d like to see more consistent grouping and descriptions in each of the catalogs, and more assistance for users in selecting an appropriate edition than I currently see in any of them.

Out of hundreds of editions of Hamlet, some are particularly useful for various audiences, such as students trying to understand Elizabethan speech, actors and directors preparing to perform it, literary researchers examining and comparing source texts, and cultural historians considering famous stagings and adaptations. Why not include more data highlighting editions that are particularly useful for these and other purposes? I don’t recall such metadata being a significant part of the FRBR data model, but it seems to be in the spirit of the functional model of helping people select the right book for their needs.

On the other hand, I don’t see a great need to make a lot of differentiation between works of Hamlet and expressions of Hamlet. Insisting on lots of sharp distinctions between various high-level records in the user interface could well confuse users more than it helps them, unless there are effective ways of presenting them as a unit when appropriate.

(2) Other resources:

John Ockerbloom, The Concept of a Work in the Catalog Web

Jen Riley, Enhancing Interoperability of FRBR-Based Metadata

Jen Riley, FRBR (2006)

Jonathan Rochkind, LibraryThing adds FRBR ‘Expressions’

b. Annotation

(1) OAC Open Annotation Collaboration

  • facilitate the emergence of a Web- and resource-centric interoperable annotation environment that allows leveraging annotations across the boundaries of annotation clients, annotation servers, and content collections
  • demonstrate the utility of this environment
  • see widespread adoption of this environment

To this end the OAC has made available a draft annotation data model and ontology. As of January 2011, in collaboration with partner projects, the OAC has begun a series of scholarly annotation demonstration experiments. We are holding our first Using the OAC Data Model Workshop in March 2011.
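The core of the draft OAC model is an Annotation resource that connects a body (the note) to a target (the annotated resource). A minimal sketch of that shape as RDF-style triples is below; the namespace URI and all resource URIs are assumptions for illustration, not taken from the draft itself.

```python
# Hedged sketch of an OAC-style annotation: one Annotation node linking
# a body (the note) to a target (the thing annotated). The "oa" namespace
# and all example.org URIs here are illustrative placeholders.
OA = "http://example.org/oa/"
anno = "http://example.org/anno/1"
body = "http://example.org/notes/1"          # e.g. a margin note
target = "http://example.org/books/hamlet"   # e.g. a page in a book

anno_triples = [
    (anno, "rdf:type", OA + "Annotation"),
    (anno, OA + "hasBody", body),
    (anno, OA + "hasTarget", target),
]
```

Because the annotation is a first-class Web resource rather than data locked inside one client, any server that understands the model can store it, and any client can resolve the body and target independently.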

[The] OAC project is pleased to announce a Request For Proposal to collaborate with OAC researchers for building implementations of the OAC data model and ontology. The OAC is seeking to collaborate with scholars and/or librarians currently using and/or curating established repositories of scholarly digital resources with well-defined audiences of scholars.

(2) Jon Udell Fear Not, Book Lovers. The Future of Marginalia is Bright!

A story about marginalia in today’s New York Times, Book Lovers Fear Dim Future for Notes in the Margins, opens with an account of a rare and otherwise undistinguished book that’s valuable only because Mark Twain scribbled in its margins …

Actually it’s a problem facing everyone, and if we solve it for ourselves we’ll solve it for libraries too. The Times story wanders off into nostalgia without proposing any solution. Here’s my proposal for the next Mark Twain and for all the rest of us too: a network of cloud-based personal data stores.
