Projects, institutions, organizations... • CLIR

1. AKSW Agile Knowledge Engineering and Semantic Web, Universität Leipzig

Projects include:

2. ANDS Australian National Data Service

ANDS is building the Australian Research Data Commons: a cohesive collection of research resources from all research institutions, to make better use of Australia’s research outputs.

3. BBC … a variety of initiatives, including

4. BnF … Bibliothèque nationale de France … about the site and CKAN’s summary

data.bnf.fr gathers data from the different databases of the Bibliothèque nationale de France, so as to create Web pages about Works and Authors, together with an RDF view on the extracted data. There are about 1,400,000 RDF triples.

There are links to id.loc.gov for languages and nationalities, to dewey.info for subjects, and to DCMI type for types. The service uses SKOS, FOAF, DC and RDA vocabularies, in a FRBR model.
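To make the description above concrete, here is a minimal sketch of what a few such statements might look like as N-Triples, written with only the Python standard library. The author and work URIs under example.org are hypothetical placeholders, not real data.bnf.fr identifiers; the FOAF, Dublin Core terms and DCMI Type namespaces and the id.loc.gov ISO 639-2 language URI follow published conventions. The FRBR, RDA and SKOS parts of the model are left out for brevity.

```python
# Minimal sketch: emit BnF-style statements as N-Triples, standard library only.
# The example.org URIs are hypothetical placeholders; the vocabulary
# namespaces and the id.loc.gov language URI follow published conventions.
from typing import List, Optional, Tuple

FOAF = "http://xmlns.com/foaf/0.1/"
DCT = "http://purl.org/dc/terms/"
DCMITYPE = "http://purl.org/dc/dcmitype/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def uri(u: str) -> str:
    """Wrap an absolute URI in angle brackets for N-Triples."""
    return f"<{u}>"


def literal(text: str, lang: Optional[str] = None) -> str:
    """Quote a string literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"' + (f"@{lang}" if lang else "")


def ntriples(triples: List[Tuple[str, str, str]]) -> str:
    """Serialize (subject, predicate, object) terms, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


author = uri("http://example.org/author/victor-hugo")   # hypothetical URI
work = uri("http://example.org/work/les-miserables")     # hypothetical URI

triples = [
    (author, uri(RDF_TYPE), uri(FOAF + "Person")),
    (author, uri(FOAF + "name"), literal("Victor Hugo")),
    (work, uri(DCT + "creator"), author),
    (work, uri(DCT + "title"), literal("Les Misérables", "fr")),
    (work, uri(DCT + "type"), uri(DCMITYPE + "Text")),
    # Link out to id.loc.gov for the language (ISO 639-2 code "fre")
    (work, uri(DCT + "language"), uri("http://id.loc.gov/vocabulary/iso639-2/fre")),
]

print(ntriples(triples))
```

Links to dewey.info subjects would be added the same way, as further object URIs on the work.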

5. BKN Bibliographic Knowledge Network

The Bibliographic Knowledge Network (BKN) is a project funded for two years starting September 2008 by the NSF Cyber-enabled Discovery and Innovation (CDI) Program to develop a suite of tools and services that encourage the formation of virtual organizations in scientific communities of various sizes, such as conference groups and departmental research groups. The tools allow such organizations to filter relevant documents from various input streams, select and enhance the quality of bibliographic data associated with the organization, and attract students, teachers and researchers to contribute to the organization’s activity.

Mike Bergman (Structured Dynamics, UMBEL) provided a summary of the project in relation to its follow-on product: Structured Dynamics. BKN combined a CMS (Drupal), an RDF triple store (Virtuoso) and an indexing engine (Solr). It provided a set of standard functionality (create, read, update, delete, browse/search, import/export, etc.).
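The BKN arrangement of a content front end, a triple store and an indexing engine behind one set of CRUD operations can be sketched in miniature. In this toy Python version, hypothetical `TripleStore` and `KeywordIndex` classes stand in for Virtuoso and Solr, and `RecordService` plays the role of the CMS; the point is only that every create, update or delete touches both back ends, so search results do not drift from the stored data.

```python
# Toy sketch of the BKN pattern: one CRUD facade keeps a triple store and a
# keyword index in step. TripleStore, KeywordIndex and RecordService are
# hypothetical stand-ins for Virtuoso, Solr and the Drupal front end.
from collections import defaultdict
from typing import Any, Dict, Set, Tuple


class TripleStore:
    def __init__(self) -> None:
        self.triples: Set[Tuple[str, str, Any]] = set()

    def add(self, s: str, p: str, o: Any) -> None:
        self.triples.add((s, p, o))

    def remove_subject(self, s: str) -> None:
        self.triples = {t for t in self.triples if t[0] != s}

    def describe(self, s: str) -> Dict[str, Any]:
        # One value per property is enough for this toy example.
        return {p: o for (subj, p, o) in self.triples if subj == s}


class KeywordIndex:
    def __init__(self) -> None:
        self.postings: Dict[str, Set[str]] = defaultdict(set)

    def index(self, doc_id: str, text: str) -> None:
        for token in text.lower().split():
            self.postings[token].add(doc_id)

    def unindex(self, doc_id: str) -> None:
        for ids in self.postings.values():
            ids.discard(doc_id)

    def search(self, word: str) -> Set[str]:
        return set(self.postings.get(word.lower(), set()))


class RecordService:
    """CRUD operations that always update both back ends together."""

    def __init__(self) -> None:
        self.store = TripleStore()
        self.index = KeywordIndex()

    def create(self, rid: str, props: Dict[str, Any]) -> None:
        for p, o in props.items():
            self.store.add(rid, p, o)
            self.index.index(rid, str(o))

    def read(self, rid: str) -> Dict[str, Any]:
        return self.store.describe(rid)

    def update(self, rid: str, props: Dict[str, Any]) -> None:
        self.delete(rid)            # replace the whole record
        self.create(rid, props)

    def delete(self, rid: str) -> None:
        self.store.remove_subject(rid)
        self.index.unindex(rid)

    def search(self, word: str) -> Set[str]:
        return self.index.search(word)
```

For example, after `create("paper:1", {"title": "Linked data for libraries"})`, a search for "linked" finds `paper:1`; once the record is updated with a different title, the same search returns nothing.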

Re-cap of benefits

toolset for structured data conversion, use, exposure
data-driven via ontologies; easily scoped, tailored
native data formats and RDF APIs via RESTful web services
web services framework can mix-and-match:
  standalone
  integrated with CMS
  external tool access

web-wide user and dataset access & permissions

6. CKAN Comprehensive Knowledge Archive Network

CKAN is a registry of open knowledge packages and projects (and a few closed ones).

CKAN makes it easy to find, share and reuse open content and data, especially in ways that are machine automatable.

As a system, CKAN functions as a synthesis of several different services.

groups
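Because CKAN is meant to be machine-automatable, a search can be scripted. The sketch below assumes the version 3 Action API found in current CKAN releases (the API the registry exposed when this page was written may differ), and `https://ckan.example.org` is a hypothetical instance. To stay runnable offline, it only builds the `package_search` URL and parses a response body; fetching the URL, for example with `urllib.request.urlopen`, is left to the caller.

```python
# Sketch against the CKAN Action API (version 3); the instance URL is hypothetical.
import json
from typing import List
from urllib.parse import urlencode


def package_search_url(base_url: str, query: str, rows: int = 10) -> str:
    """Build a package_search request URL for a CKAN instance."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_names(response_body: str) -> List[str]:
    """Extract dataset names from a package_search JSON response."""
    payload = json.loads(response_body)
    if not payload.get("success"):
        raise RuntimeError(f"CKAN call failed: {payload.get('error')}")
    return [pkg["name"] for pkg in payload["result"]["results"]]


url = package_search_url("https://ckan.example.org/", "linked data", rows=5)
print(url)
```

Checking the `success` flag before reading `result` keeps failed calls from surfacing as confusing `KeyError`s.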

7. Chronicling America: historic American newspapers … Library of Congress

Thoughts about linked data in the context of this project:

Libraries and Linked Data: Confessions of a Graph Addict, Ed Summers (2010)

and the source document for Ed’s presentation

Evolution of workflow to support the project:

Notes on Retooling Libraries, Ed Summers (2010)

8. CLOCKSS Controlled LOCKSS [info about LOCKSS]

A not-for-profit joint venture between the world’s leading scholarly publishers and research libraries whose mission is to build a sustainable, geographically distributed dark archive with which to ensure the long-term survival of Web-based scholarly publications for the benefit of the greater global research community.

9. DERI (Digital Enterprise Research Institute), National University of Ireland, Galway

Units

Projects

10. DFKI (Deutsches Forschungszentrum für Künstliche Intelligenz)

DFKI conducts contract research in virtually all fields of modern AI, including image and pattern recognition, knowledge management, intelligent visualization and simulation, deduction and multi-agent systems, speech- and language technology, intelligent user interfaces, business informatics and robotics.

11. DPLA (Digital Public Library of America)

The Digital Public Library of America (DPLA) will make the cultural and scientific heritage of humanity available, free of charge, to all. By adhering to the fundamental principle of free and universal access to knowledge, it will promote education in the broadest sense of the term. That is, it will function as an online library for students of all ages, from grades K-12 to postdoctoral researchers and anyone seeking self-instruction; it will be a deep resource for community colleges, vocational schools, colleges, universities, and adult education programs; it will supplement the services of public libraries in every corner of the country; and it will satisfy other needs as well: the need for data related to employment, for practical information of all kinds, and for enrichment in the use of leisure.

Remarks on the first workshop (March 1, 2011):

12. Drupal

What RDF Might do for Drupal

How to Build Linked Data Sites with Drupal 7 and RDFa

13. E-Business and Web Science Research Group

Martin Hepp, Universität der Bundeswehr München; projects include: GoodRelations

14. IKS (Interactive Knowledge Stack)

An open-source community whose projects focus on building an open and flexible technology platform for semantically enhanced content management systems (CMS).

The IKS EU Research Project is funded in part by a €6.58m grant from the European Union and governed by a core consortium of seven research partners and six industrial partners.

15. JISC … a variety of initiatives

16. KMi (Knowledge Media Institute), Open University

The Knowledge Media Institute (KMi) was set up in 1995 in recognition of the need for the Open University to be at the forefront of research and development in a convergence of areas that impacted on the OU’s very nature: Cognitive and Learning Sciences, Artificial Intelligence and Semantic Technologies, and Multimedia. We chose to call this convergence Knowledge Media.

Projects include:

LUCERO (Linked University Content for Education and Research Online)

… working with groups of learners, researchers and practitioners based at the Open University, LUCERO will scope, prototype, pilot and evaluate reusable, cost-effective solutions relying on the linked data principles and technologies for exposing and connecting educational and research content.

Annomation

… finding the right video now is a bit like finding the right Web page was back in the 1990s: we are limited to searching the smattering of keywords that occurred to the video’s creator. Annomation makes it easy for people to add semantic annotations using Web 3.0 techniques: videos, and segments in the video, can be described by links to concepts in DBpedia, the Library of Congress and Dewey classifications, geographical data sets, and other parts of the Semantic Web Linked Data Cloud.
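One plausible way to represent such segment-level annotations is to address each segment with a W3C Media Fragments URI (the temporal form `#t=start,end`, in seconds) and attach concept URIs to it. Annomation’s own storage format is not described here, so the `SegmentAnnotation` class and the example.org video URI below are hypothetical; only the fragment syntax and the DBpedia resource naming follow published conventions.

```python
# Hypothetical model of segment-level video annotations. Segments are
# addressed with W3C Media Fragments temporal URIs (#t=start,end, seconds).
from dataclasses import dataclass, field
from typing import List


@dataclass
class SegmentAnnotation:
    video_uri: str
    start: float                     # segment start, in seconds
    end: float                       # segment end, in seconds
    concepts: List[str] = field(default_factory=list)  # e.g. DBpedia URIs

    def fragment_uri(self) -> str:
        """Address this segment with a Media Fragments temporal URI."""
        return f"{self.video_uri}#t={self.start:g},{self.end:g}"


def segments_about(annotations: List[SegmentAnnotation], concept: str) -> List[str]:
    """Return fragment URIs of all segments linked to the given concept."""
    return [a.fragment_uri() for a in annotations if concept in a.concepts]


DBR = "http://dbpedia.org/resource/"
lecture = "http://example.org/video/lecture.mp4"  # hypothetical video
annotations = [
    SegmentAnnotation(lecture, 0, 12.5, [DBR + "Photosynthesis"]),
    SegmentAnnotation(lecture, 12.5, 40, [DBR + "Chlorophyll", DBR + "Photosynthesis"]),
    SegmentAnnotation(lecture, 40, 95, [DBR + "Leaf"]),
]
print(segments_about(annotations, DBR + "Photosynthesis"))
```

Searching by concept URI rather than by keyword is what lets a segment be found even when its creator never typed the word.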

SugarTube

SugarTube (Semantics Used to Get Annotated Video Recording) is a Web 3.0 application to search for videos through RDF-based annotated video stored as part of the Open University Broadcast Unit’s learning material. The fundamental technology used to develop the application is Semantic Web Services. Users can search based on keywords, textual analysis of related documents, URLs, or geographical maps. Moreover, SugarTube gathers more useful data from the LOD cloud to enrich the search results, such as related events, people, knowledge, websites, geo-location, maps, and additional video streams from YouTube, the BBC, and OpenLearn.

17. LASSO (Lookup & Alignment Service with Semantic Open Data)

Project LASSO intends to deploy, improve and extend Linked Data (LD) infrastructure in three different use cases. The central feature of all use cases is a lookup service which helps to augment already existing, formalized knowledge with facts from the Linked Open Data (LOD) cloud. These three use cases are Semantic Desktop, Enterprise Collaboration and Inspiration Services. Currently available systems in all of these application domains make only sparse use of data from the Semantic Web.

All systems rely on their local knowledge repositories and make no use of publicly available data such as DBpedia or other LD sources. Augmenting local knowledge repositories with additional facts from the web can improve several knowledge services in all usage scenarios in a significant way …
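The lookup-and-augment idea can be sketched as a merge step. In this hypothetical Python version, `fake_lookup` stands in for a call to a Linked Data lookup service (no network access is made), local facts take precedence when a property is already known, and each merged fact keeps a provenance tag so users can tell local knowledge from imported LOD facts.

```python
# Hypothetical sketch of lookup-based augmentation with provenance tags.
from typing import Callable, Dict, List, Tuple

Fact = Tuple[str, str]  # (property, value)


def augment(entity: str,
            local: Dict[str, List[Fact]],
            lookup: Callable[[str], List[Fact]]) -> List[Tuple[str, str, str]]:
    """Merge local facts with looked-up LOD facts.

    Local facts win when both sources give the same property, and only the
    first looked-up value per new property is kept. Every result carries a
    provenance tag ("local" or "lod").
    """
    own = local.get(entity, [])
    merged = [(p, v, "local") for p, v in own]
    known = {p for p, _ in own}
    for p, v in lookup(entity):
        if p not in known:
            merged.append((p, v, "lod"))
            known.add(p)
    return merged


def fake_lookup(entity: str) -> List[Fact]:
    """Stand-in for a remote lookup service; returns canned data."""
    canned = {"Leipzig": [("type", "Settlement"), ("country", "Germany")]}
    return canned.get(entity, [])


desktop = {"Leipzig": [("type", "City"), ("note", "conference venue")]}
print(augment("Leipzig", desktop, fake_lookup))
```

Letting local facts win is one conservative policy; a real service might instead surface conflicts for the user to resolve.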

18. LOCAH (Linked Open Copac Archives Hub)

Pete Johnston’s take on his part in working on the EAD aspects of the project:

Contrary to appearances, I haven’t completely abandoned eFoundations, but recently I’ve mostly been working on the JISC-funded LOCAH project which I mentioned here a while ago, and my recent scribblings have mostly been over on the project blog.

LOCAH is working on making available some data from the Archives Hub (a collection of archival finding-aids i.e., metadata about archival collections and their constituent items) and from Copac (a “union catalogue” of bibliographic metadata from major research and specialist libraries) as linked data.

So far, I’ve mostly been working with the EAD data, with Jane Stevenson and Bethan Ruddock from the Archives Hub. I’ve posted a few pieces on the LOCAH blog, on the high-level architecture/workflow, on the model for the archival description data (also here), and most recently on the URI patterns we’re using for the archival data.
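The actual LOCAH URI patterns are documented on the project blog and are not reproduced here. As a hedged illustration of the general approach (minting predictable, human-readable URIs deterministically from identifiers already present in the EAD data), the sketch below uses a hypothetical base URI and hypothetical path segments (`archivalresource`, `concept`).

```python
# Hypothetical URI-minting sketch; BASE and the path segments are invented
# for illustration and are not LOCAH's real patterns.
import re
from urllib.parse import quote

BASE = "http://data.example.org/id/"


def slug(identifier: str) -> str:
    """Normalize a local identifier into a single URI path segment."""
    s = identifier.strip().lower()
    s = re.sub(r"[\s/]+", "-", s)        # spaces and slashes become hyphens
    return quote(s, safe="-._~")         # percent-encode anything else


def archival_resource_uri(repository_code: str, local_id: str) -> str:
    """URI for a described unit, scoped by its holding repository."""
    return f"{BASE}archivalresource/{slug(repository_code)}/{slug(local_id)}"


def concept_uri(scheme: str, label: str) -> str:
    """URI for a subject concept, scoped by its source vocabulary."""
    return f"{BASE}concept/{slug(scheme)}/{slug(label)}"


print(archival_resource_uri("GB 133", "ABC/1/2"))
```

Keeping the repository code and the local identifier in separate path segments avoids collisions that could arise if both were joined with a hyphen.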

19. LOCKSS Lots of Copies Keep Stuff Safe [info about CLOCKSS]

An international community initiative that provides libraries with digital preservation tools and support so that they can easily and inexpensively collect and preserve their own copies of authorized e-content.

GPO joins LOCKSS … meaning GPO is assisting the LOCKSS-USDOCS project in preserving content harvested from fdsys.gov. That means we are developing a geographically distributed network of digital archives.
Publishers and Titles
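The “lots of copies” idea can be illustrated with a deliberately simplified audit: each library holds a replica, the replicas’ hashes are compared, and any copy that disagrees with a strict majority is repaired from a copy that agrees. This is a sketch of the principle only, with hypothetical peer names; the actual LOCKSS polling protocol among peers is considerably more elaborate.

```python
# Simplified illustration of "lots of copies": majority agreement on hashes.
# Not the real LOCKSS polling protocol; peer names are hypothetical.
import hashlib
from collections import Counter
from typing import Dict, List, Tuple


def digest(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()


def audit(replicas: Dict[str, bytes]) -> Tuple[Dict[str, bytes], List[str]]:
    """Compare replicas; repair any copy that disagrees with a strict majority.

    Returns the repaired replicas and the list of peers whose copy was damaged.
    """
    hashes = {peer: digest(c) for peer, c in replicas.items()}
    winner, count = Counter(hashes.values()).most_common(1)[0]
    if count * 2 <= len(replicas):
        raise ValueError("no majority: cannot tell which copy is authentic")
    good = next(c for peer, c in replicas.items() if hashes[peer] == winner)
    damaged = [peer for peer, h in hashes.items() if h != winner]
    repaired = {peer: (good if peer in damaged else c) for peer, c in replicas.items()}
    return repaired, damaged


copies = {
    "library-a": b"journal article, issue 3",
    "library-b": b"journal article, issue 3",
    "library-c": b"journal artic1e, issue 3",  # bit rot in one copy
}
fixed, bad = audit(copies)
print(bad)
```

The more independent copies there are, the harder it is for damage or tampering at a few sites to outvote the authentic version.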

20. LOD2 … creating knowledge out of interlinked data

LOD2 is a large-scale integrating project co-funded by the European Commission within the FP7 Information and Communication Technologies Work Programme (Grant Agreement No. 257943). Commencing in September 2010, this 4-year project comprises leading Linked Open Data technology researchers, companies, and service providers from across 7 European countries and is coordinated by the AKSW research group at the University of Leipzig.

Over the past 3 years, the semantic web activity has gained momentum with the widespread publishing of structured data as RDF. The Linked Data paradigm has therefore evolved from a practical research idea into a very promising candidate for addressing one of the biggest challenges in the area of intelligent information management: the exploitation of the Web as a platform for data and information integration in addition to document search. To translate this initial success into a world-scale disruptive reality, encompassing the Web 2.0 world and enterprise data alike, the following research challenges need to be addressed: improve coherence and quality of data published on the Web, close the performance gap between relational and RDF data management, establish trust on the Linked Data Web and generally lower the entrance barrier for data publishers and users.

With partners among those who initiated and strongly supported the Linked Open Data initiative, the LOD2 project aims at tackling these challenges by developing:

enterprise-ready tools and methodologies for exposing and managing very large amounts of structured information on the Data Web,
a testbed and bootstrap network of high-quality multi-domain, multi-lingual ontologies from sources such as Wikipedia and OpenStreetMap,
algorithms based on machine learning for automatically interlinking and fusing data from the Web,
standards and methods for reliably tracking provenance, ensuring privacy and data security as well as for assessing the quality of information,
adaptive tools for searching, browsing, and authoring of Linked Data.
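LOD2 names machine learning for interlinking; the sketch below deliberately swaps in a much simpler, non-learning heuristic to show the shape of the task: compare labels from two datasets, score candidate pairs (here with token-set Jaccard similarity), and propose `owl:sameAs` links above a threshold. The dataset contents, prefixes and the 0.5 threshold are illustrative assumptions, not LOD2 tooling.

```python
# Label-similarity link discovery (Jaccard on word tokens), a simple
# stand-in for the learned interlinking methods LOD2 describes.
import re
from typing import Dict, List, Set, Tuple


def tokens(label: str) -> Set[str]:
    return set(re.findall(r"\w+", label.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def propose_links(source: Dict[str, str], target: Dict[str, str],
                  threshold: float = 0.5) -> List[Tuple[str, str, float]]:
    """Return candidate owl:sameAs pairs as (source URI, target URI, score)."""
    links = []
    for s_uri, s_label in source.items():
        # Keep only the best-scoring target for each source resource.
        best = max(((t_uri, jaccard(tokens(s_label), tokens(t_label)))
                    for t_uri, t_label in target.items()),
                   key=lambda pair: pair[1], default=None)
        if best is not None and best[1] >= threshold:
            links.append((s_uri, best[0], best[1]))
    return links


source = {"ex:a": "University of Leipzig", "ex:b": "Open University"}
target = {"dbr:Leipzig_University": "Leipzig University",
          "dbr:The_Open_University": "The Open University"}
print(propose_links(source, target))
```

A learned approach would replace the fixed similarity function and threshold with ones tuned on known correct links.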

We will integrate and syndicate linked data with large-scale, existing applications and showcase the benefits in the three application scenarios of media and publishing, corporate data intranets and eGovernment. The resulting tools, methods and data sets have the potential to change the Web as we know it today.

The project site provides lists of information

Thomas Thurner (Semantic Web Company) posts this interview: The Hype, the Hope and the LOD2: Sören Auer Engaged in the Next Generation LOD.

21. OEG Ontology Engineering Group

based in the Facultad de Informática, Universidad Politécnica de Madrid

… research in the areas of Ontological Engineering, Natural Language Processing, Semantic Web, Semantic e-Science and the Real World Internet. Among the projects in which the group participates is:

MONNET (Multilingual Ontologies for Networked Knowledge)

semantics-based solution for integrated information access across language barriers
among other partners are DERI and DFKI

22. OKF Open Knowledge Foundation

The Open Knowledge Foundation (OKF) is a not-for-profit organization founded in 2004 and dedicated to promoting open knowledge in all its forms. It is a leader in this field nationally and internationally.

The Foundation’s activities are organized around individual working groups and projects, each focused on a different aspect of open knowledge, but united by a common set of concerns, and a common set of traditions in both etiquette and process.

Projects include:

CKAN … a registry / catalogue system for datasets and other “knowledge” resources
LOD2
Open Bibliography … in concert with JISC
Open Data Commons … legal solutions for open data

23. ResearchSpace

ResearchSpace is an Andrew W. Mellon Foundation-funded project aimed at supporting collaborative internet research, information sharing and web applications for the cultural heritage scholarly community. The ResearchSpace environment intends to provide the following integrated elements:

Data and digital analysis tools
Collaboration tools
Semantic RDF data sources
Data and digital management tools
Internet design and authoring tools
Web publication

The development of the ResearchSpace system (during 2011) will have both technical and process synergies with Mellon’s CollectionSpace and ConservationSpace projects, which aim to produce next-generation collection and conservation systems. However, ResearchSpace will support data exchange with other collection systems, whether commercial or open source.

ResearchSpace will provide a range of flexible tools to support a wide range of workflows and will develop these tools on an ongoing basis. Semantic technology is at the core of the infrastructure because it provides an effective mechanism for research and collaboration across data provided by different organisations and projects. ResearchSpace aims to reduce the costs of developing and operating new and innovative systems, creating a more sustainable research and production environment. ResearchSpace is an enabling environment that will develop over time with the help of those that use it.

Status (April, 2011):

Stage 2 – Specifications and Feasibility – Complete
Stage 3 – Working Prototype – Software development procurement soon

24. Talis Platform … data, connected [hosting and consultancy blog]

The Talis Platform weaves data with the web to create a highly available and adaptable environment for data sharing. Supporting data publishers and developers, the Platform provides:

Dedicated storage for both structured and unstructured data
Query interfaces to enable data exploration and manipulation
Cloud-based data hosting to reduce hardware and startup costs
Future-proofing for data and applications, through the latest industry standards

Modern business is driven by data. Business on the web involves access to, and manipulation of, ever-increasing volumes of data. Opening up this data makes it possible.

For data owners and consumers, the Platform provides the means to draw the most value from new and existing datasets.

For software developers, the Platform offers services operating on highly-connected data through its RESTful API.

25. Talis Xiphos … building a scholarly web of data (2008)

The slides and video from the presentation by Chris Clarke summarize a thought-provoking wireframe prototype that came out of a charge to the group to “go away for a month and design a social network for scholarly data” [built in 4 weeks, of which we spent 3 weeks arguing].

26. UK Discovery … a metadata ecology for UK education and research

In 2010, the JISC and RLUK Resource Discovery Taskforce (RDTF) worked with stakeholders from the libraries, archives and museums to set out a vision for making the most of our resources by effectively positioning their metadata for discovery and reuse within the global information ecosystem.

Our aim is that Discovery will help to mobilise and energise the community, engaging stakeholders to create a critical mass of open and reusable data, and explore what open data makes possible through real-world exemplars and case studies.

Discovery site and its developer competition

27. UMBEL … Upper Mapping and Binding Exchange Layer

UMBEL provides two valuable functions:

First, it is a broad, general reference structure of 28,000 concepts, which provides a scaffolding to link and interoperate other datasets and domain vocabularies, and
Second, it is a base vocabulary for the construction of other concept-based domain ontologies, also designed for interoperation.

Regarding the release of version 1.0:

Structured Dynamics and Ontotext have just released version 1.00 of UMBEL. This version is the first production-grade release of this open source, reference ontology for the Web.

For more information and downloads, please see http://umbel.org.

In broad terms, here is what is included in the new version 1.00:

A core structure of 27,917 reference concepts (RCs), an increase of 36% over the prior version
The clustering of those concepts into 33 mostly disjoint SuperTypes (STs)
Direct RC mapping to 444 PROTON classes
Direct RC mapping to 257 DBpedia ontology classes
An incomplete mapping to 671 GeoNames features
Direct mapping of 16,884 RCs to Wikipedia (categories and pages); 60% of UMBEL is now mapped to Wikipedia
The linking of 2,130,021 unique Wikipedia pages via 3,935,148 predicate relations; all are characterized by one or more STs with 876,125 also assigned a specific type
And, some vocabulary changes, including some new and some dropped predicates.
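A few of the figures above can be cross-checked with simple arithmetic. The sketch below confirms that 16,884 of 27,917 reference concepts is about 60.5%, which matches the quoted “60%”, and estimates the size of the prior version implied by the “36% increase”; since 36% is itself rounded, it brackets the estimate between 35.5% and 36.5% rather than giving a single number.

```python
# Back-of-the-envelope checks on the UMBEL 1.00 figures quoted above.
rcs_total = 27_917          # reference concepts in v1.00
rcs_to_wikipedia = 16_884   # RCs mapped directly to Wikipedia
pages_linked = 2_130_021    # unique Wikipedia pages linked
pages_typed = 876_125       # pages also assigned a specific type

share_mapped = rcs_to_wikipedia / rcs_total   # the quoted "60%"
share_typed = pages_typed / pages_linked

# "An increase of 36%" implies a prior version of about total / 1.36 RCs;
# because 36% is itself rounded, bracket the estimate.
prior_low = rcs_total / 1.365
prior_high = rcs_total / 1.355

print(f"mapped to Wikipedia: {share_mapped:.1%}")
print(f"typed pages: {share_typed:.1%}")
print(f"implied prior size: {prior_low:,.0f} to {prior_high:,.0f} RCs")
```

The implied prior version therefore held roughly 20,500 reference concepts.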

More detail regarding version 1.0 is available here.

Background discussions about the development of UMBEL can be found in four posts:

28. Utilika Foundation

Mission: Our mission is universal interactivity. We seek to advance communication and collaboration among diverse human and artificial agents, by means of pure and applied research.

Interest: We have chosen to focus on interactivity across the boundaries of human languages. There are about seven thousand languages in the world. We seek to make it possible for humans, collaborating with automatic agents, to use their own native languages and yet share information, ideas, and emotions panlingually.

Research: Utilika Foundation supported research at the University of Washington’s Turing Center for 5 years, investigating new methods of massively multilingual communication. This work produced papers, articles, and demonstration applications. It led to the PanLex project. [details here, as well as a summary of functionality in the now quiescent PanImages prototype]

Current efforts: We built an open-source (PostgreSQL under Linux) branch of the database (named “PanLex”) with a design that includes domains, multilingual definitions, provenance, grammatical word classes, and arbitrary metadata. We are populating the database with information reported in more than 3,000 resources. We have sextupled the database from its 2007 size of 2.5 million words in 1,029 languages to over 17 million words in more than 6,000 languages as of March 2011. We continue to seek lexical data for PanLex, particularly on low-density (poorly documented) languages, so if you have resources containing such data please let us know.
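A database organized around shared meanings allows translations to be inferred through third languages even when no single source pairs the two languages directly. The toy in-memory Python model below illustrates that idea; it is not PanLex’s PostgreSQL schema, the class and method names are hypothetical, and provenance, word classes and definitions are omitted. Each `add_meaning` call stands for one bilingual entry from some source, and a candidate translation is scored by how many distinct pivot expressions lead to it.

```python
# Toy meaning-based lexicon with pivot translation (hypothetical model).
from collections import Counter, defaultdict
from typing import Dict, Set, Tuple

Expr = Tuple[str, str]  # (language code, text)


class Lexicon:
    def __init__(self) -> None:
        self.meanings: Dict[int, Set[Expr]] = defaultdict(set)
        self.expr_meanings: Dict[Expr, Set[int]] = defaultdict(set)

    def add_meaning(self, meaning_id: int, *exprs: Expr) -> None:
        """Record that these expressions share one meaning."""
        for e in exprs:
            self.meanings[meaning_id].add(e)
            self.expr_meanings[e].add(meaning_id)

    def direct(self, expr: Expr, target_lang: str) -> Set[Expr]:
        """Expressions in target_lang that share a meaning with expr."""
        return {e for m in self.expr_meanings.get(expr, set())
                for e in self.meanings[m] if e[0] == target_lang}

    def via_pivots(self, expr: Expr, target_lang: str) -> Counter:
        """Score indirect translations by the number of distinct pivots
        (expressions in a third language) that lead to them."""
        pivots = {p for m in self.expr_meanings.get(expr, set())
                  for p in self.meanings[m]
                  if p[0] not in (expr[0], target_lang)}
        scores: Counter = Counter()
        for pivot in pivots:
            for candidate in self.direct(pivot, target_lang):
                scores[candidate] += 1
        return scores


lex = Lexicon()
lex.add_meaning(1, ("eng", "dog"), ("fra", "chien"))
lex.add_meaning(2, ("fra", "chien"), ("swh", "mbwa"))
lex.add_meaning(3, ("eng", "dog"), ("deu", "Hund"))
lex.add_meaning(4, ("deu", "Hund"), ("swh", "mbwa"))
print(lex.via_pivots(("eng", "dog"), "swh"))
```

Here no entry links English and Swahili directly, yet two independent pivots (French and German) agree on the same Swahili word, which is the kind of evidence that grows more useful as the number of languages and sources increases.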

29. W3C Library Linked Data Incubator Group

… explore how existing building blocks of librarianship, such as metadata models, metadata schemas, standards and protocols for building interoperability and library systems and networked environments, encourage libraries to bring their content, and generally re-orient their approaches to data interoperability towards the Web.

30. Web-based Systems Group Freie Universität, Berlin

Christian Bizer: The Web-based Systems Group explores technical and economic questions concerning the development of global, decentralized information environments. Our current research focus is on Linked Data technologies for extending the World Wide Web with a global dataspace.
