Crowdsourcing • CLIR

See also the following reference lists:

[ acronyms] [ agencies] [ meetings] [ vendors]

1. Study

From Doan, Ramakrishnan, Halevy, Crowdsourcing Systems on the World-Wide Web (2011)

[excerpts follow]

Crowdsourcing systems enlist a multitude of humans to help solve a wide variety of problems. Over the past decade, numerous such systems have appeared on the World-Wide Web. Prime examples include Wikipedia, Linux, Yahoo! Answers, Mechanical Turk-based systems, and much effort is being directed toward developing many more.

As is typical for an emerging area, this effort has appeared under many names, including peer production, user-powered systems, user-generated content, collaborative systems, community systems, social systems, social search, social media, collective intelligence, wikinomics, crowd wisdom, smart mobs, mass collaboration, and human computation. The topic has been discussed extensively in books, popular press, and academia. But this body of work has considered mostly efforts in the physical world. Some do consider crowdsourcing systems on the Web, but only certain system types or challenges (for example, how to evaluate users).

This survey attempts to provide a global picture of crowdsourcing systems on the Web:

define and classify such systems,
describe a broad sample of systems,
-ranging from relatively simple, well-established systems such as reviewing books
-to complex emerging systems that build structured knowledge bases, to systems that “piggyback” onto other popular systems,
discuss fundamental challenges such as how
-to recruit and evaluate users,
-to merge their contributions.

Given the space limitation, we do not attempt to be exhaustive. Rather, we sketch only the most important aspects of the global picture, using real-world examples. The goal is to further our collective understanding, both conceptual and practical, of this important emerging topic.

2. Other summaries

a. John Mark Ockerbloom

Finding Our Way in the Crowd: Locating and Cultivating Communities of Knowledge

Communities are essential to managing information

They filter by curating content
They sort by curating concepts
They make sense by creating conversation

Both informal and established communities play important roles

Twitter and blogosphere; publishers and libraries; conferences and schools

We can combine knowledge, strengths of different kinds of communities

With linked data, automated analysis, openness

Cautions

Communities need cultivation

If you build it, “they” don’t always come
If they do come, you need to deal with agendas, noise, spam, misinformation

One community or system isn’t enough

Different communities, tools work for different people
Many interesting questions span multiple disciplines (and multiple communities)
See contrasting, dissenting viewpoints
-avoid confirmation bias
-take advantage of diversity, outreach

3. Some illustrative examples

a. Academia.edu

Academia.edu helps academics follow the latest research in their field:

You can follow what academics in your field are working on

the latest papers they are publishing
talks they are giving
blog posts and status updates they are writing

You can create a webpage on Academia.edu, and share your own research. You can:

list your research interests, and upload papers and talks
get stats on paper views and downloads
find what keywords people use to search for you on Google

b. Paula Bray

Rethinking Evaluation Metrics in Light of Flickr Commons at Museums and the Web 2011

Abstract: In the past several years, cultural heritage institutions, including archives, libraries, and museums, have been placing their collections in Web spaces designed for collaboration and communication. Flickr Commons is one example of a highly visible space where cultural heritage institutions have partnered with a popular social networking site to provide greater discovery to, access of, and opportunities to interact with image collections on a large scale. It is important to understand how to measure the impact of these kinds of projects. Traditional metrics, including visit counts, tell only part of the story: much more nuanced information is often found in comments, notes, tags, and other information contributed by the user community. This paper will examine how several institutions on Flickr Commons – the Library of Congress, the Powerhouse Museum, the Smithsonian, New York Public Library, and Cornell University Library – are navigating the concept of evaluation in an emerging arena where compelling statistics are often qualitative, difficult to gather, and ever-changing.

c. CrowdForge, a project at Carnegie Mellon University

CrowdForge breaks down complex tasks into simple, independent micro-tasks that can be completed rapidly and cheaply (www.mybossisarobot.com explores the use of CrowdForge for preparing science news articles based on research reports).
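The decomposition described above can be pictured with a minimal sketch. This is not CrowdForge's actual code or API: the function names (`run_pipeline`, `partition_report`, `summarize`, `assemble`) and the sample report are assumptions made up for illustration, and the "worker" is a stand-in function where a real system would post each micro-task to human contributors.

```python
from typing import Callable, List


def run_pipeline(
    task: str,
    partition: Callable[[str], List[str]],
    do_microtask: Callable[[str], str],
    combine: Callable[[List[str]], str],
) -> str:
    """Split a complex task, solve each piece independently, merge the results."""
    subtasks = partition(task)
    # Each micro-task is independent, so these calls could run in parallel.
    results = [do_microtask(subtask) for subtask in subtasks]
    return combine(results)


def partition_report(report: str) -> List[str]:
    """Hypothetical partition step: one micro-task per paragraph."""
    return [p.strip() for p in report.split("\n\n") if p.strip()]


def summarize(paragraph: str) -> str:
    """Stand-in for a human worker: keep only the first sentence."""
    return paragraph.split(". ")[0].rstrip(".") + "."


def assemble(summaries: List[str]) -> str:
    """Hypothetical merge step: join the worker outputs into one draft."""
    return " ".join(summaries)


# Made-up sample input, for illustration only.
report = (
    "Galaxies were classified by volunteers. Many were spirals.\n\n"
    "Computers missed some anomalies. Humans caught them."
)
article = run_pipeline(report, partition_report, summarize, assemble)
print(article)
```

Because each micro-task only sees its own paragraph, the pieces can be handed to different contributors without coordination; the quality of the final draft then depends mostly on the merge step.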

d. Crowdsourcing and Human Computation, a workshop at CHI 2011

Crowdsourcing and human computation are transforming human-computer interaction: from games with a purpose, to creative uses of Mechanical Turk, to massive volunteer projects like Wikipedia, to new ways to run user studies and new interactive systems powered by crowds. We are just beginning to learn what’s possible when we harness the crowd in human-computer interaction. The goal of this workshop is to stake out a research agenda for our field.

e. Graduate Junction

Graduate Junction is the largest postgraduate community where you can find and connect with others within your field of study or who share your research interests.

connect with others that share your interests
share hints and tips on postgraduate life
inspire others with your work and guidance

f. Eric Hellman

Biblio-Social Objects: Copia, Mendeley, LibraryThing and Mongoliad

Should reading environments and social activity be tightly coupled or loosely coupled?
Which comes first, objects connecting you to friends, or friends connecting you to objects?

Now what? Once you’ve collected a network of people around a book, what happens next? Social networks are not unlike coffee shops or bars. If the business model for the network owner is to sell books (beer, coffee), the point is to get the network to buy their books (beer, coffee) through the network. Thus Copia’s network will inevitably be slanted towards discovery of new things to read. If the business model for a social network is to collect some sort of membership fee, the point is to make the members so cozy they recruit more members. Hence the vibrant communities at LibraryThing and Mendeley, both of which use “freemium” business models. If the business model is to sell a subscription, the point is to get the reader hooked on characters and continuing narrative. Hence Mongoliad is likely to include a lot of cliffhangers.

There’s one more thing a book-mob will be able to do, and that’s evangelize the book. Large, evangelical groups of readers are exactly what Gluejar will need to gather the financial muscle to “unglue” books. Cooperation with all sorts of social networks will be a key to the success of this venture.

Even though it’s very much a self-contained system, I’m really starting to get into Mongoliad, however. Thinking back on other Stephenson works, I’m realizing how ill-fitting they are in book form. The Subutai platform has unglued the narrative from the pages in a rather unexpected way.

g. Terry Jones A Writable API Competition … Got a great idea for O’Reilly’s new API?

Unlike a normal API that provides access to read-only data, a “writable API” is a shorthand for one whose underlying data is openly writable.

[snip]

To give very simple examples, an application could tag book objects to indicate that a user owns them or is reading them, could add users’ current page numbers, add links to the book elsewhere, or add any other metadata it pleases. Applications can also add tags (with values) to the author objects. These could indicate things like the author’s Twitter name, a link to their profile on LinkedIn, a measure of influence, a tag to show that the author is known by a user or is someone the user would like to meet, etc.
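The tagging idea in the excerpt can be sketched in a few lines. The following is an in-memory illustration only, not the O’Reilly API or any real service: the `WritableStore` class, its methods, the `user/tag` naming scheme, and all object identifiers and values are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List


class WritableStore:
    """Toy object store where any application may attach tags (with values)."""

    def __init__(self) -> None:
        # Maps object id -> {tag name: value}.
        self._tags: Dict[str, Dict[str, Any]] = defaultdict(dict)

    def tag(self, object_id: str, tag: str, value: Any = True) -> None:
        """Attach a tag to an object; a later write to the same tag replaces it."""
        self._tags[object_id][tag] = value

    def tags_for(self, object_id: str) -> Dict[str, Any]:
        """Return a copy of all tags on one object (empty if it has none)."""
        return dict(self._tags.get(object_id, {}))

    def objects_with(self, tag: str) -> List[str]:
        """Return the ids of all objects carrying the given tag."""
        return sorted(oid for oid, tags in self._tags.items() if tag in tags)


store = WritableStore()
# Hypothetical identifiers; tags are prefixed with the user who wrote them.
store.tag("book:example-title", "alice/owns")
store.tag("book:example-title", "alice/current-page", 42)
store.tag("author:example-author", "alice/would-like-to-meet")
store.tag("author:example-author", "bob/twitter-name", "@example")

print(store.tags_for("book:example-title"))
print(store.objects_with("alice/owns"))
```

Prefixing each tag with the writer's name is one possible way to let many applications write to the same object without overwriting each other; the excerpt does not say how the real API handles this.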

h. Mendeley … a free reference manager and academic social network

Many researchers use Mendeley to format citations as they’re writing papers, but what if you’re working on something a little less formal? Wouldn’t it be nice to be able to drop a few citations into a comment or web form or some other application that doesn’t have the tight integration that’s available with Word or Open Office? There are a couple quick ways to grab a formatted citation using Mendeley: use the “copy formatted” option in Mendeley Desktop, grab it from the page in the research catalog, or just drag it into your application.

i. Martin Mueller

Getting Undergraduates and Amateurs into the Business of Re-editing our Cultural Heritage for a Digital World, Jan. 7, 2011

The Chicago section of today’s New York Times has an article with the title “Volunteers at Planetarium Excel where machines lag.” The gist of the article is in these paragraphs:

The Adler has become a leader in “citizen science,” a growing trend in astronomy research. As the lead institution of the Citizen Science Alliance, which includes Oxford and Johns Hopkins Universities, it has registered more than 350,000 non-experts to help classify the many thousands of pictures of galaxies taken by powerful telescopes.

The images can help researchers better identify the shapes of galaxies, observe the formation of stars and follow the movement of asteroids. Astronomers often use computers to help analyze photos of outer space, but computers can miss anomalies and patterns that the human eye is particularly equipped to catch, said Joshua Frieman, a researcher at FermiLab and a professor at the University of Chicago.

“You don’t need a lot of detailed astronomical training to be able to look at these images and answer certain basic questions about them,” Professor Frieman said.

For example, astronomers had assumed red galaxies were elliptical, but volunteers recently identified many of them as spirals. The amateurs’ observations have also helped scientists predict when solar magnetic storms, which often interfere with telecommunication satellites, will hit Earth.

Volunteers anywhere in the world can register online to view photos on their Web browsers, and send their observations to researchers at the planetarium.

This story has very powerful implications for thinking about similar forms of “crowdsourcing” in the humanities. The following is a riff on ideas advanced by Jerome McGann and Gregory Crane, two scholars who for the past two decades have been in the vanguard of Digital Humanities.

j. Peter Murray Amazon catalog updates

Did you know that Amazon offers a facility to make corrections to its catalog? Somewhere in the past few months someone mentioned this to me and I tried it out.

And it works! Is this a model for crowdsourced corrections to library data?

Here is how it looks from a user’s perspective.
[snip]

Now Amazon must have some resources backing up this service to do the verification of submissions. And it makes sense for them because corrected metadata makes it easier for their products to be found and purchased. If libraries were to consider providing an equivalent service for our metadata, could we justify the costs? Is this a good use of our time and effort?

If we were to do it, I think it might have to be done by a bibliographic utility like OCLC who has ways to push the updated records to member libraries. Otherwise we run the risk of diluting the corrections across many individual library catalogs. Interestingly, this sort of user-generated correction facility is one that the Open Library already provides. (Open Library is a wiki-like service that offers the ability for anyone to make changes to its records, much like how anyone can edit articles on Wikipedia.) So between Amazon and Open Library there is a continuum of workflows of mediated corrections to unmediated corrections for us to consider. This scheme, of course, begs us to consider the notion of distributed version control systems for handling our bibliographic data so that changes can be merged across many sources.
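The closing remark about merging changes across many sources can be illustrated with a field-level three-way merge, the basic operation behind version control systems. This is a sketch under stated assumptions, not a description of how OCLC, Amazon, or Open Library handle corrections: the `merge_record` function, the flat dictionary record format, and the sample records are all made up for illustration.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def merge_record(base: Record, ours: Record, theirs: Record) -> Tuple[Record, List[str]]:
    """Three-way merge of two corrected copies of a record against a shared original.

    A field changed on only one side takes that side's value. A field changed
    differently on both sides is reported as a conflict and keeps the base value.
    """
    merged: Record = {}
    conflicts: List[str] = []
    for field in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(field), ours.get(field), theirs.get(field)
        if o == t:          # both sides agree (or neither changed)
            value = o
        elif o == b:        # only "theirs" changed this field
            value = t
        elif t == b:        # only "ours" changed this field
            value = o
        else:               # both changed it differently: needs review
            conflicts.append(field)
            value = b
        if value is not None:
            merged[field] = value
    return merged, conflicts


# Made-up records: two catalogs each fix a different error in the same entry.
base = {"title": "Moby Dick", "author": "Melvile, Herman", "year": "1851"}
ours = {"title": "Moby-Dick", "author": "Melvile, Herman", "year": "1851"}
theirs = {"title": "Moby Dick", "author": "Melville, Herman", "year": "1851"}

merged, conflicts = merge_record(base, ours, theirs)
print(merged, conflicts)
```

Fields that both sources changed in different ways are flagged rather than resolved automatically, which corresponds to the "mediated" end of the continuum described above.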

k. Johan Oomen … Crowdsourcing in the Cultural Heritage Domain

… presented at the 5th International Conference on Communities & Technologies. The paper can be downloaded here and the authors welcome feedback on this ongoing research.

Abstract: Galleries, Libraries, Archives and Museums (short: GLAMs) around the globe are beginning to explore the potential of crowdsourcing, i.e. outsourcing specific activities to a community through an open call. In this paper, we propose a typology of these activities, based on an empirical study of a substantial amount of projects initiated by relevant cultural heritage institutions. We use the Digital Content Life Cycle model to study the relation between the different types of crowdsourcing and the core activities of heritage organizations. Finally, we focus on two critical challenges that will define the success of these collaborations between amateurs and professionals: (1) finding sufficient knowledgeable and loyal users; (2) maintaining a reasonable level of quality. We thus show the path towards a more open, connected and smart cultural heritage: open (the data is open, shared and accessible), connected (the use of linked data allows for interoperable infrastructures, with users and providers getting more and more connected), and smart (the use of knowledge and web technologies allows us to provide interesting data to the right users, in the right context, anytime, anywhere – both with involved users/consumers and providers). It leads to a future cultural heritage that is open, has intelligent infrastructures and has involved users, consumers and providers.

l. Praveen Paritosh … The Anatomy of a Large-scale Human Computation Engine

… presented at the Human Computation Workshop 2010 and is available for download.

In this paper we describe RABJ (Redundant Array of Brains in a Jar), an engine designed to simplify collecting human input. We have used RABJ to collect over 2.3 million human judgments to augment data mining, data entry, data validation and curation tasks at Freebase over the course of a year. We illustrate several successful applications that have used RABJ to collect human judgment. We describe how the architecture and design decisions of RABJ are affected by the constraints of content agnosticity, data freshness, latency and visibility. We present work aimed at increasing the yield and reliability of human computation efforts. Finally, we discuss empirical observations and lessons learned in the course of a year of operating the service.

more about RABJ on the Freebase wiki
a presentation from a recent Freebase Meetup
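The excerpt does not describe how RABJ combines redundant judgments, so the following is only a generic sketch of one common approach, majority voting with a minimum agreement threshold. The `aggregate` function, its parameters, their default values, and the sample labels are assumptions for illustration, not RABJ's design.

```python
from collections import Counter
from typing import List, Optional


def aggregate(
    judgments: List[str],
    min_votes: int = 3,
    min_agreement: float = 0.6,
) -> Optional[str]:
    """Return the majority label, or None if the evidence is too thin.

    None means "collect more judgments or escalate to an expert".
    """
    if len(judgments) < min_votes:
        return None
    label, count = Counter(judgments).most_common(1)[0]
    if count / len(judgments) >= min_agreement:
        return label
    return None


# Made-up judgments on whether two records describe the same entity.
print(aggregate(["same", "same", "different"]))       # 2 of 3 agree
print(aggregate(["same", "different"]))               # too few votes
print(aggregate(["same", "different", "unsure"]))     # no clear majority
```

Raising `min_votes` or `min_agreement` trades throughput for reliability: more questions come back unresolved, but the answers that do come back rest on stronger consensus.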

m. Resource Shelf

The “Social” Public Library Catalogue, Social Discovery Systems, and User Interaction

What follows are three presentations from a session at the Canadian Library Association Conference (CLA 2010) that took place earlier this month in Edmonton, Alberta.

These topics will likely be of interest to many of you. By the time you’ve reviewed the slides you’ll have a solid foundation about what social library catalogues and info discovery systems are all about, what they do and do not offer, a look at how a social catalogue is being used at a Canadian public library, and much more.

Bottom Line: If you’re not up to speed on social catalogues, this is some great intro material. If you’re already a social catalogue expert, you may still pick up a few facts and ideas.

A Couple of Thoughts

“All things social” continue to explode and, many experts say, this is only the beginning. Will social catalogues become a tool that is used by the masses or just by selected groups of users? Will this make a library catalogue stronger in “popular” topics and less so in other areas? What does this mean over the long term for metadata associated with each record (it could potentially mean more, could it be better)? Can social catalogues be manipulated by users to promote items? Finally, can social data and traditional data that’s provided by many sources co-exist peacefully?

n. Karen Smith-Yoshimura A Crowdsourcing Success Story

I’m a great fan of the National Library of Australia’s Trove, a single search interface to 122 million resources (books, journals, photos, digitized newspapers, archives, maps, music, videos, Web sites) focused on Australia and Australians. You can search the OCR’d text of over 45 million newspaper articles that have been digitized.

OCR is not perfect. The original document is juxtaposed with the OCR transcription so errors are immediately apparent. Since the Australian Historic newspapers public launch in July 2008, people have been correcting errors in the OCR’d text. Both the corrected text and the original text are indexed and searchable.

The enthusiasm of these public text correctors is amazing! The 15 March 2011 Trove newsletter notes:

Text correctors are still doing an outstanding job of improving the electronically translated text, and the number of corrections each month continues to increase. In January we had over 2 million lines of text corrected in a month for the first time, which continued through February. The running total of corrected lines has now reached 31 million!

o. Tim Spalding LibraryThing
