Introduction

by Charles Henry

“Every day, we create 2.5 quintillion bytes of data, so much that 90% of the data in the world today has been created in the last two years alone.”

-IBM, Bringing Big Data to the Enterprise¹

This extraordinary and often-cited statistic is an apt quantitative introduction to our technological era, increasingly referred to as the era of Big Data. The massive scale of data creation and accumulation, together with the increasing dependence on
data in research and scholarship, is profoundly changing the nature of knowledge discovery, organization, and reuse. As our intellectual heritage moves more deeply into online research and teaching environments, new modes of inquiry emerge; digital
data afford investigations across disciplinary boundaries in the sciences, social sciences, and humanities, further muddling traditional boundaries of inquiry.

How then are we responding to what may be the most complex and urgent contemporary challenge for research and scholarship? With considerable difficulty, as the two reports in this volume attest. The key focus of these reports (“The Problem of Data: Data
Management and Curation Practices Among University Researchers,” by Lori Jahnke and Andrew Asher, and “Data Curation Education: A Snapshot,” by Spencer Keralis) is data curation, a term generally defined as a set of activities that includes the preserving,
maintaining, archiving, and depositing of data to keep them secure, intact, and accessible for reuse. The term can also comprise the conceptualization and creation of digital objects. In this respect, data curation encompasses the life cycle of data
from their inception to their reuse to their transformation into new knowledge products.

Two phenomena compound the challenge of data curation. First, although the stewardship of digital data demands both general and domain specialist knowledge, there are currently no effective ways to prepare people for that hybrid role. Still a developing
practice, digital curation has thus far drawn individuals with varied professional experience; many have had no specialist training in the disciplines that they now serve. According to the Digital Curation Centre in Edinburgh, the result is “a shortage
of experienced data scientists and curators with digital preservation experience.”²

The second phenomenon compounding the challenge is the lack of conformity among the places of practice. Libraries, data centers, and academic departments, all organizations where data curation can be done, have varied, sometimes idiosyncratic, approaches and
often entail different attitudes, cultures, and practices. New government requirements for exposing and managing federally funded research data add urgency to the challenge of curating data.

These two reports address each of these circumstances in depth. Jahnke and Asher explore workflows and methodologies at a variety of academic data curation sites, and Keralis delves into the academic milieu of library and information schools that offer
instruction in data curation. Their conclusions, while not surprising, nonetheless point to the urgent need for a reliable and increasingly sophisticated professional cohort to support data-intensive research in our colleges, universities, and research
centers. We will need more innovative approaches to recognize, educate, promote, and retain those individuals who evidence the complex skill sets required for the demands of data curation. At the same time, we will need to foster and facilitate a
greater coherence of practices, standards, and protocols among the various data sites.

CLIR and the Digital Library Federation have received a major grant from the Alfred P. Sloan Foundation to develop the cohort needed and to help instantiate best practices and shared methods across data curation centers. The grant, made in response to
the findings of the reports that follow, could not be more timely. As the recently published report, One Culture, asserts, we are now confronted with a new paradigm: a digital ecology of data, algorithms, metadata, analytical and visualization
tools, and new forms of scholarly expression.³ The implications of this digital milieu for the practices of research, teaching, and learning, as well as for the
economics and management of higher education, should be of profound interest not only to researchers engaged in computationally intensive work, but also to college and university administrations, scholarly societies, funding agencies, research libraries,
students, and academic publishers.

In this respect, we are only just getting started.

FOOTNOTES

¹ http://www-01.ibm.com/software/data/bigdata/

² http://www.dcc.ac.uk/about-us/dcc-charter

³ Williford, Christa, and Charles Henry. 2012. One Culture: Computationally Intensive Research in the Humanities and Social Sciences. A Report on the Experiences of First Respondents to the Digging Into Data Challenge. Washington, DC:
Council on Library and Information Resources. Available at https://www.clir.org/pubs/reports/pub151.
