by Spencer D. C. Keralis

“We believe professionals in all fields need a richer understanding of how their professions and the materials they work with are being transformed by the emergence of the digital information ecosystem.”

-Peter Botticelli et al., “Educating Digital Curators: Challenges and Opportunities”

This study provides a snapshot of the current digital data curation education landscape. The field is changing rapidly in response to several factors: an increasingly demanding job market; the needs of researchers who must cope with data management planning mandates from national funding agencies; and the perceived “data deluge” that threatens to overwhelm the research and library communities in terms of technology, infrastructure, and staffing. This snapshot is therefore necessarily limited in scope and marks a specific moment in time.

The study has three main goals:
1.    To describe how library and information science (LIS) programs address digital data curation as a component of their curricula for librarians
2.    To describe the extra-academic training curricula developed by scholars and professionals to address unmet needs within their communities
3.    To use this information to make recommendations for training curriculum development for future CLIR fellows

For the purpose of this discussion, digital data curation is best described as life cycle data management; it encompasses a spectrum of activities ranging from research data management planning at the project inception stage; through collection of data as part of the research process; through the identification, processing, and accession of data sets; and, finally, to the archival preservation and sharing of data in an appropriate repository. The term data in this context refers to “everything needed to have reproducible science” (Woods Hole Oceanographic Institution 2012). Although in the present discussion these concepts are concerned primarily with the sciences and social sciences, they are applicable across disciplines for any research that relies on or generates data.


Those in the LIS field perceive data curation as an intrinsic part of their discipline. Data curation education efforts are most often embedded in standard LIS courses (for example, as components or modules of metadata and database architecture courses), and efforts to teach data curation as a discrete set of intelligible practices are both recent and few. Currently, only five LIS schools offer graduate certificates explicitly in data curation. These tracks are part of programs that lead to a master’s degree in library and information science (MLIS), with the certificate requirements distributed over the progression of the two-year program, and are generally not open to non-LIS students or professionals.

These programs, isolated within the standard LIS curriculum or within certificate programs that are exclusive to LIS students, are not designed to meet the needs of researchers or professionals who may benefit from these skills. Furthermore, researchers’ perception of libraries as “a dispensary of goods … rather than a locus for real-time research/professional support” compromises the ability of those in the LIS field to intervene effectively in campus research activities and may even foreclose collaboration with other disciplines (Jahnke and Asher 2012, 4).1 As Weber and associates note in their report on the 2010 Data Curation Research Summit, “LIS will need to develop stronger partnerships with domain researchers, informaticists, and other stakeholders in the research enterprise, to succeed at making research data an integral and enduring part of the information assets retained for science and scholarship over the long term” (Weber et al. 2011, 6).

The most valuable intervention to come out of the LIS field for the purposes of digital data curation education is the development of a matrix of skills and functions by Cal Lee at the University of North Carolina at Chapel Hill. The DigCCurr Matrix describes 24 functional areas and 4 meta-level functions (Lee 2009). These are broad, high-level categories, designed to address “digital curation ‘know how,’ as opposed to the conceptual, attitudinal or declarative knowledge.” Defining these skills potentially makes it possible to develop a modular, skills-based curriculum that can be customized for different skill levels and functional concentrations.

Research conducted by Virgil Varvel and associates at the University of Illinois at Urbana-Champaign as part of the Data Conservancy project demonstrates the difficulty of identifying data curation tracks within LIS curricula. Using a keyword search based on concepts in the DigCCurr Matrix to survey “online course catalogs and websites of 63 iSchools and other LIS schools,” these researchers uncovered “475 courses in 158 programs at 55 schools” (Varvel, Bammerlin, and Palmer 2012). The net cast by this project was wide: the researchers included introductory LIS courses containing foundational knowledge that may be developed in later courses, and “exceptions were made if information was ambiguous, to err on the side of inclusion” (528). The results published thus far do not indicate whether the researchers traced such connections between courses to see whether this development was borne out within individual curricula.

The study broke out four categories of courses:
1.    Data-centric: “courses were focused exclusively on data curation, data management, or data science topics” (8 percent)
2.    Data-inclusive: “courses have segments devoted to data topics related to e-science or e-research” (11 percent)
3.    Digital: “courses did not appear to explicitly attend to research data expertise, they included digital topics that are highly relevant for education of data professionals” [emphasis added] such as “digital library development” or “digital preservation or digital collections and services” (27 percent)
4.    Traditional LIS: courses that “give students an introduction to important topics developed further in data inclusive or data centric courses” (54 percent)

The Data Curation Curriculum Search tool developed through the research of Varvel and associates does not allow a search based on these categories, and these categories do not appear as descriptors in individual course records within the tool. As a result, it is impossible with the information available publicly to provide examples of each for further examination.

Data Conservancy researchers claim that the percentage distribution among the course categories “indicat[es] a high level of coverage of at least some aspects of data expertise” [emphasis added]. However, more than half of the courses identified in the study are “traditional LIS,” the most ambiguous category and the one in which the researchers most allowed themselves to “err on the side of inclusion.” More than one-quarter of the courses identified fall into the digital category, but while these courses include skills that may in some ways be transferable to the data curation environment, they do not explicitly address the needs of data-intensive research. Thus, 81 percent of the courses identified (the digital and traditional LIS categories combined) require some evaluation before they can become part of a curriculum for data curation professionals, while less than 10 percent are specific to the state of education in data curation.
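The arithmetic behind these figures can be checked with a minimal sketch. The percentages are those reported in the four course categories; the dictionary and variable names are illustrative assumptions and do not come from the Data Conservancy study.

```python
# Category shares in percent, as reported by Varvel, Bammerlin, and Palmer (2012).
# The dictionary layout and names are illustrative assumptions.
shares = {
    "data-centric": 8,
    "data-inclusive": 11,
    "digital": 27,
    "traditional LIS": 54,
}

# Courses that require evaluation before reuse in a data curation
# curriculum: the digital and traditional LIS categories combined.
needs_evaluation = shares["digital"] + shares["traditional LIS"]

# Courses that address research data explicitly, in whole or in part.
data_focused = shares["data-centric"] + shares["data-inclusive"]

print(sum(shares.values()))  # 100
print(needs_evaluation)      # 81
print(data_focused)          # 19
```

On these figures, the data-centric share alone (8 percent) is the “less than 10 percent” cited; adding the data-inclusive courses brings the explicitly data-related share to 19 percent.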

Students appear unlikely to encounter a data-centric course in their line of study. To come out of most existing U.S. LIS programs with the skills and knowledge necessary to support data-intensive research, it seems that students must already be well versed enough in the language of data and the needs of researchers to evaluate course descriptions, must be committed to constructing a data-intensive education for themselves, or must have an advisor knowledgeable enough to help them craft a track from traditional LIS courses.

Current Data Curation Certificate Programs

The United Kingdom’s Digital Curation Centre (2012a) identifies five data management certification programs in the United States (table 1). Each of these programs restricts its enrollment to LIS students, with the exception of the University of Arizona’s DigIn! Program, which admits post-baccalaureate students and professionals who are not enrolled in Arizona’s MLIS program.

Institution | Program | Mode
University of Arizona | Graduate Certificate in Digital Information Management* | Distance
University of California at Berkeley | Master of Information Management and Systems | Residential
University of Illinois at Urbana-Champaign | Data Curation Education Program (DCEP) | Residential
University of North Carolina at Chapel Hill | DigCCurr I (master’s students); DigCCurr II (doctoral students) | Residential
San Jose State University | Master’s Degree in Archives and Records Administration (MARA) | Distance

Table 1. Data management certification programs in the United States
* The development of the University of Arizona program is described by Peter Botticelli et al. (2011).

Varvel and associates, in their research for the Data Conservancy, identify a larger pool of certifications that may be applicable to data curation (Varvel, Bammerlin, and Palmer 2012). They “identified 7 master’s degree programs, 4 certificate programs, and 10 other concentrations with a specific emphasis on data in their descriptions at 17 different institutions.” However, they point out that some of these programs are data-in-name-only: even though they have “data” in their descriptions, they include few data-centric or data-inclusive courses. This fact seems to undercut the optimism expressed by the researchers about the potential for these programs to produce data professionals. (Unfortunately, Varvel and associates do not call out these programs by name.)

Emerging Data Curation Certificate Programs

There are several digital curation certificate programs under development at institutions around the United States, but two programs are of particular interest to this study.

The first is at the University of North Texas iSchool, which is developing a Graduate Academic Certificate in Digital Curation and Data Management. This program will be open to non-LIS students and to non-student professionals from the sciences and social sciences, computer science, and the humanities, as well as to LIS master’s and doctoral students. The curriculum will be a modular grouping of non-residential online courses, but will require onsite capstone sessions with LIS faculty. A pilot version of the initial course, Cyberinfrastructure Fundamentals for Digital Curation and Data Management, will launch in the summer of 2012 (University of North Texas 2011).

The second program of interest is a partnership between the Purdue University Libraries and the libraries of Cornell University, the University of Minnesota, and the University of Oregon. This program will “develop a training program in data information literacy for graduate students who will become the next generation of scientists.” At each institution, teams of librarians and experienced researchers will develop “a shareable data information literacy training curriculum for students in science/engineering graduate programs” (Institute of Museum and Library Services 2011). The outcomes of these parallel development efforts will be evaluated and shared online for the use of other libraries.

In the emerging programs identified so far, the trends are toward allowing open enrollment for scholars and professionals outside the LIS discipline and toward developing more collaborative models of teaching and learning that partner librarians and LIS educators with research faculty. In some cases, the digital data curation certificate program is not based in the LIS school at all; at the University of Maine, for example, the New Media Studies program will host the interdisciplinary Digital Curation Graduate Certificate. Museum studies programs are also beginning to offer digital curation certificates that address the specific needs of museums in identifying, preserving, and providing access to digital artifacts, born-digital art, and other assets (Pratt Institute 2012).2 Table 2 includes a few of the certificate programs under development; this record is far from comprehensive, however. As of the 2011 funding cycle, the Institute of Museum and Library Services (IMLS) had awarded more than $9 million to data curation education and capacity building, indicating a commitment to developing data expertise further in LIS professionals.

Institution | Program | Funder | Launch Date | Enrollment
Pratt Institute | Project CHART! (Cultural Heritage Access Research and Technology) | IMLS | Fall 2012 | LIS only
Purdue University | Next Generation Scientists | IMLS | Fall 2012 | Open
University of Maine | Digital Curation Graduate Certificate | Unknown | Fall 2012 | Open
University of North Texas | Graduate Academic Certificate in Digital Curation and Data Management | IMLS | Pilot begins Summer 2012 | Open

Table 2. Sample data curation certificate programs under development

Other emerging educational efforts in data curation do not involve an academic certificate. Rather, they move toward embedding LIS students in research environments. In 2010, the Graduate School of Library and Information Science (GSLIS) at the University of Illinois at Urbana-Champaign received a $988,543 Laura Bush 21st Century Librarian Program grant from IMLS to develop “a sustainable and transferable model for educating library and information science master’s and doctoral students in data curation through field experience in research and data centers.” The Data Curation Education in Research Centers (DCERC) program involves a partnership between the GSLIS, the National Center for Atmospheric Research (NCAR), and the University of Tennessee, School of Information Sciences. This model is valuable in that it embeds students in research and data centers, but the program is open only to enrolled master’s and doctoral students in the iSchool at Illinois and the University of Tennessee, School of Information Sciences.


Several extra-academic programs provide potential models for training postdoctoral scholars in digital data curation. Some of these programs originated in the efforts of LIS schools to address the needs of professionals, while others have emerged from groups of professionals seeking to fill in the gaps in their training and to build communities of practitioners with similar interests and needs.

DigCCurr II Professional Institutes

The DigCCurr program at the University of North Carolina at Chapel Hill offers annual professional institutes “aimed at assisting digital collection managers in developing their digital curation strategies” (DigCCurr 2012). The program began in 2009 and has been held every year since then. Each institute includes a spring program with a winter follow-up session and public symposium.

Digital Preservation Outreach and Education

The mission of Digital Preservation Outreach and Education (DPOE), an initiative of the Library of Congress, is “to foster national outreach and education to encourage individuals and organizations to actively preserve their digital content, building on a collaborative network of instructors, contributors, and institutional partners” (DPOE 2012).

The DPOE Baseline Train-the-Trainer Workshop was held at the Library of Congress on September 20-23, 2011. Developed in partnership with Nancy Y. McGovern of the Inter-university Consortium for Political and Social Research (ICPSR) at the University of Michigan, the workshop provides attendees with a basic digital data preservation curriculum, as well as with “tips and techniques for conducting successful workshops.” The workshop consists of six modules:
1.    Identify: What digital content do you have?
2.    Select: What portion of that content is it your responsibility to preserve?
3.    Store: How should digital content be stored for the long term?
4.    Protect: What steps need to be taken to protect your digital content?
5.    Provide: How should digital content be made available?
6.    Manage: What provisions should be made for long-term management?

Digital data preservation educators teach the workshops. Graduates of the program are able to offer the workshops at their home institutions for researchers and practitioners within their region. The focus is on preservation rather than life cycle data management, and participants are expected to have a fairly significant technical background prior to participating in the workshop.

Digital Curation Centre

The Digital Curation Centre (DCC) describes itself as “the UK’s leading hub of expertise in curating digital research data,” and its website is a clearinghouse of information for practitioners seeking advice or resources on data management. The DCC also offers workshops in data management, including Data Curation 101 (DC 101), a three-day intensive course for data custodians. For beginners, DC 101 Lite distills the information in DC 101 into a half-day course. The courses are structured around the DCC Curation Lifecycle Model. Unlike the DPOE model, which focuses on preservation, the DCC model addresses the full range of issues in digital data curation (DCC 2012b). The course materials are available online to share and reuse.

The DCC also offers a train-the-trainer program, which makes the generic DC 101 and DC 101 Lite training materials available for use “as the basis for disciplinary or institutional-specific training.”


CURATEcamp

CURATEcamp is a series of “unconference” events for digital data curation practitioners. The camps deliberately include a wide range of practitioners, recognizing that “digital curation is a practice that happens all over: libraries, archives, public media, industry, start-ups, non-profits, government, and so forth.”

The attendees at these unconferences set the agenda of each camp, though often with a pre-described theme or concept in mind. For example, the October 2011 CURATEcamp that occurred in conjunction with the Digital Library Federation (DLF) Forum had the theme “Catalogers and Coders” and brought together metadata specialists and technologists “to engage in interactive problem solving and exploration of topics of joint interest, especially in the area of Linked Data.”

Although CURATEcamps are no doubt useful as forums for the exchange of information and ideas, perhaps their most valuable function is the creation of diverse communities of practitioners who are confronting similar issues in a wide range of disciplines.

A Note on Certification

Each of these extra-academic training models offers its participants an opportunity to develop particular skills and knowledge, and in some cases, participation carries a certain cachet for those familiar with the programs. Institutional alignment can also convey credibility; for example, the DPOE program bears the imprimatur of the Library of Congress. However, none of the models can deliver industry-standard or academically recognized accreditation or certification. Participants can supplement their experience in these programs with software or other industry certifications, but accreditation and certification would be the strongest incentives for participants to invest the time and make the financial commitment required for academic programs.


Although the IMLS is investing heavily in data-oriented education in the LIS field, and LIS and iSchool programs are making efforts to develop data curation curricula, much work still needs to be done to prepare LIS graduates for roles as data professionals in and out of libraries. Furthermore, the LIS world largely remains a closed circuit, providing concentrations within tracks restricted to LIS enrollees. The trend in emerging curriculum development programs is to open up this closed circuit and allow post-baccalaureate students and professionals to take courses in data curation; this trend can only strengthen the LIS programs and those professionals taking part in them. Data curation is not a single-discipline practice, and developing programs that include professionals and students from across the natural, social, computer, and information sciences, and the humanities will help produce practitioners who are better prepared to meet the needs of data-intensive research.

The Council on Library and Information Resources (CLIR) Postdoctoral Fellowship in Academic Libraries is a proven model for preparing doctoral scholars for service in academic libraries. CLIR’s weeklong “library bootcamp” introduces fellows to some of the issues facing twenty-first-century libraries, creates a cohort of fellows who can share experiences and information, and helps realign the newly minted Ph.D.s in relation to the academy. Host institutions benefit from library-friendly scholars who are able to work intensively on both service and research initiatives within the libraries.

In 2012, the DLF program of CLIR received a $679,827 grant from the Alfred P. Sloan Foundation to help launch the new CLIR/DLF Data Curation Fellowship Program. The program, an expansion of CLIR’s Postdoctoral Fellowships in Academic Libraries, will provide recent Ph.D.s with professional development, education, and training opportunities in data curation for the natural and social sciences. For these fellows, the CLIR bootcamp model will be expanded and adapted to include an additional skills-based practicum that will introduce fellows to the terminology, tools, and issues they will face in their positions. Library and LIS professionals will be recruited to provide the training.

The experience gained during the two-year postdoctoral fellowships will encourage the development of highly skilled and knowledgeable specialists. The aim is to create a cadre of scholarly practitioners who understand not only the nature and processes of their own disciplines but also the ways in which their research data are organized, transmitted, and manipulated. For the program’s first cohort, CLIR is now recruiting six data curation fellows in cooperation with its partner institutions: Indiana University, Lehigh University, McMaster University, Purdue University, the University of California at Los Angeles, and the University of Michigan.

1 Although this perception is a commonplace complaint among academic librarians, the anthropological portion of this project may well be the first time this has been formally documented as a phenomenon, and may merit further study.

2 For more on digital curation curricula in museum studies, see Tibbo and Duff (2008).

