This study afforded opportunities to spend extensive time talking to the eight international teams of researchers supported by the first Challenge. CLIR conducted interviews with multiple participants in each project, and site visits with one of the institutional partners from each of the eight groups. Although the number of projects was small, the range of disciplines they represent is diverse, including classical and medieval literatures, British and American histories, the history of European cartography, nineteenth- and twentieth-century American quilt-making, twentieth-century popular music, the study of British and American speech, the application of advanced computer algorithms for pattern recognition in digital images, and more. Each project described below serves as a touchstone for considering the changing nature of research in its particular domain(s), and what directions these fields might take in the future.
Collaborations were of varying sizes, and the extent to which partners’ work was interdependent also varied. The common interests among the partners also ranged broadly: for some there was a high degree of shared expertise in a particular subject, discipline, or theoretical approach, whereas for others the basis for collaboration was the establishment of a shared, more discipline-agnostic methodology. In some collaborations certain individuals contributed expertise in both a subject domain and the development of tools for data analysis, but in other cases these kinds of expertise were contributed by separate individuals representing quite different backgrounds and perspectives. In every case, however, principal investigators of the funded projects were highly experienced, well-known scholars with a history of successful collaborations involving advanced technology, in many cases with the same partners with whom they worked on their Challenge projects. The vast majority of principal investigators held secure, senior-level, permanent positions, although the contributions of junior scholars, graduate students, and even undergraduates were without exception vital to project success.1 Notably, all the Digging Into Data projects were extensions of prior work requiring major investments of time and money in the preparation of data and analytical tools.
This relative maturity, at least when compared to many other e-research initiatives, is hardly surprising. Because the focus of the Digging Into Data program was on questions and methodologies rather than on the creation and maintenance of data corpora, there was an underlying assumption that respondents to the Challenge would build upon their own or others’ work in amassing data that would be both significant enough and reliable enough to be meaningfully queried. In practice, however, the availability and reliability of the data upon which the work depended varied greatly. These differences necessitated different approaches for the researchers and to a large extent determined what kinds of research outcomes were possible during the brief grant period. While some work relied upon highly structured data, other efforts were built upon unstructured or raw digital files of multiple types, or upon data harvested or produced specifically for the purposes of the project. We present our case studies along a rough continuum,2 from work focused upon the most uniform and organized kinds of data to work built upon less structured, more abstract, and heterogeneous forms. This ordering is necessarily subjective and is by no means an indicator of the relative value, impact, or success of these efforts.
The Eight Case Studies
- Using Zotero and TAPOR on the Old Bailey Proceedings: Data Mining with Criminal Intent (DMCI)
- Digging into the Enlightenment: Mapping the Republic of Letters
- Towards Dynamic Variorum Editions (DVE)
- Mining a Year of Speech
- Harvesting Speech Datasets from the Web
- Structural Analysis of Large Amounts of Music Information (SALAMI)
- Digging into Image Data to Answer Authorship Related Questions (DID-ARQ)
- Railroads and the Making of Modern America
1 All principal investigators were men; however, women researchers did play key roles in several of the projects. At the conclusion of the Challenge, investigators expressed concern at the gender imbalance at the level of project leadership. This topic merits deeper exploration. For the second round of the Digging into Data Challenge, funded in December 2011, nine of the fourteen funded projects have a woman as a principal investigator.
2 Some of the factors we considered in determining this order: data consistency, reliability, and completeness; the existence and reliability of metadata; the homogeneity of data types incorporated into the project; the proportion of text vs. non-text data; uniform vs. multi-layered analyses of data.