Digging into Image Data to Answer Authorship Related Questions (DiD-ARQ)
Digging into Image Data to Answer Authorship Related Questions (DiD-ARQ) is founded on arguably the most audacious “What if…?” proposition of all of the inaugural Digging into Data initiatives: in brief, what can we learn about authorship if we apply the same bundle of advanced image analysis algorithms across diverse, otherwise unrelated, digitized collections of anonymous and corporately created works of art? Formally characterizing the construct of an “authorship related question” is a task comparable to the challenge of digging into the data. Identifying the most salient questions required careful thinking on the part of humanities experts about what constituted meaningful authorship related questions in their domain, without overlooking new, compelling directions.
The corpora selected for the project included high resolution page images of fifteenth-century illustrated manuscripts, a select group of seventeenth, eighteenth, and early nineteenth century maps of North America, and more than 50,000 low-resolution digital photographs of nineteenth and twentieth century American quilts; taken together, these archives hold rich potential for exploring the histories of literature, art, culture, technology, science, and ecology as well as for advancing the science of adaptive image analysis.
DiD-ARQ’s three collaborating teams comprised one of the largest groups of individuals involved in any of the Digging into Data projects and necessarily incorporated experts involved in the creation of the three corpora, scholars from the diverse domains concerned with studying those archives, and scientists adept at varied kinds of image analysis. The three teams, based at Michigan State University (funded by NEH), University of Illinois (NSF), and the University of Sheffield (JISC), as well as the diverse experts represented in each of those teams, each brought a different set of questions to the project; these questions were just as varied as the manuscripts, quilts, and maps with which they were working.
Our notions of what constitutes a viable research question have changed over time, and the humanities, social sciences, and sciences naturally have divergent understandings of the formality, originality, specificity, and breadth of questions appropriate to projects commonly undertaken in those traditions, whether they be conference presentations, single-authored journal articles, monographs, fieldwork, or lab-based research. “Mash-up” projects that bring together experts trained within very different traditions face special challenges when it comes to framing their collaborations. At the level of the overall collaboration, the motivating question frames the project along methodological, rather than disciplinary lines, e.g. “How accurate and scalable are adaptive image analysis methodologies when they are applied to diverse collections of image data?” At the domain level, the questions are equally broad but focus on bounding more specific topics within the subject areas represented by the three collections. The three teams diverged at this level to focus on (1) the characteristics of different scribal and artistic hands represented in ten fifteenth-century illustrated manuscripts, (2) the variety and accuracy of 17th-19th century French and English cartographic depictions of the Great Lakes, and (3) the relative popularity or uniqueness of specific color and pattern choices for nineteenth and twentieth century American quilts.
It was at a third, experimental level that the teams saw the strongest benefits of the collaboration, in terms of methods, results, and new working hypotheses about authorship. Here the teams were able to select from a common set of image analysis tools to help them frame more specific questions relevant to each topic area [See figure below]. For example, shape analysis algorithms developed by the team enable comparisons of illuminated manuscripts as well as geographical shapes across maps.
“Illustration of the need to share software that can perform model-based segmentation of images. The same segmentation algorithm is applied to finding armor in Froissart’s manuscripts and lakes in historical maps.” (Simeone et al.)
The Sheffield team experimented with shape to identify a “digital fingerprint,” first, for the unique scribes’ hands known to have contributed to the copying of the manuscripts. While at the project’s conclusion the team was still awaiting the results of their calculations with reference to the characteristic features of the scribes’ hands found in the manuscripts, workshops held in the last quarter of 2011 enabled them to pinpoint key formal criteria which, it is anticipated, will help the team move towards establishing “a progressively more objective digital fingerprint for some of our hitherto rather shadowy medieval copyists.” This work is ongoing and has most recently been refined and enriched by the practical insights of palaeographers and a professional calligrapher (Colin Dunn, who also photographed the digital corpus).
The humanists working in cartography and climatological history at the University of Illinois used a segmentation process and an algorithm for estimating scale to extract and compare depictions of the Lakes across forty different maps. To date, results clearly show that some of the lakes were more consistently mapped than others during this period, lending credence to the theory that extreme weather in some locations may have prevented cartographers from obtaining accurate measurements. Some of the analysis suggests “continuities between and within the [British and French] national traditions,” but the team considers these notions to be tentative pending further research. Despite the need for further investigation, from a methodological perspective the DiD-ARQ cartographers consider the project to be a major step forward in historical climatology.
“Climatologists know that historically water levels of the Great Lakes have varied significantly, but they have not yet included differing levels in climate models because data before the late 19th century is almost nonexistent. Because the segmentation algorithm may offer a way to estimate water levels and persistent ice cover…, we can begin to analyze maps prior to 1800 in order to provide usable data for historical climate models and future projections. Without Digging into Data funding for the segmentation algorithm and the quantitative analysis of data from forty maps of the Great Lakes between 1650 and 1800, we would not have been able to imagine the questions we’re asking, let alone provide useful data. Our preliminary reports have been well-received by climatologists, historians, historians of science, and literary scholars, so we believe we have begun to open a new avenue of productive and timely research.” –Robert Markley
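One way to picture what "more consistently mapped" could mean in practice is sketched below. This is not the project's segmentation or scale-estimation code; it is a minimal illustration that assumes segmentation has already produced a pixel area for each lake on each map. Instead of estimating scale, it sidesteps the problem by expressing each lake as a share of the total lake area on its map, then measuring how much that share varies across maps. The function names and the toy pixel counts are invented for the example.

```python
from statistics import mean, pstdev


def relative_areas(lake_pixel_areas):
    """Convert one map's per-lake pixel areas into shares of that map's total lake area."""
    total = sum(lake_pixel_areas.values())
    return {lake: area / total for lake, area in lake_pixel_areas.items()}


def consistency_by_lake(maps):
    """Coefficient of variation of each lake's relative area across maps.

    A lower value means the lake was drawn at a more consistent relative size.
    Assumes every map contains the same set of lakes.
    """
    shares = [relative_areas(m) for m in maps]
    result = {}
    for lake in shares[0]:
        values = [s[lake] for s in shares]
        result[lake] = pstdev(values) / mean(values)
    return result


# Toy, invented pixel counts for three hypothetical maps (NOT project data).
toy_maps = [
    {"Superior": 400, "Erie": 100},
    {"Superior": 300, "Erie": 100},
    {"Superior": 450, "Erie": 50},
]
cv = consistency_by_lake(toy_maps)
for lake, value in sorted(cv.items(), key=lambda item: item[1]):
    print(f"{lake}: coefficient of variation = {value:.3f}")
```

Relative area is only one possible descriptor; a fuller comparison would also need outline shape and position, which this sketch ignores.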
The quilt historians working on DiD-ARQ used color space conversion to track the popularity of a particular shade of blue fabric across their quilt corpus, as well as a newly developed pattern recognition algorithm specifically tailored to measure the degree of symmetry in a design, or, in quilt parlance, to identify “crazy” quilts – an asymmetrical and highly embellished quiltmaking style that flourished in the United States from the 1880s until the 1920s. The algorithm was designed to assess the degree of regularity or divergence between compared regions of the quilt and distinguish highly irregular “crazy” quilts from their more regular counterparts. The “craziness” algorithm also produced intriguing “false positives,” categorizing significant numbers of quilts as visually related to crazy quilts. These results suggest new directions for textile researchers as well as for digital humanities work. For scholars, there may be antecedents to the development of the crazy quilt tradition that have not yet been identified by researchers, as well as influences of the “crazy quilt” movement on subsequent visual arts that valued disorder and juxtaposition over symmetry and harmony. Further analysis of the large set of results is needed, as are new tools to facilitate interdisciplinary collaboration and iteration on the results of computational analysis of data sets, particularly in a manner that can incorporate the participation of citizen scholars. These new directions and working hypotheses emphasize the complex relationship of authorship to so many dimensions of quilting: history, design, technology, etc. The relationships require explication and definition to help experts understand what kinds of investigations are best pursued by authorship studies. These results suggest that authorship has the potential to be a canonical window through which humanists pursue a variety of questions.
Obviously, the potential rewards and risks need to be better understood before such epistemic status can be conferred on authorship related questions and the investigations they might inform.
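The two quilt measurements described above, tracking a shade of blue through color space conversion and scoring how irregular a design is, can be illustrated with a small sketch. It is not the project's algorithm. The hue band standing in for "blue," the saturation cutoff, and the use of simple left-right mirror symmetry as a stand-in for the project's region-comparison method are all assumptions made for the example, and the function names are hypothetical.

```python
import colorsys


def blue_fraction(pixels, hue_min=0.55, hue_max=0.72, min_saturation=0.3):
    """Fraction of RGB pixels whose HSV hue falls in an assumed 'blue' band.

    pixels is a list of (r, g, b) tuples with channel values in 0..255.
    The hue band and saturation cutoff are illustrative guesses.
    """
    if not pixels:
        return 0.0
    hits = 0
    for r, g, b in pixels:
        h, s, _v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        if hue_min <= h <= hue_max and s >= min_saturation:
            hits += 1
    return hits / len(pixels)


def asymmetry_score(gray):
    """Mean absolute difference between a grayscale grid and its left-right mirror.

    gray is a list of rows of intensities in 0..255. The result is scaled
    to 0..1, where 0 means perfectly mirror-symmetric.
    """
    total = 0
    count = 0
    for row in gray:
        for left, right in zip(row, reversed(row)):
            total += abs(left - right)
            count += 1
    return total / (255 * count) if count else 0.0


# Tiny invented inputs: one pure blue pixel and one pure red pixel,
# then a symmetric row and a maximally asymmetric grid.
print(blue_fraction([(0, 0, 255), (255, 0, 0)]))
print(asymmetry_score([[10, 20, 10]]))
print(asymmetry_score([[0, 255], [0, 255]]))
```

A threshold on a score like this would inevitably flag some regular quilts as irregular, which is the kind of "false positive" the quilt historians found worth a closer look.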
“While there have been numerous insights into the cultural history…of our individual datasets, the most significant findings of DiD-ARQ for the general audience is the applicability of computer algorithms for humanities analysis–no matter the image dataset. We have clearly illustrated that computer-assisted analysis can reveal new areas of exploration for humanists and that we can provide clarity regarding the underlying rationale of why [images] are interpreted in particular ways. For the general public, this means that we can more easily illustrate the value of humanities to discussions of things like how space, place, or race are depicted and remembered.” –Jennifer Guiliano
The size, scope, and complexity of this project’s network of nested research questions, not to mention the iterative way that these questions have necessarily developed and changed over time, have prompted the participants to adopt a very formal approach to their collaboration. They drafted a memorandum of understanding at the outset that outlined the responsibilities of each of the three teams as well as established practices for communication, for data, hardware, and software sharing, and guidelines for citation practice and credit sharing in project publications. Michael Simeone, Jennifer Guiliano, Rob Kooper, and Peter Bajcsy report some of the details of the project’s organization in a May 2011 issue of First Monday, including the reasons that supported their decisions to approach the project as they did. Weekly recorded web conferences, an email discussion list, and a secure, shared storage server were critical to daily project operations. While free open source, commercial, and institutional applications of these kinds are widely available and familiar to many researchers, the choices of what tools to adopt for a collaboration, the authors argue, are critical determinants of a project’s ultimate success, and deciding upon the appropriate level of openness, authentication, and functionality of tools to be used by all team members can be a complicated negotiation. The establishment of and adherence to rules for project documentation and communication helped the large numbers of scholars and students involved in the project engage with one another’s work, kept morale high, and bonds of trust among the collaborators strong throughout the ongoing project. Researchers indicated that they believed their experience on the DiD-ARQ project would be an important influence on the ways they planned and organized their future research.
Core Participants involved in all project elements
- Peter Ainsworth (University of Sheffield, UK) served as Principal Investigator for the JISC-funded portion of the collaboration and contributed subject and technical expertise as Director of the Online Froissart project.
- Simon Appleford (University of Illinois Urbana Champaign, US) is a cultural historian and digital humanist based at the Institute for Computing in Humanities, Arts, and Social Science (I-CHASS) at the University of Illinois. He contributed as a subject specialist to the project.
- Peter Bajcsy (formerly University of Illinois Urbana Champaign, now National Institute of Standards and Technology, US) was the founder and leader of the Image Spatial Data Analysis Group at the National Center for Supercomputing Applications, University of Illinois, and led project planning and served as co-Principal Investigator for the NSF-funded portion of the project.
- Steve Cohen (Michigan State University, US) is an evaluation specialist who helped with project assessment throughout the grant.
- Matthew Geimer (Michigan State University, US) is a computer scientist who contributed technical and analytical expertise to the project.
- Jennifer Guiliano (formerly University of Illinois Urbana Champaign, now Assistant Director for the Maryland Institute for Technology in the Humanities, University of Maryland) served as project manager for the NSF-funded portion of the grant and also contributed subject expertise as a cultural historian and digital humanist.
- Rob Kooper (University of Illinois Urbana Champaign, US) is a computer scientist and Senior Research Programmer for the Image Spatial Data Analysis Group at the National Center for Supercomputing Applications. He served as co-Principal Investigator for the NSF-funded portion of the project.
- Michael Meredith (University of Sheffield, UK) contributed computer science expertise and served as developer for the JISC-funded portion of the project.
- Dean Rehberger (Michigan State University, US) is Director of MATRIX, the Center for Humane Arts, Letters, and Social Sciences Online at Michigan State University, and History Adjunct Curator of the MSU Museum. He served as Principal Investigator for the NEH-funded portion of the project and contributed subject expertise in the digital humanities generally as well as expertise specific to his involvement with the Quilt Index.
- Justine Richardson (Michigan State University, US) served as project manager for the NEH-funded portion of the project based at MATRIX, Michigan State University. She also contributed subject expertise in cultural history and digital humanities as well as expertise specific to her involvement with the Quilt Index.
- Michael Simeone (University of Illinois Urbana Champaign, US) contributed as a subject expert in historical cartography and served as project manager for the NSF-funded portion of the project based at the Institute for Computing in Humanities, Arts, and Social Science (I-CHASS), University of Illinois.
Contributing additional expertise in computer science
Wayne Dyksen (Michigan State University, US)
Alhad Gokhale (Independent Researcher)
Zach Pepin (Michigan State University, US)
William Punch (Michigan State University, US)
Tenzing Shaw (University of Illinois Urbana Champaign, US)
Contributing additional expertise in quilt making and quilt history
Beth Donaldson (Michigan State University Museum, US)
Amy Milne (Alliance for American Quilts, US)
Marsha MacDowell (Michigan State University and MSU Museum, US)
Amanda Silkarskie (Michigan State University, US)
Mary Worrall (Michigan State University Museum and Quilt Index Project, US)
Other consulting quilt experts
Karen Alexander, Barbara Brackman, Janneken Smucker, Merikay Waldvogel, Jan Wass and members of the American Quilt Study Group email discussion list.
Contributing art historical and other expertise related to medieval manuscripts
Heather Tennyson (University of Illinois Urbana Champaign, US)
Colin Dunn (Scriptura Limited, University of Oxford, UK)
Godfried Croenen (University of Liverpool, UK)
Caroline Prud’homme (University of Toronto, Canada)
Victoria Turner (University of Warwick, UK)
Anne D. Hedeman (University of Illinois Urbana Champaign, US)
Natalie Hanson (University of Illinois Urbana Champaign, US)
Contributing expertise in historical cartography and environmental literatures
Robert Markley (University of Illinois Urbana Champaign, US)
Publications
Ainsworth, Peter and Meredith, Michael. “Breaching the Strongroom: a Pervasive Informatics Approach to Working with Medieval Manuscripts,” Proceedings of the KMIS 2011 International Conference on Knowledge Management and Information Sharing, Joachim Felipe and Kecheng Liu, eds. 2011: Setúbal, Portugal. pp. 264-71. ISBN 978-989-8425-81-2.
Ainsworth, Peter. “Digital Attraction: from the real to the virtual in manuscript studies,” Forum: University of Edinburgh Postgraduate Journal of Culture & The Arts, issue on Authenticity (May 2011), 14 p. http://www.forumjournal.org/site/issue/12/peter-ainsworth
Simeone, Michael, Jennifer Guiliano, Rob Kooper, and Peter Bajcsy. “Digging into data using new collaborative infrastructures supporting humanities-based computer science research.” First Monday 16.5 (2 May 2011).
Presentations and posters
Ainsworth, Peter, Presentation of the DID and Online Froissart projects, seminar on “Temporality and Value at the Intersection of the Arts and Humanities,” University of Southampton, UK, 12 April 2012.
Bajcsy, Peter. Presentation at Wolfram Technology conference in IL; October 13, 2010, http://www.wolfram.com/events/techconf2010/speakers.html.
—. Presentations at Imaging at Illinois workshop in IL, October 14-15, 2010, http://www.imaging.beckman.illinois.edu/imaging2010/.
—. Presentation at the Gordon Challenge in Data-Intensive Discovery conference in CA, October 26-29, 2010, http://www.sdsc.edu/gordongrandchallenge/.
—. Presentation at the Supercomputing Conference 2010, NSF funded panel on Grand Challenges in Humanities, Arts and Social Sciences, New Orleans, Louisiana, November 14-16, 2010; http://sc10.supercomputing.org/schedule/event_detail.php?evid=stpan108.
—, Rob Kooper, Luigi Marini, Tenzing Shaw, Jennifer Guiliano, Anne D. Hedeman, Robert Markley, Michael Simeone, Natalie Hanson, “Supporting Scientific Discoveries to Answer Art Authorship Related Questions Across Diverse Disciplines and Geographically Distributed Resources,” Microsoft Research eScience Workshop, October 11–13 in Berkeley, CA, http://research.microsoft.com/en-us/events/escience2010/default.aspx (accepted as poster August 2010)
— and Maryam Moslemi, “Discovering Salient Characteristics of Authors of Art Works,” IS&T/SPIE Electronic Imaging, 17 – 21 January 2010, San Jose Convention Center, Section – Computer Vision and Image Analysis of Art, Paper 7531-10 presented on January 18th at 1:20pm.
Gokhale, Alhad and Peter Bajcsy, “Automated classification of quilt photographs into crazy and non-crazy,” IS&T/SPIE Electronic Imaging 2011, January 23-27; (Poster presentation).
MacDowell, Marsha. “The Quilt Index: Digging Into and Broadening Content, Current Challenges and Future Opportunities.” Closing Keynote Address, American Quilt Study Group Annual Seminar, October 2010.
Meredith, Michael and Peter Ainsworth, “Answering Medieval Authorship Questions using e-Science”, UK All Hands eScience meeting, 13 Sep 2010 – 16 Sep 2010, Cardiff, Wales.
—, “Digging into Image Data to Answer Authorship-Related Questions”, UK All Hands eScience meeting, 13 Sep 2010 – 16 Sep 2010, Cardiff, Wales.
Rehberger, Dean. “What to do with a Million Images: Rhetoric, Composition and High Performance Computing.” Conference on College Composition and Communication. Atlanta, Georgia, April 6-9, 2011. (Invited Featured Speaker)
—. “What to do with a Million Images: Rhetoric, Composition and High Performance Computing.” 2011 CCCC Virtual Conference. April 27, 2011.
—. “Digging into Data.” Computers and Writing 2011. University of Michigan. Ann Arbor, MI, May 19-22, 2011.
—. “Corporate Authorship and the Classification of Quilts.” Imaging without Boundaries: Exploring the Science, Technology, and Applications of Imaging and Visualization, Beckman Center, University of Illinois, Champaign IL, October 14-15, 2010.
Richardson, Justine. “The Quilt Index International and Digging into Data: Two Material Culture Digital Repository Initiatives Advancing Global Knowledge Production in the Humanities – There’s a Quilt for That.” December 1, 2011, at HASTAC: Humanities, Arts, Science and Technology Advanced Collaboratory, Ann Arbor, Michigan.
—. “Visually Digging into Museum Data.” Poster presentation on Digging into Image Data research project interdisciplinary collaboration with computer scientists and humanities scholars. April 8, 2011, Museums and the Web 2011, Philadelphia, PA.
—. “Supporting Scientific Discoveries to Answer Art Authorship Related Questions Across Diverse Disciplines and Geographically Distributed Resources”: Paper Presentation. Digital Humanities 2011, June 19-22, 2011, Stanford University, Palo Alto, California. Co-authored with: Jennifer Guiliano of the University of Maryland; Peter Bajcsy, Rob Kooper, Luigi Marini, Tenzing Shaw, Anne D. Hedeman, Robert Markley, Michael Simeone, Natalie Hanson, and Simon Appleford of the University of Illinois; Peter Ainsworth and Michael Meredith, University of Sheffield (UK); Dean Rehberger, Justine Richardson, Matthew Geimer, and Steve M. Cohen, Michigan State University.
Shaw, Tenzing and Peter Bajcsy, “Automation of Digital Historical Map Analyses,” IS&T/SPIE Electronic Imaging 2011, January 23-27; (accepted as an oral presentation).
Shaw, Tenzing, Michael Simeone, Robert Markley, and Peter Bajcsy, “Quantifying Historical Geographic Knowledge From Digital Maps,” Microsoft Research eScience Workshop, October 11–13 in Berkeley, CA, (accepted as oral presentation August 2010).
Shaw, Tenzing, Natalie Hanson, Anne D. Hedeman, and Peter Bajcsy, “Quantifying Differences between Medieval Artistic Hands Using Statistical Analyses in Multiple Color Spaces,” UK All Hands eScience meeting, 13 Sep 2010 – 16 Sep 2010, Cardiff, Wales, http://www.allhands.org.uk/events/all-hands-meeting-2010
Worrall, Mary. “The Quilt Index: On-Line Tool for Education and Research.” Roundtable, American Quilt Study Group Annual Seminar, October 2010.
Sample groups of images and metadata for each content area were uploaded to Illinois’ Medici collaboration platform for internal research access by DiD-ARQ researchers. The full datasets can be accessed as follows:
The current collection of early 15th-century manuscripts comprises in excess of 6,100 images, mainly at 500 DPI, hosted on a federated Storage Resource Broker (SRB) facility shared between the University of Sheffield (UoS) and the University of Illinois (UIUC), using a web front end collaboratively developed between the two sites (see http://cbers.shef.ac.uk). The images can also be retrieved from the SRB system via an API which provides direct access to the image dataset within a programming environment. See also the Online Froissart project: http://www.hrionline.ac.uk/onlinefroissart/.
The Quilt Index comprises more than 60,000 images of quilts (ranging from 150 to 600 dpi) with associated metadata regarding technical textile features as well as provenance and historical background documentation. All images are available online at the Quilt Index (http://www.quiltindex.org), hosted in the open source KORA repository at Michigan State University. The KORA repository has a web management interface, an API, as well as OAI-PMH capacity.
For DiD-ARQ research, maps of the Great Lakes dating between 1747 and 1797 were selected and digitized from the Map Library at the University of Illinois Urbana-Champaign. High resolution TIFF files produced for the project were indexed and contributed to the library. Researchers may contact the library for further access or research needs. http://www.library.illinois.edu/
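As an illustration of the OAI-PMH capacity mentioned for the Quilt Index's KORA repository, the sketch below builds a standard `ListRecords` request URL. The verb and the `oai_dc` metadata prefix come from the OAI-PMH protocol itself. The base URL is a placeholder, since the source does not give the repository's actual OAI-PMH endpoint, and the helper name is hypothetical.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL.

    base_url is the repository's OAI-PMH endpoint (a placeholder here).
    set_spec optionally restricts harvesting to one set.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint: the real Quilt Index endpoint is not stated in the source.
url = list_records_url("https://example.org/oai")
print(url)
```

Fetching the URL returns an XML response, which a harvester would page through using the resumption tokens defined by the protocol.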
Tools and documentation
Tools and documentation for the algorithms created and tested through this project will be available on each participating institution’s project site. The direct code repository URL is: http://did.ncsa.illinois.edu/svn/did/trunk/.