[Note: Additional details about this project are contained in the main body of this report (PDF).]
The final of the eight inaugural Challenge initiatives is exceptional in that it is multi-disciplinary not only in its applications but also in the data types integrated into a single project. Rather than being centered on one discrete data corpus or a small handful of them, or on one particular type of research methodology, Railroads and the Making of Modern America is focused squarely on its subject, railroads, while also being informed by a distinct theoretical framework for historiography. This framework, not unlike the position taken by the DMCI scholars, incorporates historical evidence into an integrated knowledge system that affords an examination of that evidence at the broad, macroscopic level, at the minute, microscopic level, and at every level in between. If realized, such an integrated system would hold the potential to make “invisible history” visible; in other words, to help show correlations between continental trends and local acts that might not otherwise be noticeable in an archive, on a map, or in a narrative alone. In a way, the ambition for comprehensive coverage of the topic of American railroads is comparable to the ambition of the Dynamic Variorum Editions project to build an infrastructure that could encompass all primary and secondary materials related to Classical Studies, except that in this case the goal is to create an infrastructure for using “big data” to do “big history” within a “continental-scale” geographic landscape, revealing relationships across both time and space. The Railroads project shares with Digging Into the Enlightenment the aim of constructing interactive data visualizations from evidence that is often incomplete.
The “big data” that the two partners in the Railroads project seek to harness to the North American map include scanned books, serials, city directories, census documents, maps and geographic data, and, ultimately, all forms of railroad-related archives and ephemera, such as timetables, payroll documents, and annual reports. Naturally, not all information of these types is presently available in digital form. Libraries, archives, and online publishers are nevertheless making enormous strides in providing these kinds of materials. For this reason, efforts like the Railroads project, which seek to bring them together in new ways, are essential for informing libraries’ ongoing work. In the project white paper, the Railroads team cites numerous barriers to access posed by current digitization practice; these range from limited access to many online sources, through the difficulty of digitally assembling complete runs of rare serials from multiple repositories, to the neglect of fold-out maps in mass digitization projects such as Google Books. Despite huge investments and increasing availability, access to digitized newspaper collections is still particularly fraught for researchers who wish to repurpose that content in computationally intensive projects, since these collections are “designed with the needs of individual researchers in mind, rather than the requirements of automated and exploratory data analysis tools.” Licensing agreements or rights restrictions imposed by the suppliers of digitized content can often be too limiting to permit this kind of work.
Added to these barriers to access for the Railroads team are the thorny issues of data quality and consistency. To start, the data types considered for this project are as inherently diverse, inconsistent, and variable as any that an advanced practitioner of archival research may encounter in a career. Moreover, because digitization procedures refashion these sources as images, which are then indexed through optical character recognition (OCR) to make them machine-readable, numerous errors and omissions arise that hinder results. When research depends upon identifying and linking together all mentions of particular names, places, dates, and other numbers across varied data types, OCR frequently falls short. While automated methods of indexing can do a great deal, their limitations are such that a certain amount of hand-correction is unavoidable in order to ensure precise and accurate visual analysis. Gaining access to their data was “simply the first stage” for the Railroads collaborators. Just as in the case of errors arising in the OCR of Greek text for the DVE team or the incomplete transcripts or recordings of the British National Corpus for Mining a Year of Speech, problems with data required large investments in hand cleaning, standardization, and coding. As in the other projects for which this was an issue, student labor made these tasks possible.
In order to help themselves manage the risks posed by barriers to access and variable data quality, the project partners have sub-divided their project into five work packages they call “Apps,” each of which explores a discrete railroad-related topic from the standpoint of a particular set of evidence. Built on a common platform called the Aurora Engine, each App contains an interactive visualization, raw data files and descriptive metadata for those files, a summary of the significance of the visualization to the study of the subject domain, documentation for those who wish to repurpose the visualization or data in other work, and (for project participants) a curatorial interface for editing content. While the work to complete the Apps had not yet concluded by the end of the project’s term, the white paper describes their envisioned functionality. For example, their “Network Connectivity” App draws upon historical and archival evidence to display the geographic expansion of the American railroad on an interactive map that allows visitors to explore at five-year intervals the growth and, in some cases, loss of railroad coverage across the United States throughout the latter half of the nineteenth century. “The Civil War and Mobility” App marks references to geographic locations that appear near any given keyword used in editions of the Richmond Daily Dispatch from 1860-1865.
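The keyword-proximity idea behind “The Civil War and Mobility” App can be illustrated with a minimal Python sketch. This is not the project's own code or the Aurora Engine; the function names, the tiny gazetteer, the sample text, and the token window are all assumptions made for illustration, and the sketch handles only single-word place names.

```python
import re
from collections import Counter

# Hypothetical mini-gazetteer (an assumption for this sketch); a real
# project would rely on a far larger, curated list of place names.
GAZETTEER = {"richmond", "petersburg", "norfolk", "manassas"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def places_near_keyword(text, keyword, window=5, gazetteer=GAZETTEER):
    """Count gazetteer places found within `window` tokens of the keyword."""
    tokens = tokenize(text)
    keyword = keyword.lower()
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != keyword:
            continue
        # Look at the tokens on either side of this keyword occurrence.
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i and tokens[j] in gazetteer:
                counts[tokens[j]] += 1
    return counts


# Invented sample text, not taken from the Richmond Daily Dispatch.
sample = (
    "The railroad from Richmond to Petersburg was repaired. "
    "Troops left Norfolk yesterday, far from any railroad depot."
)
result = places_near_keyword(sample, "railroad", window=4)
```

With this sample and a four-token window, `result` counts Richmond and Petersburg once each, while Norfolk falls just outside the window around the second occurrence of the keyword. OCR errors of the kind described above would cause exact matching like this to miss misspelled names, which is one reason hand-correction matters.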
Because of the aforementioned difficulties attendant to integrating heterogeneous data, the project serves as an excellent test case for confronting challenges that large numbers of humanists will face as they enter the realm of computationally intensive research. As they continue their work assembling, correcting, and integrating their data into their five Apps, the team continues to think critically about that data, its limitations, and the ways in which it is contingent upon both its original cultural context and our own interpretations or misinterpretations of its content. The value of deep engagement with evidence is familiar to researchers across the disciplines, but in a product-oriented research culture this is less frequently captured and touted as a beneficial outcome of the research process. Unless one recognizes the full range of outcomes of large-scale collaborations that incorporate data integration, including perhaps the most important of those outcomes (standardized, annotated data made openly accessible to others), it is impossible to understand the full import of this kind of long-term research initiative. But it is projects such as Railroads and the Making of Modern America that, by requiring such attention to detail, create opportunities for reflections on historical practice, the nature of evidence, and the limitations of computer technology for revealing new truths. Healey and Thomas advocate a “critical middle ground” that accepts that reality is too complex to “be compressed into a very small number of categories,” yet not so “infinitely complex that any attempt to standardize data or search for regularities is fruitless”:
[T]he growing volume of digital source materials should not result in the increasing suspension of our critical faculties when terabytes of data wash over us. Instead, more data requires a much greater engagement of these critical faculties, since there is more scope for detailed investigation of within and between group variability, more opportunity for comparison and integration of different types of research resources, and more need for both simple and complex analyses, data mining and data visualization. [Healey and Thomas white paper, 42-43.]
- William G. Thomas, III (University of Nebraska-Lincoln, US) served as Principal Investigator of the NEH-funded portion of the project and contributed as a data and subject expert in American history.
- Richard Healey (University of Portsmouth, UK) served as Principal Investigator of the JISC-funded portion of the project and also contributed as a data and subject expert in American railroad history, geography, and geographic information systems (GIS).
- Ian Cottingham (University of Nebraska-Lincoln, US) is Chief Software Architect in the Department of Computer Science and Engineering at UNL and contributed technical and analytical expertise, leading the team designing and building the Aurora Engine for the exploration of geographic data.
- Leslie Working (University of Nebraska-Lincoln, US) is a Graduate Instructor in History based at the Center for Digital Research in the Humanities at the University of Nebraska-Lincoln and contributed project management expertise for the NEH-funded portion of the project, helping to supervise a team of students doing data checking and correction for the project.
- Michael Johns (University of Portsmouth, UK) is a transportation GIS specialist who had responsibility for development and enhancement of GIS and database resources relating to the Eastern Trunk Line Railroads for use in web-based visualisations.
- Nathan B. Sanderson (University of Nebraska-Lincoln, US) is a Ph.D. candidate in American History at the University of Nebraska-Lincoln who contributed subject and project management expertise to the Railroads and the Making of Modern America Project based at the University of Nebraska.
Other participants and advisors
- Anne Bretagnolle (Paris One University, France)
- Ian Gregory (University of Lancaster, UK)
- Anne Kelly Knowles (Middlebury College, US)
- John Lutz (University of Victoria, Canada)
- Sherry Olson (McGill University, Canada)
- Ashok Samal (University of Nebraska-Lincoln, US)
- Martin Schaefer (University of Portsmouth, UK)
- Stephen Scott (University of Nebraska-Lincoln, US)
- Emma White (University of Portsmouth, UK)
- Richard White (Stanford University, US)
- Eli Katz (Stanford University, US)
- Danny Towns (Stanford University, US)
- Kathy Harris (Stanford University, US)
Main Project Website: follow links to “Aurora Project Apps”
Thomas, William G. The Iron Way: Railroads, the Civil War, and the Making of Modern America (Yale University Press, 2011).
Healey, Richard G. (2012) “Railroads and Immigration in the Northeast United States 1850-1900” Geography Compass, 6(8), 455-476.
Thomas, William G. “The Civil War, Railroads, and the Making of Modern America.” Miami University Hamilton, Hamilton, Ohio, October 2011. [Introduction | Part One | Part Two | Part Three | Q&A Session]
Papers and Presentations
Thomas, William G. and R. G. Healey (2010) “Railroad Workers and Worker Mobility in the Great Plains” (Paper given at the Western History Association Conference, Lake Tahoe, October 2010)
Thomas, William G. and Doug Downey (2010) “Digging into Railroads” (Paper given at Chicago Colloquium in Digital Humanities, Northwestern University, November 2010)
Healey, Richard G., W. G. Thomas, M. Johns, I. Cottingham, E. White and L. Working (2010) “Digging into Railroad Data: a GIS and Visualisation-Based Approach” (Paper given at the SSHA Annual Conference, Chicago, November 2010).
Healey, Richard G. (2010) “Space: the Final Railroad Frontier?” Commentator. Panel on Richard White’s “Constructing Railroad Space.” SSHA Annual Conference, Chicago, November 2010.
Thomas, William G. “Digital Analysis of Texts: The Mobility of African Americans After Emancipation,” Organization of American Historians, Houston, March 2011.
Healey, Richard G., M. Johns, M. Schaefer, W. G. Thomas, I. Cottingham and L. Working (2011) “Railroads, Visualization and the Web: A Progress Report on the ‘Digging into Data Challenge’ Project” (Paper given at the GISRUK Annual Conference, Portsmouth, UK, April 2011)
Thomas, William G., R. G. Healey, I. Cottingham, M. Johns, L. Working and M. Schaefer (2011) “Railroads and the Making of Modern America: Tools for Spatio-Temporal Analysis and Visualization.” (Paper given at the JISC/NEH Digging into Data Challenge Conference, Washington D.C., June 2011)
Healey, Richard G., and M. Johns (2011) “Development of an Historical GIS of Railroads in the North-East USA 1826-1900. Phase II” (Paper given at the Workshop on Railroads in Historical Context: Construction, Costs and Consequences, Foz Tua, Alto Douro, Portugal, October 2011; now published in conference proceedings).
Thomas, William G. (2012) “Digital History: The State of the Field.” Chair, American Historical Association, Chicago, January 2012.
Thomas, William G. and Leslie Working (2012) “Railroads and the Making of Modern America: Tools for Spatio-Temporal Visualization” American Historical Association Conference, January 2012.
Thomas, William G. and Leslie Working (2012) “African American Mobility After Emancipation, 1865-1867,” (paper given at the Society of Civil War Historians, Lexington, June 2012)
Data related to the larger Railroads and the Making of Modern America initiative is available on the Project Website.
Exploratory tools related to this project are available through The Aurora Project website. These include: