Emancipating Digital Data – Scanning and Image Analysis of the Lincoln Papers
Focusing on Abraham Lincoln’s scanned writings, I-CHASS, in collaboration with NCSA and the Abraham Lincoln Presidential Library and Museum in Springfield, Illinois, is creating a managed, distributed data repository of materials related to the sixteenth President. E(d)2 is an initiative to take completed digital scans and apply data analytics and forensic studies to trace Lincoln’s correspondence in time and space. For example, an analysis of the concepts of “liberty,” “emancipation,” or the “preservation of the Union” will reveal when and where Lincoln wrote about them, with whom he was corresponding, and to whom he referred, providing insight into the development of his commitment to these concepts.
The E(d)2 project faces a formidable challenge in historical research: the sheer volume of the documents. A single image scan can be as large as ~150 megabytes, and the entire collection reaches ~37 terabytes. The obstacles might seem insurmountable: the computational requirements of preprocessing and analyzing the data on demand, the lack of automated algorithms to transcribe the writings, the bandwidth required to interact with the data, and the lack of web interfaces that convey the geospatial, temporal, and contextual characteristics of the documents. E(d)2 meets these challenges by blending high-performance computing for processing image scans with Web 2.0 techniques for interactive learning, as illustrated in our prototype system. The project therefore not only enables knowledge extraction from a large digitized collection of Abraham Lincoln’s writings but also provides a model that can be applied to other digital collections, such as the Papers of the Founding Fathers.