Understanding Computational Requirements of Preservation and Reconstruction

The size, complexity, and heterogeneity of geospatial collections pose formidable challenges for the National Archives and Records Administration in high-performance data storage, access, integration, analysis, and visualization. The ultimate goal of this project is to allow researchers and archivists interested in preserving data of historical significance to determine the information granularity (the degree of detail or precision contained in the data) at which data should be gathered in order to reconstruct an event at a later date. Ideally, the information would be preserved exactly, but the CPU and storage costs of doing so might be prohibitive.

Our approach has been to design a simulation framework that draws on advanced technologies, high-performance computing, and novel computer architectures to provide a platform that lets archivists and other researchers understand the costs and trade-offs associated with the long-term preservation of electronic records. Our prototype simulation system, Image Provenance To Learn (IP2Learn), is designed for a class of decisions based on image inspection and has been used successfully to run simulations that improve our understanding of future archival needs.
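To make the granularity/cost trade-off concrete, the sketch below (not the IP2Learn implementation, whose internals are not given here) preserves a synthetic image at several levels of detail and, for each level, records a proxy for storage cost alongside the error incurred when the full-resolution image is reconstructed later. The `downsample`/`upsample` helpers, the coarsening factors, and the use of compressed size as the cost measure are all illustrative assumptions, not details from the project.

```python
# Hypothetical sketch of a granularity-vs-cost sweep; helper names and
# parameters are illustrative, not taken from IP2Learn.
import zlib
import numpy as np

def downsample(img: np.ndarray, factor: int) -> np.ndarray:
    """Coarsen the image by averaging factor x factor pixel blocks."""
    h, w = img.shape
    h2, w2 = h // factor, w // factor
    blocks = img[: h2 * factor, : w2 * factor].reshape(h2, factor, w2, factor)
    return blocks.mean(axis=(1, 3))

def upsample(img: np.ndarray, factor: int, shape: tuple) -> np.ndarray:
    """Reconstruct full resolution by repeating each coarse pixel."""
    up = np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)
    return up[: shape[0], : shape[1]]

rng = np.random.default_rng(0)
# Synthetic "record": a smooth field plus noise, standing in for an archived image.
x = np.linspace(0, 4 * np.pi, 256)
original = np.sin(x)[:, None] * np.cos(x)[None, :] + 0.05 * rng.standard_normal((256, 256))

print(f"{'factor':>6} {'stored bytes':>12} {'RMSE':>8}")
for factor in (1, 2, 4, 8, 16):
    coarse = downsample(original, factor)
    # Compressed byte count serves as a simple proxy for storage cost.
    stored = len(zlib.compress(coarse.astype(np.float32).tobytes()))
    # Later reconstruction of the event from the coarser preserved record.
    recon = upsample(coarse, factor, original.shape)
    rmse = float(np.sqrt(np.mean((original - recon) ** 2)))
    print(f"{factor:>6} {stored:>12} {rmse:>8.4f}")
```

Running the sweep shows the tension the project is built around: as the preserved granularity coarsens, the stored byte count falls sharply while the reconstruction error grows, so an archivist must pick the level of detail that keeps reconstruction acceptable at a cost that is not prohibitive.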