Prep it, Scan it, and Describe it: Production Tools and Techniques for Description and Rapid Digitization of Manuscripts and Texts
Paul Deschner and Stephen Chapman of the Harvard Law School Library Lab will report on the tools and techniques that project teams have been using to describe collections and to digitize them rapidly and cheaply. Through digitization, we aim to deliver research collections to the classroom, and freely via the web, to faculty, students, and scholars interested not only in accessing historic materials but also in enriching them with tagging tools.
We will highlight the most recent phase of digitizing archival documents from the Nuremberg Trials of military, political, and other leaders of Nazi Germany (the “Nuremberg Trials Project”), in which staff prepped 4,285 folders (in 360 boxes) and scanned 413,647 document pages in 15 weeks. By CALI 2015, we will also be able to share findings from a three-month experiment testing tools for document discovery and tagging.
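To give a sense of scale, the totals quoted above can be turned into average rates. The short Python sketch below is only illustrative arithmetic on those published figures (roughly 27,600 pages per week); the function and variable names are our own and are not part of any project tooling.

```python
# Totals quoted in the abstract for the most recent digitization phase.
FOLDERS = 4_285
BOXES = 360
PAGES = 413_647
WEEKS = 15


def average_rates(pages: int, folders: int, boxes: int, weeks: int) -> dict:
    """Return simple averages derived from the project totals."""
    return {
        "pages_per_week": pages / weeks,
        "pages_per_folder": pages / folders,
        "folders_per_box": folders / boxes,
    }


rates = average_rates(PAGES, FOLDERS, BOXES, WEEKS)
for name, value in rates.items():
    # One decimal place is plenty for planning-level estimates.
    print(f"{name}: {value:,.1f}")
```

These are averages only; actual folder sizes and weekly output will have varied.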
Following this session, CALI attendees will have the means to evaluate technologies, workflows, and services that enable project teams to:
- Prepare materials for digitization on a high-speed document scanner (and route selected pages to conservation and “oversize” scanning workflows as needed);
- Complete archival processing of original documents (label new folders);
- Produce (color) digital masters suitable for viewing, printing, and OCR processing, without damaging original pages;
- Segregate child documents from parent folders (or volumes);
- Distribute document metadata creation, via a networked, web-accessible platform, to many specialists, so that assembled, tagged and fully described documents may be ingested into a document discovery and delivery service such as that used for the Nuremberg Trials Project;
- Provide item-level tracking of the status of digitization for each source item.
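As a minimal sketch of what item-level status tracking could look like, the Python example below models one source item moving through an ordered set of stages. The stage names, the `SourceItem` class, and the barcode-style identifier are all hypothetical, chosen to echo the steps listed above; they do not describe the tracking system actually used on the project.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Stage(Enum):
    # Hypothetical stages, loosely following the workflow steps listed above.
    PREPPED = 1
    SCANNED = 2
    QA_PASSED = 3
    FOLDERED = 4
    TAGGED = 5


@dataclass
class SourceItem:
    item_id: str  # assumed identifier, e.g. the value printed on a cover sheet
    stage: Optional[Stage] = None
    history: List[Stage] = field(default_factory=list)

    def advance(self, new_stage: Stage) -> None:
        """Move the item to the next stage; reject skipped or repeated stages."""
        expected = 1 if self.stage is None else self.stage.value + 1
        if new_stage.value != expected:
            raise ValueError(
                f"{self.item_id}: cannot move to {new_stage.name} from "
                f"{self.stage.name if self.stage else 'unstarted'}"
            )
        self.stage = new_stage
        self.history.append(new_stage)


item = SourceItem("box-001-folder-01")
item.advance(Stage.PREPPED)
item.advance(Stage.SCANNED)
print(item.item_id, item.stage.name)
```

Enforcing a strict stage order is one possible design choice; a real workflow that routes pages to conservation or oversize scanning would need branching states.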
With live demos, screenshots, and at least one video of the high-speed scanner in operation, we will present these systems:
- Confluence wiki pages for tracking and status;
- a simple web form to generate barcoded cover sheets for each scan job;
- a leased high-speed scanner, with resident image processing systems to output lossless JP2 and Group 4 compressed TIFF images;
- scanning QA software;
- an open-source “virtual foldering” app, developed by the HLS Library Lab, to segregate documents from digitized parent folders;
- the open-source SuiteSpot tagging tool, developed by the HLS Library Lab, to record document-level metadata (tags);
- metadata creation workflow and scripts;
- web-hosted discovery interface for the Nuremberg Trials Project (http://nuremberg.law.harvard.edu);
- the database back end for the Nuremberg Trials Digital Document Collection.