Thomas L. Packer

Learn More
Named entity recognition applied to scanned and OCRed historical documents can contribute to the discoverability of historical information. However, entity recognition from some historical corpora is much more difficult than from natively digital text because of the marked presence of word errors and absence of page layout information. How difficult can it(More)
Building a database of facts extracted from historical documents to enable database-like query and search would reduce the tedium of gleaning facts of interest from historical documents. We propose a solution in which historical documents themselves constitute the stored database. In our solution, we use information-extraction techniques to produce a(More)
A method of automatically extracting facts from lists in OCRed documents and inserting them into an ontology would contribute to making a variety of historical knowledge machine searchable, queryable, and linkable. To work well, such a process must be adaptable to variations in list format, tolerant of OCR errors, and careful in its selection of human(More)
Here we describe a mathematical and statistical signal processing strategy termed event resolution imaging (ERI). Our principal objective was to determine if the acute intoxicating effects of ethanol on spontaneous EEG activity could be discriminated from those of other sedative/hypnotics. We employed ERI to combine and integrate standard analysis methods(More)
Optical character recognition (OCR) produces transcriptions of document images. These transcriptions often contain incorrectly recognized characters which we must avoid or correct downstream. An ability to both identify OCR errors and extract information from OCR output would allow us to extract and index only correct information and to post-process(More)
A process for accurately and automatically extracting asserted facts from lists in OCRed documents and inserting them into an ontology would contribute to making a variety of historical documents machine search-able, queryable, and linkable. To work well, such a process should be adaptable to variations in document and list format, tolerant of OCR errors,(More)
To work well, machine-learning-based approaches to information extraction and ontology population often require a large number of manually selected and annotated examples. In this paper, we propose ListReader which provides a way to train the structure and parameters of a Hidden Markov Model (HMM) without requiring any labeled training data. The induced HMM(More)
Machine learning based approaches to information extraction and ontology population often require a large number of manually selected and annotated examples in order to learn a mapping from facts asserted in text to structured facts asserted in an ontology. In this paper, we propose ListReader which provides a way to train the structure and parameters of a(More)
—A flexible, accurate, and cost-effective method of automatically extracting facts from lists in OCRed documents and inserting them into an ontology would help make those facts machine searchable, queryable, and linkable and expose their rich ontological interrelationships. To work well, such a process must be adaptable to variations in list format,(More)