Dorothea Blostein

We describe a robust and efficient system for recognizing typeset and handwritten mathematical notation. From a list of symbols with bounding boxes, the system analyzes an expression in three successive passes. The Layout Pass constructs a Baseline Structure Tree (BST) describing the two-dimensional arrangement of input symbols. Reading order and operator…
A perspective view of a slanted textured surface shows systematic changes in the density, area, and aspect ratio of texture elements. These apparent changes in texture element properties can be analyzed to recover information about the physical layout of the scene. However, in practice it is difficult to identify texture elements, especially in images where…
Recognition of mathematical notation involves two main components: symbol recognition and symbol-arrangement analysis. Symbol-arrangement analysis is particularly difficult for mathematics, due to the subtle use of space in this notation. We begin with a general discussion of the mathematics-recognition problem. This is followed by a review of existing…
Categorization of biomedical articles is a central task for supporting various curation efforts. It can also form the basis for effective biomedical text mining. Automatic text classification in the biomedical domain is thus an active research area. Contests organized by the KDD Cup (2002) and the TREC Genomics track (since 2003) defined several annotation…
Document recognition and retrieval technologies complement one another, providing improved access to increasingly large document collections. While recognition and retrieval of textual information are fairly mature, with widespread availability of optical character recognition and text-based search engines, recognition and retrieval of graphics such as…
Topics are collections of words that co-occur frequently in a text corpus. Topics have been found to be effective tools for describing the major themes spanning a corpus. Using such topics to describe the evolution of a software system’s source code promises to be extremely useful for development tasks such as maintenance and re-engineering. However, no one…
Document image classification is an important step in Office Automation, Digital Libraries, and other document image analysis applications. There is great diversity in document image classifiers: they differ in the problems they solve, in the use of training data to construct class models, and in the choice of document features and classification…