MOTIVATION: Predicting the secondary structure of a protein (alpha-helix, beta-sheet, coil) is an important step towards elucidating its three-dimensional structure, as well as its function. Presently, the best predictors are based on machine learning approaches, in particular neural network architectures with a fixed, and relatively short, input window of ...
In this paper, we describe a flexible form-reader system capable of extracting textual information from accounting documents, like invoices and bills of service companies. In this kind of document, the extraction of some information fields cannot take place without having detected the corresponding instruction fields, which are only constrained to range in ...
We describe an approach for table location in document images. The documents are described by means of a hierarchical representation that is based on the MXY tree. The presence of a table is hypothesized by searching parallel lines in the MXY tree of the page. This hypothesis is afterwards verified by locating perpendicular lines or white spaces in the ...
Nowadays, digital libraries have become a widely used service for storing and sharing both born-digital documents and digital versions of works held by traditional libraries. Document images are intrinsically unstructured, and the structure and semantics of the digitized documents are for the most part lost during the conversion. Several techniques related to the ...
Text categorization is typically formulated as a concept learning problem where each instance is a single isolated document. In this paper we are interested in a more general formulation where documents are organized as page sequences, as naturally occurring in digital libraries of scanned books and magazines. We describe a method for classifying pages of ...
In the traditional setting, text categorization is formulated as a concept learning problem where each instance is a single isolated document. However, this perspective is not appropriate for the many digital libraries whose contents are scanned and optically read books or magazines. In this paper, we propose a more general formulation of text ...