In document analysis, it is common to demonstrate the usefulness of a component through an experimental evaluation. The respective algorithms are applied to a test sample, and effectiveness measures such as recall, precision, and accuracy are computed. The goal of such an evaluation is twofold: on the one hand it shows that the absolute effectiveness of the …
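As an illustration of the measures named in this abstract, the following is a minimal sketch of how recall, precision, and accuracy could be computed for a binary decision on a test sample. The function name `effectiveness` and the label encoding (1 for the positive class) are assumptions made for this example, not taken from the paper.

```python
def effectiveness(y_true, y_pred, positive=1):
    """Return (precision, recall, accuracy) for a binary decision.

    y_true: reference labels of the test sample
    y_pred: labels produced by the component under evaluation
    positive: label treated as the positive class (an assumption here)
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard against empty denominators (e.g. no positive predictions).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return precision, recall, accuracy


if __name__ == "__main__":
    reference = [1, 1, 0, 0, 1]
    predicted = [1, 0, 0, 1, 1]
    print(effectiveness(reference, predicted))
```

Returning 0.0 when a denominator is empty is one convention among several; an evaluation might instead report such cases as undefined.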
The principles of the model-based document analysis system called Pi ODA (paper interface to office document architecture), which was developed as a prototype for the analysis of single-sided business letters in German, are presented. Initially, Pi ODA extracts a part-of hierarchy of nested layout objects such as text-blocks, lines, and words based on their …
Document analysis is responsible for essential progress in office automation. This paper is part of an overview of the combined research efforts in document analysis at DFKI. Common to all document analysis projects is the global goal of providing a high-level electronic representation of documents in terms of iconic, structural, textual, and semantic …
In the literature, many feature types have been proposed for document classification. However, an extensive and systematic evaluation of the various approaches has not yet been carried out. In particular, evaluations on OCR documents are very rare. In this paper we investigate seven text representations based on n-grams and single words. We compare their effectiveness …
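To make the distinction between the two families of representations concrete, here is a minimal sketch of single-word features and character n-gram features for a text. The function names, the tokenization rule, and the choice of n = 3 are assumptions for illustration; the abstract does not specify how the seven representations are defined.

```python
import re
from collections import Counter


def word_features(text):
    """Count single words (lowercased alphanumeric tokens)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def char_ngram_features(text, n=3):
    """Count overlapping character n-grams over the normalized text."""
    normalized = " ".join(re.findall(r"[a-z0-9]+", text.lower()))
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


if __name__ == "__main__":
    clean = "invoice"
    ocr = "inv0ice"  # hypothetical OCR error: 'o' read as '0'

    # The word features share nothing, because the tokens differ.
    print(set(word_features(clean)) & set(word_features(ocr)))
    # Some character trigrams survive the single-character error.
    print(set(char_ngram_features(clean)) & set(char_ngram_features(ocr)))
```

The usage example hints at why n-grams are often considered for OCR text: a single misrecognized character changes the whole word token but leaves some n-grams intact.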
Syntactic information. In literature, there is a lack of […] approaches for designing large dictionaries that will improve character recognition. In Seino et al. [14] a knowledge processing (KP) method is shown. This KP method is applied by Toshiba OCR systems and comprises, among other knowledge sources, many word dictionaries being hierarchically structured, e.g., …