MULTISTRATEGY LEARNING FOR DOCUMENT RECOGNITION

Abstract

In this paper, a methodology for document classification and understanding is proposed. It is based on a multistrategy approach to learning from examples. By document classification, we mean the process of identifying the particular class to which a document belongs. Document understanding is defined as the process of detecting the logical structure of a document. The multistrategy approach for document classification and understanding has been implemented in a system called PLRS, which embeds two empirical learning systems: RES and INDUBI/H. Given a set of documents whose layout structure has already been detected and whose membership class has been defined by the user, RES generates the knowledge base of an expert system devoted to the classification of a document. The language used to describe both the layout of the training documents and the learned rules is a first-order language. The learning methodology adopted for the problem of learning classification rules integrates both a parametric and a conceptual learning method. As to the problem of document understanding, INDUBI/H can be used to generate the recognition rules, provided that the user is able to supply examples of the logical structure. RES and INDUBI/H are implemented in the C language. PLRS is a module of IBIsys, a software environment for office automation distributed by Olivetti.

Cite this paper

@inproceedings{Semeraro1994oML, title={MULTISTRATEGY LEARNING FOR DOCUMENT RECOGNITION}, author={Giovanni Semeraro}, year={1994} }