Iterative Analysis of Pages in Document Collections for Efficient User Interaction

Abstract

The analysis of sets of degraded documents, such as historical ones, is error-prone and requires human help to reach acceptable quality levels. However, human interaction raises three main issues when processing large numbers of pages: neither the user nor the system should wait for work; information provided by a human operator should not be restricted to local, isolated corrections, but should instead produce durable changes in the system; and the ability to interact with a human operator should neither increase the complexity of document models nor duplicate them between the analysis and human interaction processes. To solve these issues, we propose an iterative approach, based on a special mechanism called visual memory, to reintegrate external information during page analysis. To demonstrate its value for existing systems, we explain how we adapted a rule-based page analysis tool to enable, within this iterative approach, delayed interaction with a human operator, based on an adaptation of error recovery principles from compilers and the well-known exception handling mechanism. We validated our iterative approach on sales registers from the 18th century.

Keywords: document analysis; degraded documents; document sets; iterative analysis; user interaction
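
The control flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `VisualMemory`, `NeedsOperator`, `read_zone`, `analyse_page` and `run_iteration`, as well as the toy page data, are hypothetical and chosen only for illustration. The sketch assumes that a failed recognition raises an exception, that the analyser recovers by recording a question in a shared memory and moving on, and that the operator's answers are reintegrated during the next iteration.

```python
from dataclasses import dataclass, field


class NeedsOperator(Exception):
    """Hypothetical exception: the analyser cannot decide without human help."""

    def __init__(self, key, question):
        super().__init__(question)
        self.key = key
        self.question = question


@dataclass
class VisualMemory:
    """Hypothetical shared store for pending questions and operator answers."""

    questions: dict = field(default_factory=dict)  # (page, zone) -> question
    answers: dict = field(default_factory=dict)    # (page, zone) -> answer

    def ask(self, key, question):
        # Record the question once; asking again later does not duplicate it.
        self.questions.setdefault(key, question)

    def pending(self):
        # Questions that the operator has not answered yet.
        return [k for k in self.questions if k not in self.answers]


def read_zone(page, zone, text, memory):
    """Return the zone content, preferring an operator answer when one exists."""
    key = (page, zone)
    if key in memory.answers:
        return memory.answers[key]
    if text is None:  # toy stand-in for a failed recognition
        raise NeedsOperator(key, f"Transcribe zone {zone!r} of page {page!r}")
    return text


def analyse_page(page, zones, memory):
    """Analyse one page without blocking: on failure, ask, recover, continue."""
    result = {}
    for zone, text in zones.items():
        try:
            result[zone] = read_zone(page, zone, text, memory)
        except NeedsOperator as err:
            memory.ask(err.key, err.question)
            result[zone] = None  # placeholder until a later iteration
    return result


def run_iteration(pages, memory):
    """Run one analysis pass over the whole collection."""
    return {page: analyse_page(page, zones, memory) for page, zones in pages.items()}


# Made-up toy collection: None marks a zone the recogniser could not read.
pages = {
    "p1": {"date": "1745", "buyer": None},
    "p2": {"date": None, "buyer": "Dupont"},
}
memory = VisualMemory()

first = run_iteration(pages, memory)    # never waits for the operator

# Later, the operator answers all pending questions in one batch.
for key in memory.pending():
    memory.answers[key] = f"operator value for {key[1]}"

second = run_iteration(pages, memory)   # answers are reintegrated
```

In this sketch neither side waits for the other: the first pass completes with placeholders, the operator works through the queued questions at a convenient time, and the second pass picks the answers up from the memory rather than from page-specific patches.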

DOI: 10.1109/ICDAR.2011.107

Cite this paper

@inproceedings{Chazalon2011IterativeAO, title={Iterative Analysis of Pages in Document Collections for Efficient User Interaction}, author={Joseph Chazalon and Bertrand Co{\"u}asnon and Aur{\'e}lie Lemaitre}, booktitle={ICDAR}, year={2011} }