An Overview of Document Image Analysis Systems
- Engineer Andrei TIGORA
A configurable archive document image analysis system for digital library construction has been designed using rapid prototyping and top-down iterative development methods. This approach has been found to be essential in order to capture the curators’ expertise about existing card archive structures, content and databases. The design currently achieves about 93% correct segmentation of the required archive card fields overall, with 81.3% of all archive cards in a testset of 2000 images having all fields correctly segmented and labelled. Analysis of errors in the testset indicates that heavily-annotated cards and non-standard card formats comprise 5-10% of the overall archive, and a significant proportion of these are unlikely to be resolvable without curatorial intervention.