Automated OCR Ground Truth Generation

Joost van Beusekom, Faisal Shafait, Thomas M. Breuel
2008 The Eighth IAPR International Workshop on Document Analysis Systems

Most optical character recognition (OCR) systems need to be trained and tested on the symbols that are to be recognized, so ground truth data is needed. This data consists of character images together with their ASCII codes. Among the approaches for generating ground truth from real-world data, one promising technique is to use the electronic version of the scanned documents. Using an alignment method, the character bounding boxes extracted from the electronic document are matched to the…




