Optical character recognition: an illustrated guide to the frontier

George Nagy, Thomas A. Nartker, Stephen V. Rice
Electronic Imaging
We offer a perspective on the performance of current OCR systems by illustrating and explaining actual OCR errors made by three commercial devices. After discussing briefly the character recognition abilities of humans and computers, we present illustrated examples of recognition errors. The top level of our taxonomy of the causes of errors consists of Imaging Defects, Similar Symbols, Punctuation, and Typography. The analysis of a series of 'snippets' from this perspective provides insight… 
Quantifying the noise tolerance of the OCR engine Tesseract using a simulated environment
The noise tolerance of Tesseract, a state-of-the-art OCR engine, is evaluated in relation to how well it handles salt and pepper noise, a type of image degradation, and results show that the noise tolerance decreased for larger font sizes.
Training & Quality Assessment of an Optical Character Recognition Model for Northern Haida
The first optical character recognition (OCR) model for Northern Haida, a nearly extinct First Nations language spoken in the Haida Gwaii archipelago in British Columbia, Canada, is created, and an overview of currently available OCR accuracy analysis tools is presented.
A survey of modern optical character recognition techniques
This report explores the latest advances in the field of digital document recognition and discusses the major developments in optical character recognition (OCR) and document image enhancement/restoration in application to Latin and non-Latin scripts.
4005-898-01 Independent Study Report Character Segmentation and Classification
Optical Character Recognition (OCR) is the process of translating images of handwritten, typewritten, or printed text into a format understood by machines for the purpose of editing,
A Review of Optical Character Recognition
This paper presents an overview of feature extraction methods for character recognition and indicates that the choice of feature extraction method is the single most important factor in achieving high recognition performance in character recognition systems.
Extração Automática de Texto em Imagem/Vídeo
This dissertation started by building an OCR system from scratch to recognize text in images. Accuracy varies between 97% and 99% for high-definition images, but performance drops considerably at lower resolutions.
Benchmarking commercial OCR engines for technical drawings indexing
This methodology yields a list of domain-dependent problems for OCR engines, classified by importance with respect to correction cost, and could be used to choose the right OCR engine or to improve OCR execution by focusing on the most important problems.
Optimisation of archival processes involving digitisation of typewritten documents
An investigation of optical character recognition (OCR) technology and its implementation in the digitisation of archival materials shows that resolution is significantly more important than the binarisation pre-processing procedure for achieving better OCR results.
A Morphological Image Preprocessing Suite for OCR on Natural Scene Images
This work proposes an image preprocessing suite that, through text detection, auto-rotation, and noise reduction, improves the accuracy of OCR analysis in a camera-based translation system.


Nartker, Optical Character Recognition: An illustrated guide to the frontier. Kluwer Academic Publishers, 1999.