Corpus ID: 199002408

Suitability of OCR Engines in Information Extraction Systems : a Comparative Evaluation

Author: Zacharias Erlandsson
Previous research has compared the performance of OCR (optical character recognition) engines strictly for character recognition purposes. However, comparisons of OCR engines and their suitability ... 


References

An Overview of the Tesseract OCR Engine
R. Smith. Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), 2007.
The Tesseract OCR engine, as was the HP Research Prototype in the UNLV Fourth Annual Test of OCR Accuracy, is described in a comprehensive overview. Emphasis is placed on aspects that are novel or at …
A Complete OCR System Development of Tamil Magazine Documents
An early version of a complete Optical Character Recognition (OCR) system for Tamil magazine documents uses the ability of artificial neural networks to learn arbitrary input/output mappings from sample data to solve the key problems of segmentation and character recognition.
The MNIST Database of Handwritten Digit Images for Machine Learning Research [Best of the Web]
L. Deng. IEEE Signal Processing Magazine, 2012.
In this issue, “Best of the Web” presents the modified National Institute of Standards and Technology (MNIST) resources, consisting of a collection of handwritten digit images used extensively in …
Handbook of Document Image Processing and Recognition
The Handbook of Document Image Processing and Recognition is a comprehensive resource on the latest methods and techniques in document image processing and recognition that enables the reader to make an informed decision for their specific problems.
Historical review of OCR research and development
Both template matching and structure analysis approaches to R&D are considered, and it is noted that the two approaches are converging and tending to merge.
Systems for Handwritten Gurmukhi Script – A Survey
An overview is presented of the various OCR systems developed for handwritten isolated Gurmukhi text.
Scraping the ACM Digital Library
An attempt was made to analyze PDF documents from the ACM Digital Library in order to automatically create reference links across the online scholarly literature; roughly 80% accuracy was obtained in automatically extracting reference-linking information.
Message Understanding Conference-6: A Brief History
MUC-6 introduced several innovations over prior MUCs, most notably in the range of different tasks for which evaluations were conducted and the motivations for the new format.
OCR as a Service: An Experimental Evaluation of Google Docs OCR, Tesseract, ABBYY FineReader, and Transym
The present evaluation is expected to advance OCR research by providing new insights into the area, and to assist researchers in determining which service performs optical character recognition most accurately and efficiently.
OCR Post-Processing Error Correction Algorithm using Google Online Spelling Suggestion
This paper proposes a post-processing context-based error correction algorithm for detecting and correcting OCR non-word and real-word errors, based on Google’s online spelling suggestion which harnesses an internal database containing a huge collection of terms and word sequences gathered from all over the web.