Suitability of OCR Engines in Information Extraction Systems: A Comparative Evaluation
@inproceedings{Erlandsson2019SuitabilityOO, title={Suitability of OCR Engines in Information Extraction Systems: A Comparative Evaluation}, author={Zacharias Erlandsson}, year={2019} }
Previous research has compared the performance of OCR (optical character recognition) engines strictly for character recognition purposes. However, comparisons of OCR engines and their suitability ...
References
An Overview of the Tesseract OCR Engine
- Computer Science
- Ninth International Conference on Document Analysis and Recognition (ICDAR 2007)
- 2007
The Tesseract OCR engine, as was the HP Research Prototype in the UNLV Fourth Annual Test of OCR Accuracy, is described in a comprehensive overview. Emphasis is placed on aspects that are novel or at…
A Complete OCR System Development of Tamil Magazine Documents
- Computer Science
- 2003
The ability of artificial neural networks to learn arbitrary input/output mappings from sample data for solving the key problems of segmentation and character recognition is used in an early version of a complete Optical Character Recognition (OCR) system for Tamil magazine documents.
The MNIST Database of Handwritten Digit Images for Machine Learning Research [Best of the Web]
- Computer Science
- IEEE Signal Processing Magazine
- 2012
In this issue, “Best of the Web” presents the modified National Institute of Standards and Technology (MNIST) resources, consisting of a collection of handwritten digit images used extensively in…
Handbook of Document Image Processing and Recognition
- Computer Science
- Springer London
- 2014
The Handbook of Document Image Processing and Recognition is a comprehensive resource on the latest methods and techniques in document image processing and recognition that enables the reader to make an informed decision for their specific problems.
Historical review of OCR research and development
- Physics
- Proc. IEEE
- 1992
Both template matching and structure analysis approaches to R&D are considered and it is noted that the two approaches are coming closer and tending to merge.
Systems for Handwritten Gurmukhi Script – A Survey
- Computer Science
- 2011
An overview of the various OCR systems developed for handwritten isolated Gurmukhi text is presented.
Scraping the ACM Digital Library
- Computer Science
- SIGF
- 2001
An attempt to analyze PDF documents to automatically reference link the online scholarly literature using the ACM Digital Library was undertaken, with roughly 80% accuracy obtained in the automatic extraction of reference linking information.
Message Understanding Conference-6: A Brief History
- Computer Science
- COLING
- 1996
MUC-6 introduced several innovations over prior MUCs, most notably in the range of different tasks for which evaluations were conducted and the motivations for the new format.
OCR as a Service: An Experimental Evaluation of Google Docs OCR, Tesseract, ABBYY FineReader, and Transym
- Computer Science
- ISVC
- 2016
The present evaluation is expected to advance OCR research, providing new insights and consideration to the research area, and assist researchers to determine which service is ideal for optical character recognition in an accurate and efficient manner.
OCR Post-Processing Error Correction Algorithm using Google Online Spelling Suggestion
- Computer Science
- ArXiv
- 2012
This paper proposes a post-processing context-based error correction algorithm for detecting and correcting OCR non-word and real-word errors, based on Google’s online spelling suggestion which harnesses an internal database containing a huge collection of terms and word sequences gathered from all over the web.