Comparison of named entity recognition tools for raw OCR text

Kepa Joseba Rodriquez, Mike Bryant, Tobias Blanke, Magdalena Luszczynska
This short paper analyses an experiment comparing the efficacy of several Named Entity Recognition (NER) tools at extracting entities directly from the output of an optical character recognition (OCR) workflow. The authors present how they first created a set of test data, consisting of raw and corrected OCR output manually annotated with people, locations, and organizations. They then ran each of the NER tools against both raw and corrected OCR output, comparing the precision, recall, and F1…
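
As a rough illustration of the comparison described above, the following sketch computes precision, recall, and F1 for one tool's output against a manual annotation, once for raw and once for corrected OCR text. It is not the authors' evaluation code: the function name `prf1`, the exact-match scoring on (text, type) pairs, and every entity in the sample data are assumptions made up for this example.

```python
def prf1(gold, predicted):
    """Return (precision, recall, F1) using exact match on (text, type) pairs.

    Assumption: an entity counts as correct only if both its surface text
    and its type match a gold annotation exactly.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical manual annotation (people, locations, organizations).
gold = {
    ("Anne Frank", "PER"),
    ("Amsterdam", "LOC"),
    ("Warsaw", "LOC"),
    ("Red Cross", "ORG"),
}

# Hypothetical tool output on raw OCR text: one name is garbled by OCR.
raw_pred = {
    ("Anne Prank", "PER"),
    ("Amsterdam", "LOC"),
    ("Red Cross", "ORG"),
}

# Hypothetical tool output on corrected OCR text.
corrected_pred = {
    ("Anne Frank", "PER"),
    ("Amsterdam", "LOC"),
    ("Red Cross", "ORG"),
}

for label, pred in [("raw", raw_pred), ("corrected", corrected_pred)]:
    p, r, f = prf1(gold, pred)
    print(f"{label}: precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

Exact matching is the strictest choice. A partial-match or type-only scheme would score OCR-damaged names more leniently, so the scoring rule should be fixed before comparing tools.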
This paper has 47 citations.



