Digitised historical text: Does it have to be mediOCRe?

Abstract

This paper reports on experiments to improve the Optical Character Recognition (OCR) quality of historical text as a preliminary step in text mining. We analyse the quality of OCRed text compared to a gold standard and show how it can be improved by performing two automatic correction steps. We also demonstrate the impact this can have on named entity…

4 Figures and Tables

Statistics

Citations per Year (chart, 2015 to 2017)

Citation Velocity: 11

Averaging 11 citations per year over the last 3 years.
