Extracting person names from diverse and noisy OCR text

Thomas L. Packer, Joshua F. Lutes, Aaron P. Stewart, David W. Embley, Eric K. Ringger, Kevin D. Seppi, and Lee S. Jensen
Named entity recognition applied to scanned and OCRed historical documents can contribute to the discoverability of historical information. However, entity recognition from some historical corpora is much more difficult than from natively digital text because of the marked presence of word errors and absence of page layout information. How difficult can it be and what level of quality can be expected? We apply three typical extraction algorithms to the task of extracting person names from…



