Limits on the Application of Frequency-Based Language Models to OCR

@inproceedings{Smith2011LimitsOT,
  title={Limits on the Application of Frequency-Based Language Models to OCR},
  author={Ray Smith},
  booktitle={2011 International Conference on Document Analysis and Recognition},
  year={2011},
  pages={538--542}
}
Although large language models are used in speech recognition and machine translation applications, OCR systems are "far behind" in their use of language models. The reason for this is not the laggardness of the OCR community, but the fact that, at high accuracies, a frequency-based language model can do more harm than good unless it is carefully applied. This paper presents an analysis of this discrepancy with the help of the Google Books n-gram Corpus, and concludes that noisy-channel models…
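The abstract's core claim, that a frequency-based prior can override a reading the recognizer already got right, can be illustrated with a toy noisy-channel decoder that picks the word w maximizing log P(observed | w) + λ·log P(w). This is a minimal sketch under stated assumptions, not the method analyzed in the paper: the word counts, the independent per-character channel model, the uniform substitution over 26 letters, and the `lm_weight` parameter (λ) are all hypothetical choices made for illustration.

```python
import math

# Hypothetical unigram counts, chosen for illustration only
# (NOT taken from the Google Books n-gram corpus).
UNIGRAM_COUNTS = {"home": 1_000_000, "hume": 2_000}
TOTAL = sum(UNIGRAM_COUNTS.values())

# Assumed alphabet size for the toy channel model.
ALPHABET_SIZE = 26


def log_prior(word: str) -> float:
    """Log unigram probability P(w) from the hypothetical counts."""
    return math.log(UNIGRAM_COUNTS[word] / TOTAL)


def log_channel(observed: str, candidate: str, char_accuracy: float) -> float:
    """Log P(observed | candidate) under an assumed channel model:
    each character is read correctly with probability char_accuracy,
    otherwise it is replaced uniformly by one of the other letters.
    Only equal-length substitutions are modelled."""
    if len(observed) != len(candidate):
        return float("-inf")
    miss = (1.0 - char_accuracy) / (ALPHABET_SIZE - 1)
    return sum(
        math.log(char_accuracy) if o == c else math.log(miss)
        for o, c in zip(observed, candidate)
    )


def decode(observed: str, candidates: list[str], char_accuracy: float,
           lm_weight: float = 1.0) -> str:
    """Noisy-channel decision: argmax_w log P(o|w) + lm_weight * log P(w)."""
    return max(
        candidates,
        key=lambda w: log_channel(observed, w, char_accuracy) + lm_weight * log_prior(w),
    )


if __name__ == "__main__":
    # A confidently read rare word survives with the default weight...
    print(decode("hume", ["home", "hume"], char_accuracy=0.99))                 # hume
    # ...but an over-weighted frequency prior "corrects" it into a common word.
    print(decode("hume", ["home", "hume"], char_accuracy=0.99, lm_weight=2.0))  # home
    # On a genuinely ambiguous misread the prior breaks the tie usefully.
    print(decode("hcme", ["home", "hume"], char_accuracy=0.99))                 # home
```

With 99% character accuracy the channel evidence for the rare word "hume" outweighs its low prior, but doubling the prior's weight is enough to turn a correct reading into an error. When the recognizer's output ("hcme") is equally far from both candidates, the same prior makes the helpful choice. That contrast is one simple way to see why a frequency model must be weighted carefully once base accuracy is high.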
This paper has 26 citations.