The effect of OCR errors on stylistic text classification

Abstract

Interest has recently been growing in non-topical text classification tasks such as genre classification, sentiment analysis, and authorship profiling. We study the extent to which OCR errors affect stylistic text classification of scanned documents. We find that even a relatively high level of errors in the OCRed documents does not substantially affect stylistic classification accuracy.

DOI: 10.1145/1148170.1148325

3 Figures and Tables

Cite this paper

@inproceedings{Stein2006TheEO,
  title={The effect of OCR errors on stylistic text classification},
  author={Sterling Stuart Stein and Shlomo Argamon and Ophir Frieder},
  booktitle={SIGIR},
  year={2006}
}