The effect of OCR errors on stylistic text classification


Interest has recently been growing in <i>non-topical</i> text classification tasks such as genre classification, sentiment analysis, and authorship profiling. We study the extent to which OCR errors affect stylistic text classification of scanned documents. We find that even a relatively high level of errors in the OCRed documents does not substantially affect stylistic classification accuracy.

DOI: 10.1145/1148170.1148325

