Impact of Crowdsourcing OCR Improvements on Retrievability Bias

Myriam C. Traub, Thaer Samar, Jacco van Ossenbruggen, Lynda Hardman
Digitized document collections often suffer from OCR errors that may impact a document's readability and retrievability. We studied the effects of correcting OCR errors on the retrievability of documents in a historic newspaper corpus of a digital library. We computed retrievability scores for the uncorrected documents using queries from the library's search log, and found that the document OCR character error rate and retrievability score are strongly correlated. We computed retrievability…