An optical character recognition approach to qualifying thresholding algorithms

@inproceedings{Sturgill2008AnOC,
  title={An optical character recognition approach to qualifying thresholding algorithms},
  author={Margaret Sturgill and Steven J. Simske},
  booktitle={ACM Symposium on Document Engineering},
  year={2008}
}
Pre-processing for raster-image-based document segmentation begins with image thresholding, a binarization process that separates foreground from background. In this paper, we compare three thresholding approaches (an existing one, Otsu; a modified existing one, Kittler-Illingworth; and a simple peak-based one) on a set of 982 documents for which ground truth (full text) is available. We use the output of an open-source OCR engine which incorporates an adaptive/dynamic thresholder that can be bypassed by…
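To make the two named global methods concrete, here is a minimal sketch over an 8-bit grey-level histogram. It implements textbook Otsu (maximise between-class variance) and the *original, unmodified* Kittler-Illingworth minimum-error criterion. It also includes a character-accuracy score of the kind one might use to compare OCR output against full-text ground truth. The abstract does not describe the authors' modification to Kittler-Illingworth, their peak-based method, or their accuracy measure, so none of those are reproduced here. All function names and the edit-distance-based accuracy are illustrative assumptions.

```python
import math
from typing import List


def histogram(pixels: List[int]) -> List[int]:
    """Count occurrences of each 8-bit grey level (0..255)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    return hist


def otsu_threshold(hist: List[int]) -> int:
    """Otsu: choose t maximising between-class variance.

    Class 0 is levels <= t, class 1 is levels > t. w0 * w1 * (mu0 - mu1)**2
    is proportional to the between-class variance, so normalisation is dropped.
    """
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0 = sum0 = 0
    best_t, best_score = 0, -1.0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        score = w0 * w1 * (mu0 - mu1) ** 2
        if score > best_score:  # strict '>' keeps the first maximum
            best_t, best_score = t, score
    return best_t


def kittler_illingworth_threshold(hist: List[int]) -> int:
    """Original Kittler-Illingworth minimum-error threshold (no modification).

    Models each class as Gaussian and minimises
    J(t) = 1 + P0 ln v0 + P1 ln v1 - 2 (P0 ln P0 + P1 ln P1),
    with P the class prior and v the class variance.
    """
    total = sum(hist)
    best_t, best_j = None, math.inf
    for t in range(255):
        lo, hi = range(0, t + 1), range(t + 1, 256)
        n0 = sum(hist[i] for i in lo)
        n1 = total - n0
        if n0 == 0 or n1 == 0:
            continue
        m0 = sum(i * hist[i] for i in lo) / n0
        m1 = sum(i * hist[i] for i in hi) / n1
        v0 = sum(hist[i] * (i - m0) ** 2 for i in lo) / n0
        v1 = sum(hist[i] * (i - m1) ** 2 for i in hi) / n1
        if v0 <= 0 or v1 <= 0:
            continue  # criterion is undefined for a zero-variance class
        p0, p1 = n0 / total, n1 / total
        j = (1 + p0 * math.log(v0) + p1 * math.log(v1)
             - 2 * (p0 * math.log(p0) + p1 * math.log(p1)))
        if j < best_j:
            best_t, best_j = t, j
    if best_t is None:
        raise ValueError("no threshold yields two non-degenerate classes")
    return best_t


def binarize(pixels: List[int], t: int) -> List[int]:
    """Dark text on light paper: levels <= t become black (0), others white (255)."""
    return [0 if p <= t else 255 for p in pixels]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def character_accuracy(ocr_text: str, ground_truth: str) -> float:
    """Assumed metric: 1 - edit_distance / len(ground_truth), floored at 0."""
    if not ground_truth:
        return 1.0 if not ocr_text else 0.0
    return max(0.0, 1.0 - edit_distance(ocr_text, ground_truth) / len(ground_truth))
```

For a page whose text pixels cluster around grey level 50 and whose paper clusters around 200, both threshold functions return a value in the empty gap between the two modes. To rank methods in the spirit of the abstract, one would pass each binarized image to the OCR engine (presumably with its own thresholder bypassed) and compare the resulting `character_accuracy` scores per method. The OCR call itself is omitted because the engine and its bypass mechanism are not specified here.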