A lossy/lossless compression method for printed typeset bi-level text images based on improved pattern matching
In this paper, we propose a multilayer document compression algorithm. This algorithm first segments a scanned document image into classes such as text, images, and background, then compresses each class using an algorithm designed specifically for that class. Two algorithms are investigated for segmenting documents: a general-purpose image segmentation algorithm called the trainable sequential MAP (TSMAP) algorithm, and a rate-distortion optimized segmentation (RDOS) algorithm. Experimental results show that the multilayer compression algorithm can achieve a much lower bit rate than most conventional algorithms, such as JPEG, at similar subjective distortion levels. We also find that RDOS produces more robust segmentations than TSMAP because it eliminates misclassifications that can sometimes cause severe artifacts.
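A common way to make a segmentation rate-distortion optimized is to give each region the class whose coder minimizes a Lagrangian cost J = R + λD, where R is the rate in bits and D is the distortion. The abstract does not spell out the cost that RDOS uses, so the sketch below rests on assumptions. It applies that generic Lagrangian rule and labels tiny 4-pixel blocks independently, with no context between neighbouring blocks. It also swaps in three toy coders (`background_coder`, `text_coder`, `image_coder`, all hypothetical) in place of the paper's class-specific coders.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A "coder" here is a hypothetical cost model: given a block of 8-bit grey
# pixels it returns (rate in bits, squared-error distortion). These are toy
# stand-ins, not the class-specific coders from the paper.
Coder = Callable[[Sequence[int]], Tuple[float, float]]


def background_coder(block: Sequence[int]) -> Tuple[float, float]:
    # Send a single 8-bit mean value; distortion is the error of a flat block.
    mean = sum(block) / len(block)
    return 8.0, sum((p - mean) ** 2 for p in block)


def text_coder(block: Sequence[int]) -> Tuple[float, float]:
    # Binarize at 128 and send 1 bit per pixel (no entropy coding assumed).
    recon = [0 if p < 128 else 255 for p in block]
    return float(len(block)), sum((p - r) ** 2 for p, r in zip(block, recon))


def image_coder(block: Sequence[int]) -> Tuple[float, float]:
    # Send every pixel verbatim at 8 bits: zero distortion, highest rate.
    return 8.0 * len(block), 0.0


def lagrangian_cost(coder: Coder, block: Sequence[int], lam: float) -> float:
    # J = R + lam * D for one block under one candidate class.
    rate, distortion = coder(block)
    return rate + lam * distortion


def rd_segment(blocks: List[Sequence[int]], coders: Dict[str, Coder], lam: float) -> List[str]:
    # Label each block independently with the class that minimizes J.
    return [
        min(coders, key=lambda name: lagrangian_cost(coders[name], block, lam))
        for block in blocks
    ]


CODERS: Dict[str, Coder] = {
    "background": background_coder,
    "text": text_coder,
    "image": image_coder,
}

# A flat block, a two-level text-like block, and a smooth gradient block.
blocks = [[200, 200, 200, 200], [0, 255, 0, 255], [10, 100, 150, 240]]
print(rd_segment(blocks, CODERS, lam=0.01))  # ['background', 'text', 'image']
```

The weight `lam` sets the trade-off. With `lam=0`, only rate counts and every block goes to the cheapest coder. As `lam` grows, blocks with any coding error drift toward the zero-distortion `image` class. Once the labels are fixed, each class would then be handed to its own coder, mirroring the segment-then-compress structure described above.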