Learn More
Many feature extraction approaches for off-line handwriting recognition (OHR) rely on accurate binarization of gray-level images. However, high-quality binarization of most real-world documents is extremely difficult due to varying characteristics of noises artifacts common in such documents. Unlike most of these features, Gabor features do not require(More)
This article proposes a novel approach on how to rectify the photo image of the bound document. The surface of the document is modeled by a cylindrical surface. By the geometry of camera image formation, the equations using the cue of directrixes to map the points on the surface in the 3-D scene to the points on the image plane are achieved. Baselines of(More)
We describe a rule-line removal algorithm for handwritten document images in this paper. Compared to the existing approaches, our algorithm obtains more scalability to higher-resolution images and thicker rule-lines. Derived from the simple gap-filling methods using line-drawing algorithms, we present a novel approach to regenerating the missing portions of(More)
A model based approach for rectifying the camera image of the bound document has been developed, i.e., the surface of the document is represented by a general cylindrical surface. The principle of using the model to unwrap the image is discussed. Practically, the skeleton of each horizontal text line is extracted to help estimate the parameter of the model,(More)
This paper proposes a statistical approach to degraded handwritten form image preprocessing including binarization and form line removal. The degraded image is modeled by a Markov random field (MRF) where the prior is learnt from a training set of high quality binarized images, and the probabilistic density is learnt on-the-fly from the gray-level histogram(More)
Keyword retrieval in handwritten document images (word spotting) is very challenging given that OCR accuracy is not yet adequate for handwritten scripts, specially with large lexicons. Various proposed approaches build indices on information such as image features or OCR scores and have improved the performance of the traditional approach that builds index(More)
Offline handwriting recognition of free-flowing Arabic text is a challenging task due to the plethora of factors that contribute to the variability in the data. In this paper, we address some of these sources of variability, and present experimental results on a large corpus of handwritten documents. Specific techniques such as the application of(More)
Camera-captured optical character recognition (OCR) is a challenging area because of artifacts introduced during image acquisition with consumer-domain hand-held and Smart phone cameras. Critical information is lost if the user does not get immediate feedback on whether the acquired image meets the quality requirements for OCR. To avoid such information(More)
This paper presents a statistical approach to the preprocessing of degraded handwritten forms including the steps of binarization and form line removal. The degraded image is modeled by a Markov random field (MRF) where the hidden-layer prior probability is learned from a training set of high-quality binarized images and the observation probability density(More)