A model-based approach for rectifying camera images of bound documents has been developed, in which the surface of the document is represented by a general cylindrical surface. The principle of using the model to unwrap the image is discussed. In practice, the skeleton of each horizontal text line is extracted to help estimate the parameter of the model, …
Offline handwriting recognition of free-flowing Arabic text is a challenging task due to the plethora of factors that contribute to the variability in the data. In this paper, we address some of these sources of variability, and present experimental results on a large corpus of handwritten documents. Specific techniques such as the application of …
A vector-model-based approach to information retrieval (IR) from handwritten medical forms is presented in this paper. To improve IR performance on the erroneous output of handwriting recognition (HR) systems, the vector model is modified to estimate the number of occurrences of terms from word segmentation and recognition probabilities. IR tests show …
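One way to picture a vector model driven by recognition probabilities is sketched below. This is only a minimal illustration under stated assumptions, not the method of the paper: it assumes each word image comes with a list of candidate terms and recognition probabilities, it ignores segmentation probabilities entirely, and the names `expected_term_counts` and `cosine` as well as the sample data are hypothetical.

```python
import math
from collections import defaultdict


def expected_term_counts(word_hypotheses):
    """Estimate term occurrences as the sum of recognition probabilities.

    word_hypotheses: one list per word image, each holding
    (candidate_term, probability) pairs from a recognizer (assumed input).
    """
    counts = defaultdict(float)
    for hypotheses in word_hypotheses:
        for term, prob in hypotheses:
            counts[term] += prob
    return dict(counts)


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical recognizer output for a three-word form field.
doc = [
    [("chest", 0.7), ("chess", 0.3)],
    [("pain", 0.9), ("rain", 0.1)],
    [("chest", 0.6), ("crest", 0.4)],
]
counts = expected_term_counts(doc)

# A query for "chest" should match better than a query for "rain".
score_chest = cosine({"chest": 1.0}, counts)
score_rain = cosine({"rain": 1.0}, counts)
```

Using fractional counts keeps low-confidence alternatives searchable instead of discarding everything but the top recognition result.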
This article proposes a novel approach to rectifying the photographed image of a bound document. The surface of the document is modeled as a cylindrical surface. From the geometry of camera image formation, equations are derived that use the directrixes as a cue to map points on the surface in the 3-D scene to points on the image plane. Baselines of …
Keyword retrieval in handwritten document images (word spotting) is very challenging given that OCR accuracy is not yet adequate for handwritten scripts, especially with large lexicons. Various proposed approaches build indices on information such as image features or OCR scores and have improved the performance of the traditional approach that builds index …
We describe a rule-line removal algorithm for handwritten document images in this paper. Compared with existing approaches, our algorithm scales better to higher-resolution images and thicker rule-lines. Building on simple gap-filling methods that use line-drawing algorithms, we present a novel approach to regenerating the missing portions of …
Despite several decades of research in document analysis, recognition of unconstrained handwritten documents is still considered a challenging task. Previous research in this area has shown that word recognizers produce reasonably clean output when used with a restricted lexicon. But in the absence of such a restricted lexicon, the output of an unconstrained …
In this paper, we present a novel method for extracting handwritten and printed text zones from noisy document images with mixed content. We use Triple-Adjacent-Segment (TAS) based features which encode local shape characteristics of text in a consistent manner. We first construct two codebooks of the shape features extracted from a set of handwritten and …
This paper presents a statistical approach to the preprocessing of degraded handwritten forms including the steps of binarization and form line removal. The degraded image is modeled by a Markov Random Field (MRF) where the hidden-layer prior probability is learned from a training set of high-quality binarized images and the observation probability density …
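For readers unfamiliar with MRF-based binarization, the toy sketch below shows the general idea using iterated conditional modes (ICM). It is not the approach of the paper: the prior here is a hand-set Ising smoothness term with weight `beta` rather than one learned from training images, the observation model is a simple squared distance to assumed ink and background intensities, and the function name `icm_binarize` is hypothetical.

```python
def icm_binarize(gray, beta=1.0, iters=5):
    """Binarize a grayscale image (list of rows, values in [0, 1]).

    Returns labels: 0 = ink, 1 = background. Each pixel takes the label
    that minimizes a data cost plus a penalty for disagreeing neighbors.
    """
    h, w = len(gray), len(gray[0])
    means = (0.0, 1.0)  # assumed intensities for ink and background
    # Start from a plain global threshold.
    labels = [[1 if v > 0.5 else 0 for v in row] for row in gray]
    for _ in range(iters):
        for y in range(h):
            for x in range(w):
                best_label, best_energy = labels[y][x], float("inf")
                for k in (0, 1):
                    # Data term: how far the pixel is from label k's mean.
                    energy = (gray[y][x] - means[k]) ** 2
                    # Smoothness term: pay beta per 4-neighbor with a
                    # different label.
                    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and labels[ny][nx] != k:
                            energy += beta
                    if energy < best_energy:
                        best_label, best_energy = k, energy
                labels[y][x] = best_label
    return labels


# A light 5x5 patch with one dark noise pixel in the middle.
patch = [[0.9] * 5 for _ in range(5)]
patch[2][2] = 0.4
result = icm_binarize(patch)
```

A global threshold at 0.5 would mark the center pixel as ink; the neighborhood term overrides that isolated decision and labels it background.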