Recurrent neural networks (RNNs) have been successfully applied to the recognition of cursive handwritten documents, both in English and Arabic scripts. The ability of RNNs to model context in sequence data such as speech and text makes them a suitable candidate for developing OCR systems for printed Nabataean scripts (including Nastaleeq for which no OCR system is…
Optical character recognition (OCR) of machine-printed Latin script documents is ubiquitously claimed to be a solved problem. However, error-free OCR of degraded or noisy text is still challenging for modern OCR systems. Most recent approaches perform segmentation-based character recognition. This is tricky because segmentation of degraded text is itself…
The recognition of Arabic script and its derivatives, such as Urdu, Persian, and Pashto, is a difficult task due to the complexity of the script. Urdu text recognition is particularly difficult because of its Nasta'liq writing style. Nasta'liq has a complex calligraphic nature, which presents major issues for the recognition of Urdu text owing to…
Segmentation and recognition of screen-rendered text is a challenging task due to its low resolution (72 or 96 ppi) and the use of anti-aliased rendering. This paper evaluates Hidden Markov Model (HMM) techniques for OCR of low-resolution text, both on screen-rendered isolated characters and on screen-rendered text lines, and compares it with the performance of…
OCR of multi-font Arabic text is difficult due to large variations in character shapes from one font to another. It becomes even more challenging if the text is rendered at very low resolution. This paper describes a multi-font, low-resolution, open-vocabulary OCR system based on a multidimensional recurrent neural network architecture. For this work,…
Orientation detection is an important preprocessing step for accurate recognition of text from document images. Many existing orientation detection techniques are based on the fact that in Roman script text ascenders are more likely to occur than descenders, but this approach is not applicable to documents in other scripts such as Urdu and Arabic. In this paper,…
Cursive handwriting recognition is still an active topic of research, especially for non-Latin scripts. One of the techniques that yields the best recognition results is based on recurrent neural networks, with neurons modeled by long short-term memory (LSTM) cells and the alignment of the label sequence to the output sequence performed by a connectionist temporal…
Document script recognition is one of the important preprocessing steps in a multilingual optical character recognition (MOCR) system. An MOCR system requires prior knowledge of the script to accurately recognize multilingual text in a single document. In multilingual documents, two scripts can be mixed together within a single text line. Many existing script…
A large amount of real-world data is required to train and benchmark any character recognition algorithm. Developing a page-level ground-truth database for this purpose is overwhelmingly laborious, as it involves considerable manual effort to produce a reasonable database that covers all possible words of a language. Moreover, generating such a database for…