Learn More
We report on the creation of a database composed of images of Arabic Printed words. The purpose of this database is the large-scale benchmarking of openvocabulary, multi-font, multi-size and multi-style text recognition systems in Arabic. The challenges that are addressed by the database are in the variability of the sizes, fonts and style used to generate(More)
In this paper, we propose a new font and size identification method for ultra-low resolution Arabic word images using a stochastic approach. The literature has proved the difficulty for Arabic text recognition systems to treat multi-font and multi-size word images. This is due to the variability induced by some font family, in addition to the inherent(More)
This paper describes the Arabic Recognition Competition: Multi-font Multi-size Digitally Represented Text held in the context of the 11 International Conference on Document Analysis and Recognition (ICDAR2011), during September 18-21, 2011, Beijing, China. This first competition used the freely available Arabic Printed Text Image (APTI) database. Several(More)
This paper describes the Arabic Recognition Competition: Multi-font Multi-size Digitally Represented Text held in the context of the 12 International Conference on Document Analysis and Recognition (ICDAR’2013), during August 25-28, 2013, Washington DC, United States of America. This competition has used the freely available Arabic Printed Text Image (APTI)(More)
The discrimination between languages is one of the first steps in the problem of automatic documents text recognition. In many documents, such as bank checks and application forms, printed and handwritten texts are mixed. In this paper, an automatic identification system of Arabic and French words in both handwritten and printed script based on Gaussian(More)
We analyze in this paper the impact of sub-models choice for automatic Arabic printed text recognition based on Hidden Markov Models (HMM). In our approach, sub-models correspond to characters shapes assembled to compose words models. One of the peculiarities of Arabic writing is to present various character shapes according to their position in the word.(More)
Standard databases play essential roles for evaluating and comparing results obtained by different groups of researchers. In this paper, an Arabic Handwritten Text Images Database written by Multiple Writers (AHTID/MW) is introduced. This database can be used for research in the recognition of Arabic handwritten text with open vocabulary, word segmentation(More)
We present in this paper a new approach for Arabic font recognition. Our proposal is to use a fixedlength sliding window for the feature extraction and to model feature distributions with Gaussian Mixture Models (GMMs). This approach presents a double advantage. First, we do not need to perform a priori segmentation into characters, which is a difficult(More)
We propose an approach to stemming Arabic words similar to the approach of Khoja, but with two dictionaries, one of roots and another of radicals. Our approach has the advantage of reducing the words that are inspired by their radicals to their radical and words which are inspired by their roots to their roots with great reliability and consistency and(More)