Adapting Tesseract for Complex Scripts: An Example for Urdu Nastalique

@article{Akram2014AdaptingTF,
  title={Adapting Tesseract for Complex Scripts: An Example for Urdu Nastalique},
  author={Qurat ul Ain Akram and Sarmad Hussain and Aneta Niazi and Umair Anjum and Faheem Irfan},
  journal={2014 11th IAPR International Workshop on Document Analysis Systems},
  year={2014},
  pages={191--195}
}
The Tesseract engine supports multilingual text recognition. However, recognizing cursive scripts with Tesseract is a challenging task. In this paper, the Tesseract engine is analyzed and modified to recognize the Nastalique writing style for the Urdu language, a very complex and cursive writing style of the Arabic script. The original Tesseract system has 65.59% and 65.84% accuracy for font sizes 14 and 16 respectively, whereas the modified system, with reduced search space, gives 97.87…


Key Quantitative Results

  • The original Tesseract system has 65.59% and 65.84% accuracy for font sizes 14 and 16 respectively, whereas the modified system, with reduced search space, gives 97.87% and 97.71% accuracy respectively.

Citations

Publications citing this paper.
SHOWING 1-10 OF 17 CITATIONS

Impact of Ligature Coverage on Training Practical Urdu OCR Systems

  • 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)
  • 2017
CITES RESULTS & METHODS
HIGHLY INFLUENCED

Ligature-based font size independent OCR for Noori Nastalique writing style

  • 2017 1st International Workshop on Arabic Script Analysis and Recognition (ASAR)
  • 2017
CITES BACKGROUND & METHODS

A Holistic Approach for Recognition of Complete Urdu Ligatures Using Hidden Markov Models

  • 2017 International Conference on Frontiers of Information Technology (FIT)
  • 2017
CITES RESULTS, METHODS & BACKGROUND
HIGHLY INFLUENCED

Segmentation-free optical character recognition for printed Urdu text

  • EURASIP J. Image and Video Processing
  • 2017
CITES BACKGROUND & RESULTS
HIGHLY INFLUENCED

Ligature Analysis-based Urdu OCR Framework

  • 2017 International Conference on Frontiers of Information Technology (FIT)
  • 2017
CITES METHODS

Urdu ligature recognition techniques-A review

  • 2017 International Conference on Intelligent Communication and Computational Techniques (ICCT)
  • 2017
CITES METHODS
