An Empirical Study of Effectiveness of Post-Processing in Indic Scripts

@article{Vinitha2017AnES,
  title={An Empirical Study of Effectiveness of Post-Processing in Indic Scripts},
  author={V. S. Vinitha and Minesh Mathew and C. V. Jawahar},
  journal={2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)},
  year={2017},
  volume={07},
  pages={32--36}
}
This paper explores the effectiveness of statistical language model (SLM) and dictionary-based methods for detecting and correcting errors in Indic OCR output. For the SLM, we use Unicode-level n-grams to build the language model. We compare its performance with akshara-level n-grams and find that akshara-level n-grams detect errors better than Unicode-level n-grams. We experimentally analyze the performance of Indic OCR post-processing using the dictionary method…
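The abstract contrasts Unicode-level and akshara-level n-grams for error detection and also uses a dictionary method. The Python sketch below illustrates those ideas under stated assumptions and is not the authors' implementation. The akshara segmenter is a simplified approximation: combining marks attach to the current cluster, and a consonant after a virama joins the conjunct. An OCR word is flagged when it contains an n-gram seen fewer than `min_count` times in a training corpus. Dictionary correction picks the nearest lexicon word by Levenshtein distance over aksharas. All names (`akshara_units`, `unicode_units`, `NgramDetector`, `dictionary_correct`, `min_count`, `max_dist`) are hypothetical.

```python
# Hypothetical sketch of n-gram and dictionary post-processing; not the paper's code.
import unicodedata
from collections import Counter


def is_virama(ch: str) -> bool:
    # Indic scripts name their vowel-suppressing sign "<SCRIPT> SIGN VIRAMA".
    return unicodedata.name(ch, "").endswith("VIRAMA")


def unicode_units(word: str) -> list[str]:
    # Unicode-level units: one entry per code point.
    return list(word)


def akshara_units(word: str) -> list[str]:
    # Approximate akshara segmentation (an assumption, not a full Indic shaper):
    # combining marks (matras, anusvara, visarga, nukta, virama) attach to the
    # current cluster, and a consonant after a virama joins the same conjunct.
    units: list[str] = []
    for ch in word:
        if units and (unicodedata.category(ch).startswith("M")
                      or is_virama(units[-1][-1])):
            units[-1] += ch
        else:
            units.append(ch)
    return units


def ngrams(units: list[str], n: int) -> list[tuple[str, ...]]:
    # Pad with word-boundary markers so prefixes and suffixes are modelled too.
    padded = ["<s>"] + units + ["</s>"]
    return [tuple(padded[i:i + n]) for i in range(len(padded) - n + 1)]


class NgramDetector:
    """Flags a word if it contains an n-gram seen fewer than min_count times."""

    def __init__(self, corpus_words, segment, n=3, min_count=1):
        self.segment, self.n, self.min_count = segment, n, min_count
        self.counts = Counter()
        for w in corpus_words:
            self.counts.update(ngrams(segment(w), n))

    def is_error(self, word: str) -> bool:
        grams = ngrams(self.segment(word), self.n)
        return any(self.counts[g] < self.min_count for g in grams)


def edit_distance(a: list[str], b: list[str]) -> int:
    # Levenshtein distance over unit sequences (code points or aksharas).
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def dictionary_correct(word, lexicon, segment=akshara_units, max_dist=1):
    # Dictionary method: accept in-lexicon words; otherwise return the closest
    # lexicon entry within max_dist edits, or None if nothing is close enough.
    if word in lexicon:
        return word
    units = segment(word)
    scored = sorted((edit_distance(units, segment(w)), w) for w in lexicon)
    if scored and scored[0][0] <= max_dist:
        return scored[0][1]
    return None


if __name__ == "__main__":
    corpus = ["भारत", "भाषा"]
    detector = NgramDetector(corpus, akshara_units, n=2)
    print(akshara_units("क्षत्रिय"))                 # ['क्ष', 'त्रि', 'य']
    print(detector.is_error("भारत"))                 # False
    print(detector.is_error("भरत"))                  # True: ("<s>", "भ") unseen
    print(dictionary_correct("भरत", set(corpus)))    # भारत
```

Passing `unicode_units` instead of `akshara_units` to `NgramDetector` gives the Unicode-level variant, so the two unit choices can be compared on the same corpus.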

References

Publications referenced by this paper (17 in total; a subset is listed below).

Error Detection in Indic OCRs

  • 2016 12th IAPR Workshop on Document Analysis Systems (DAS)
  • 2016

Error Detection in Highly Inflectional Languages

  • 2013 12th International Conference on Document Analysis and Recognition
  • 2013

Experiences of Integration and Performance Testing of Multilingual OCR for Printed Indian Scripts

D. Arya, T. Patnaik, +5 authors G. S. Lehal
  • ICDAR 2011
  • 2011

Limits on the Application of Frequency-Based Language Models to OCR

  • 2011 International Conference on Document Analysis and Recognition
  • 2011

An Overview of the Tesseract OCR Engine

R. Smith
  • 2007
