The BBN Byblos Pashto OCR system

Michael Decerbo, Ehry MacRostie, and Premkumar Natarajan. HDP '04.

The BBN Byblos OCR system implements a script-independent methodology for OCR using Hidden Markov Models (HMMs). We have successfully tested the system with Arabic, English, and Chinese documents. In this paper, we describe our recent effort in training the system to perform recognition of documents in Pashto, one of the national languages of Afghanistan. We discuss the availability and characteristics of suitable experimental data and the methods we used to assemble Pashto training and test…
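This excerpt says only that recognition uses HMMs; the paper's actual features, model topology, and decoder are not described here. As a generic illustration of HMM decoding, not the Byblos implementation, the sketch below runs the Viterbi algorithm over a toy left-to-right model in which each character is a short chain of states. The state names (`A1`, `A2`, `B1`, `B2`), the quantized frame symbols (`x`, `y`, `z`), and every probability are hypothetical. Nothing in the decoder itself depends on the script, which is one way to read "script-independent": only the character models and their training data change.

```python
import math
from itertools import groupby

NEG_INF = float("-inf")


def log(p):
    """Natural log that maps zero probability to -inf."""
    return math.log(p) if p > 0 else NEG_INF


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (best log probability, best state path) for a discrete-observation HMM."""
    # trellis[t][s]: best log probability of any path that ends in state s at frame t
    trellis = [{s: log(start_p.get(s, 0.0)) + log(emit_p[s].get(obs[0], 0.0)) for s in states}]
    back = [{s: None for s in states}]
    for t in range(1, len(obs)):
        trellis.append({})
        back.append({})
        for s in states:
            # Best predecessor r for state s at frame t
            prev = max(states, key=lambda r: trellis[t - 1][r] + log(trans_p[r].get(s, 0.0)))
            score = trellis[t - 1][prev] + log(trans_p[prev].get(s, 0.0))
            trellis[t][s] = score + log(emit_p[s].get(obs[t], 0.0))
            back[t][s] = prev
    # Trace back from the best final state
    last = max(states, key=lambda s: trellis[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return trellis[-1][last], path


# Hypothetical toy model: two "characters" A and B, two left-to-right states each.
states = ["A1", "A2", "B1", "B2"]
start_p = {"A1": 1.0}
trans_p = {
    "A1": {"A1": 0.5, "A2": 0.5},
    "A2": {"A2": 0.5, "B1": 0.5},
    "B1": {"B1": 0.5, "B2": 0.5},
    "B2": {"B2": 1.0},
}
emit_p = {
    "A1": {"x": 0.8, "y": 0.1, "z": 0.1},
    "A2": {"x": 0.3, "y": 0.6, "z": 0.1},
    "B1": {"x": 0.1, "y": 0.3, "z": 0.6},
    "B2": {"x": 0.1, "y": 0.1, "z": 0.8},
}

# Hypothetical quantized features for five frames sliding across a text line
frames = ["x", "x", "y", "z", "z"]
score, path = viterbi(frames, states, start_p, trans_p, emit_p)

# Map states to character labels and merge runs. This only works because the
# toy has no repeated characters; a real decoder tracks model boundaries.
chars = "".join(label for label, _ in groupby(s[0] for s in path))
print(path, chars, math.exp(score))
```

Working in log space avoids numeric underflow when many frame probabilities are multiplied along a long text line. For these toy inputs the best path is `A1 A1 A2 B1 B2`, which collapses to the string `AB`.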