Language Identification in Multilingual Documents

@inproceedings{Tan2003LanguageII,
  title={Language Identification in Multilingual Documents},
  author={Chew Lim Tan and Peck Yoke Leong and Shoujie He},
  year={2003}
}
Most optical character recognition (OCR) systems can recognize at most a few languages. Large archives of document images that contain different languages therefore need some way to categorize the documents automatically before the appropriate OCR is applied to them. This report presents research on the identification of English, Chinese, Malay and Tamil in document images. While most other work in this area focuses on English, European, Chinese and Japanese languages, this research…