Corpus ID: 4882186

Automatic Language Identification System for Hindi and Magahi

@article{Rani2018AutomaticLI,
  title={Automatic Language Identification System for Hindi and Magahi},
  author={P. Rani and Atul Kr. Ojha and Girish Nath Jha},
  journal={ArXiv},
  year={2018},
  volume={abs/1804.05095}
}
Language identification has become a prerequisite for all kinds of automated text processing systems. In this paper, we present a rule-based language identification tool for two closely related Indo-Aryan languages: Hindi and Magahi. The system currently achieves an accuracy of approximately 86.34%, which we hope to improve in future work. Automatic language identification can significantly improve the accuracy of web crawler output.
Language model adaptation for language and dialect identification of text
Iterative Language Model Adaptation for Indo-Aryan Language Identification

References

Word-length algorithm for language identification of under-resourced languages
Developing a POS tagger for Magahi: A Comparative Study
Developing LRs for Non-scheduled Indian Languages - A Case of Magahi
HindEnCorp - Hindi-English and Hindi-only Corpus for Machine Translation
All that is English may be Hindi: Enhancing language identification through automatic ranking of likeliness of word
  • 2017