Corpus ID: 4882186

Automatic Language Identification System for Hindi and Magahi

@article{Rani2018AutomaticLI,
  title={Automatic Language Identification System for Hindi and Magahi},
  author={P. Rani and Atul Kr. Ojha and Girish Nath Jha},
  journal={ArXiv},
  year={2018},
  volume={abs/1804.05095}
}
Language identification has become a prerequisite for all kinds of automated text processing systems. In this paper, we present a rule-based language identification tool for two closely related Indo-Aryan languages: Hindi and Magahi. The system currently achieves an accuracy of approximately 86.34%, which we hope to improve in future work. Automatic language identification can play a significant role in the accuracy of web crawler output.
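The abstract does not spell out the rules the tool uses, so the following is only a minimal sketch of what a rule-based identifier for two closely related languages might look like. It assumes a simple cue-word vote: the cue lists, the `identify` function, and the `"Unknown"` fallback are all hypothetical and chosen for illustration. They are not taken from the paper and have not been linguistically validated.

```python
# Hypothetical cue words for illustration only; the paper's actual rules
# are not given in the abstract.
HINDI_CUES = {"है", "हैं", "था", "थी", "नहीं"}
MAGAHI_CUES = {"हइ", "हकइ", "हथिन", "हलइ", "नञ"}


def identify(sentence, hindi_cues=HINDI_CUES, magahi_cues=MAGAHI_CUES):
    """Label a sentence by counting cue-word hits for each language."""
    # Treat the danda (।) as a separator, then split on whitespace.
    tokens = sentence.replace("।", " ").split()
    hindi_hits = sum(token in hindi_cues for token in tokens)
    magahi_hits = sum(token in magahi_cues for token in tokens)
    if hindi_hits > magahi_hits:
        return "Hindi"
    if magahi_hits > hindi_hits:
        return "Magahi"
    # No cue words found, or a tie: the rules cannot decide.
    return "Unknown"


def accuracy(predictions, gold_labels):
    """Fraction of predictions that match the gold labels."""
    correct = sum(p == g for p, g in zip(predictions, gold_labels))
    return correct / len(gold_labels)
```

Under these assumptions, a reported figure such as 86.34% would correspond to `accuracy(...)` returning roughly 0.8634 on a labelled test set. How the authors actually evaluated their system is not stated in the abstract.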
2 Citations
Language model adaptation for language and dialect identification of text
Iterative Language Model Adaptation for Indo-Aryan Language Identification
