Automatic Identification of Closely-related Indian Languages: Resources and Experiments

Ritesh Kumar, Bornini Lahiri, Deepak Alok, Atul Kr. Ojha, Mayank Jain, Abdul Basit, Yogesh Dawer
In this paper, we discuss an attempt to develop an automatic language identification system for five closely related Indo-Aryan languages of India: Awadhi, Bhojpuri, Braj, Hindi and Magahi. We have compiled comparable corpora of varying length for these languages from various resources, and we discuss the method of creating these corpora in detail. Using these corpora, we developed a language identification system, which currently gives state-of-the-art accuracy of 96.48%. We also used these…


