Word-level Language Identification using CRF: Code-switching Shared Task Report of MSR India System

Gokul Chittaranjan, Yogarshi Vyas, Kalika Bali, Monojit Choudhury
We describe a CRF-based system for word-level language identification of code-mixed text. Our method uses lexical, contextual, character n-gram, and special-character features and can therefore easily be replicated across languages. Its performance is benchmarked against the test sets provided by the shared task on code-mixing (Solorio et al., 2014) for four language pairs, namely English-Spanish (En-Es), English-Nepali (En-Ne), English-Mandarin (En-Cn), and Standard Arabic-Arabic (Ar-Ar…
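The abstract names four feature families without spelling them out. As a rough illustration of what per-token features of this kind might look like, the Python sketch below builds one feature dict per word. The specific choices (a ±1 word context window, character n-grams of length 1 to 3 with `<` and `>` boundary markers, and the particular special-character checks) and all function names are hypothetical, not the feature set reported by the authors; training the CRF itself is not shown.

```python
from typing import Dict, List, Union

Feature = Union[str, float, bool]


def char_ngrams(word: str, n_min: int = 1, n_max: int = 3) -> List[str]:
    """Return character n-grams of `word`, padded with '<' and '>' boundary markers."""
    padded = f"<{word.lower()}>"
    grams: List[str] = []
    for n in range(n_min, n_max + 1):
        for start in range(len(padded) - n + 1):
            grams.append(padded[start:start + n])
    return grams


def token_features(tokens: List[str], i: int, window: int = 1) -> Dict[str, Feature]:
    """Build a feature dict for tokens[i] (hypothetical feature set, not the paper's)."""
    word = tokens[i]
    feats: Dict[str, Feature] = {
        "bias": 1.0,
        # Lexical feature: the (lowercased) word itself.
        "word.lower": word.lower(),
        # Special-character features typical of social-media text (assumed checks).
        "is_hashtag": word.startswith("#"),
        "is_mention": word.startswith("@"),
        "has_digit": any(c.isdigit() for c in word),
        "is_punct": all(not c.isalnum() for c in word),
        "has_non_ascii": any(ord(c) > 127 for c in word),
    }
    # Character n-gram features: one indicator per n-gram of the current word.
    for gram in char_ngrams(word):
        feats[f"ng={gram}"] = 1.0
    # Contextual features: neighbouring words, or sentence-boundary markers.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        if j < 0:
            neighbour = "<S>"
        elif j >= len(tokens):
            neighbour = "</S>"
        else:
            neighbour = tokens[j].lower()
        feats[f"word[{offset:+d}]"] = neighbour
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, Feature]]:
    """One feature dict per token; a linear-chain CRF is trained on sequences of these."""
    return [token_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    sentence = ["I", "love", "comer", "tacos", "#yum"]
    for tok, f in zip(sentence, sentence_features(sentence)):
        print(tok, f["word[-1]"], f["word[+1]"], f["is_hashtag"])
```

Each dict would be paired with a gold language tag for its token. A linear-chain CRF then learns both which features point to each tag and how tags tend to follow one another, which is what lets the surrounding context influence the label of an ambiguous word.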
This paper has 39 citations (27 extracted) and 20 references.


Publications referenced by this paper.

CRF++: Yet Another CRF Toolkit

  • Taku Kudo.
  • http://crfpp.googlecode.com/svn/trunk/doc/index…
  • 2014

The functions of code-switching in Facebook interactions

  • Nur Syazwani Halim, Marlyana Maros.
  • Proceedings of the International Conference on…
  • 2014

Code-switching in computer-mediated communication

  • Jannis Androutsopoulos.
  • Pragmatics of Computer-mediated Communication…
  • 2013

Functions of code-switching in Polish and Hindi Facebook users’ post

  • Marta Dabrowska.
  • Studia Linguistica Universitatis Iagellonicae…
  • 2013

Toward web-scale analysis of codeswitching

  • Constantine Lignos, Mitch Marcus.
  • 87th Annual Meeting of the Linguistic Society of…
  • 2013
