A Language Identification Method Applied to Twitter Data

Anil Kumar Singh and Pratya Goyal
This paper presents experiments on using a simple algorithm, aided by a few heuristics, for language identification on Twitter data. The experiments were part of a shared task focused on this problem. The core algorithm is an n-gram based distance metric algorithm that has previously been shown to work very well on normal text. The distance metric used is symmetric cross entropy.
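
As a rough illustration of the kind of method the abstract describes, the sketch below classifies a text by comparing its character n-gram distribution against per-language distributions using symmetric cross entropy, H(p, q) + H(q, p). It is not the authors' implementation: the abstract does not give the n-gram order, the smoothing scheme, or the Twitter-specific heuristics, so the names `profile`, `symmetric_cross_entropy`, and `identify`, the maximum n-gram length of 3, the probability floor of 1e-6, and the toy training sentences are all assumptions made for this example.

```python
import math
from collections import Counter


def char_ngrams(text, n_max=3):
    """Yield every character n-gram of length 1..n_max (n_max is an assumed setting)."""
    for n in range(1, n_max + 1):
        for i in range(len(text) - n + 1):
            yield text[i:i + n]


def profile(text, n_max=3):
    """Build a relative-frequency distribution over character n-grams."""
    counts = Counter(char_ngrams(text.lower(), n_max))
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()}


def symmetric_cross_entropy(p, q, floor=1e-6):
    """Return H(p, q) + H(q, p).

    N-grams missing from one distribution get the probability `floor`,
    an assumed smoothing choice that keeps the logarithm finite.
    """
    h_pq = -sum(pv * math.log(q.get(gram, floor)) for gram, pv in p.items())
    h_qp = -sum(qv * math.log(p.get(gram, floor)) for gram, qv in q.items())
    return h_pq + h_qp


def identify(text, models, n_max=3):
    """Return the language whose model is closest to the text's profile."""
    test = profile(text, n_max)
    return min(models, key=lambda lang: symmetric_cross_entropy(test, models[lang]))


# Toy training data, far too small for real use; for illustration only.
toy_models = {
    "en": profile("the quick brown fox jumps over the lazy dog and the cat sat on the mat"),
    "es": profile("el rapido zorro marron salta sobre el perro perezoso y el gato se sento en la alfombra"),
}

print(identify("the dog and the cat", toy_models))
```

Real tweets would need much larger training samples per language, and probably preprocessing of URLs, mentions, and hashtags, which is where heuristics such as those the abstract mentions would come in.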
