Disentangling from Babylonian Confusion - Unsupervised Language Identification

@inproceedings{Biemann2005DisentanglingFB,
  title={Disentangling from Babylonian Confusion - Unsupervised Language Identification},
  author={Christian Biemann and Sven Teresniak},
  booktitle={CICLing},
  year={2005}
}
This work presents an unsupervised solution to language identification. The method sorts multilingual text corpora, sentence by sentence, into the languages they contain and makes no assumptions about the number or size of the monolingual fractions. Evaluation on 7-lingual and bilingual corpora shows that classification quality is comparable to that of supervised approaches and that the method works almost error-free from 100 sentences per language onward.
This paper has 25 citations.