Codeswitching language identification using Subword Information Enriched Word Vectors

@inproceedings{Xia2016CodeswitchingLI,
  title={Codeswitching language identification using Subword Information Enriched Word Vectors},
  author={Meng Xuan Xia},
  booktitle={CodeSwitch@EMNLP},
  year={2016}
}
Codeswitching is a widely observed phenomenon among bilingual speakers. By combining subword information enriched word vectors with a linear-chain Conditional Random Field, we develop a supervised machine learning model that identifies languages in English-Spanish codeswitched tweets. Our computational method achieves a tweet-level weighted F1 of 0.83 and a token-level accuracy of 0.949 without using any external resource. The result demonstrates that named entity recognition remains a…