Labeling the Languages of Words in Mixed-Language Documents using Weakly Supervised Methods

@inproceedings{King2013LabelingTL,
  title={Labeling the Languages of Words in Mixed-Language Documents using Weakly Supervised Methods},
  author={Ben King and Steven P. Abney},
  booktitle={HLT-NAACL},
  year={2013}
}
In this paper we consider the problem of labeling the languages of words in mixed-language documents. This problem is approached in a weakly supervised fashion, as a sequence labeling problem with monolingual text samples for training data. Among the approaches evaluated, a conditional random field model trained with generalized expectation criteria was the most accurate and performed consistently as the amount of training data was varied. 
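As a rough illustration of treating word-level language labeling as sequence labeling trained only on monolingual samples, the sketch below scores each word with per-language character-bigram models and decodes the label sequence with Viterbi. This is not the conditional random field with generalized expectation criteria that the abstract reports as most accurate. It is a simpler stand-in, and the function names, the add-one smoothing, and the switch_penalty parameter are all assumptions made for this example.

```python
import math
from collections import Counter


def char_ngrams(word, n=2):
    """Character n-grams of a lowercased word with boundary markers."""
    padded = f"^{word.lower()}$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train(samples, n=2):
    """Build one n-gram count model per language.

    samples: dict mapping a language label to a monolingual text sample.
    Returns (models, vocab_size), where models maps label -> (counts, total).
    """
    models = {}
    for lang, text in samples.items():
        counts = Counter(g for w in text.split() for g in char_ngrams(w, n))
        models[lang] = (counts, sum(counts.values()))
    vocab_size = len({g for counts, _ in models.values() for g in counts})
    return models, vocab_size


def word_logprob(word, model, vocab_size, n=2):
    """Add-one smoothed log-probability of a word under one language model."""
    counts, total = model
    return sum(
        math.log((counts[g] + 1) / (total + vocab_size + 1))
        for g in char_ngrams(word, n)
    )


def label_words(words, models, vocab_size, switch_penalty=2.0):
    """Viterbi decoding: one language label per word.

    switch_penalty (hypothetical tuning knob) is subtracted from the score
    whenever adjacent words receive different labels.
    """
    if not words:
        return []
    langs = sorted(models)
    best = {l: word_logprob(words[0], models[l], vocab_size) for l in langs}
    backpointers = []
    for w in words[1:]:
        new_best, pointers = {}, {}
        for l in langs:
            def score(p, l=l):
                return best[p] - (switch_penalty if p != l else 0.0)
            prev = max(langs, key=score)
            new_best[l] = score(prev) + word_logprob(w, models[l], vocab_size)
            pointers[l] = prev
        best = new_best
        backpointers.append(pointers)
    # Trace the best path back from the highest-scoring final label.
    path = [max(langs, key=lambda l: best[l])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return list(reversed(path))
```

Calling train on a dictionary of monolingual samples and then label_words on a tokenized mixed-language sentence returns one label per token. A larger switch_penalty favors longer single-language runs, which is one crude way to encode the expectation that neighboring words usually share a language.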
This paper has 124 citations.

Citations

Publications citing this paper.

Mixed Language and Code-Switching in the Canadian Hansard

CodeSwitch@EMNLP • 2014

Semantic Scholar estimates that this publication has 125 citations based on the available data.

References

Publications referenced by this paper.

Mallet: A machine learning for language toolkit

Andrew McCallum.
http://mallet.cs.umass.edu. • 2002

Posterior Regularization for Structured Latent Variable Models

Journal of Machine Learning Research • 2010
