Massively Multilingual Word Embeddings
@article{Ammar2016MassivelyMW,
  title={Massively Multilingual Word Embeddings},
  author={Waleed Ammar and George Mulcaire and Yulia Tsvetkov and Guillaume Lample and Chris Dyer and Noah A. Smith},
  journal={ArXiv},
  year={2016},
  volume={abs/1602.01925}
}
We introduce new methods for estimating and evaluating embeddings of words in more than fifty languages in a single shared embedding space. Our estimation methods, multiCluster and multiCCA, use dictionaries and monolingual data; they do not require parallel data. Our new evaluation method, multiQVEC-CCA, is shown to correlate better than previous ones with two downstream tasks (text categorization and parsing). We also describe a web portal for evaluation that will facilitate further research…
201 Citations
Automated Generation of Multilingual Clusters for the Evaluation of Distributed Representations
- Computer Science
- ICLR
- 2017
- 12
- Highly Influenced
Meemi: A Simple Method for Post-processing and Integrating Cross-lingual Word Embeddings.
- Computer Science
- 2020
- 1
References
Improving Vector Space Word Representations Using Multilingual Correlation
- Computer Science
- EACL
- 2014
- 505
BilBOWA: Fast Bilingual Distributed Representations without Word Alignments
- Computer Science, Mathematics
- ICML
- 2015
- 329
Cross-lingual Word Clusters for Direct Transfer of Linguistic Structure
- Computer Science
- HLT-NAACL
- 2012
- 219
Bilingual Word Representations with Monolingual Quality in Mind
- Computer Science
- VS@HLT-NAACL
- 2015
- 272
- Highly Influential