Corpus ID: 1227830

Massively Multilingual Word Embeddings

  • Waleed Ammar, George Mulcaire, Yulia Tsvetkov, Guillaume Lample, Chris Dyer, Noah A. Smith
  • Published 2016
  • Computer Science
  • ArXiv
  • We introduce new methods for estimating and evaluating embeddings of words in more than fifty languages in a single shared embedding space. Our estimation methods, multiCluster and multiCCA, use dictionaries and monolingual data; they do not require parallel data. Our new evaluation method, multiQVEC-CCA, is shown to correlate better than previous ones with two downstream tasks (text categorization and parsing). We also describe a web portal for evaluation that will facilitate further research…
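The abstract says multiCluster needs only bilingual dictionaries and monolingual text. As we understand the paper, the dictionaries define a graph over (language, word) pairs whose connected components serve as multilingual clusters; each monolingual corpus is then rewritten in terms of cluster IDs before ordinary skip-gram training. The Python sketch below covers only the clustering and rewriting steps. The function names `build_clusters` and `rewrite_corpus`, the toy dictionaries, and the handling of words absent from every dictionary (kept as language-tagged singletons) are our own assumptions, not the authors' code.

```python
from collections import defaultdict


def build_clusters(dictionaries):
    """Map each (language, word) node to a cluster id.

    Hypothetical helper: clusters are the connected components of a graph
    whose edges are bilingual dictionary entries.
    """
    parent = {}

    def find(node):
        # Union-find lookup with path halving; unseen nodes become their own root.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Every dictionary entry links a word in one language to a word in another.
    for (lang_a, lang_b), pairs in dictionaries.items():
        for word_a, word_b in pairs:
            union((lang_a, word_a), (lang_b, word_b))

    # Collect component members and number the components deterministically.
    components = defaultdict(list)
    for node in list(parent):
        components[find(node)].append(node)
    cluster_of = {}
    for cluster_id, root in enumerate(sorted(components)):
        for node in components[root]:
            cluster_of[node] = cluster_id
    return cluster_of


def rewrite_corpus(lang, tokens, cluster_of):
    """Replace each token by its cluster id (assumption: words with no
    dictionary entry stay as language-tagged singletons such as 'fr:le')."""
    return [
        f"c{cluster_of[(lang, tok)]}" if (lang, tok) in cluster_of else f"{lang}:{tok}"
        for tok in tokens
    ]


# Toy dictionaries (made up for illustration), keyed by language pair.
dictionaries = {
    ("en", "fr"): [("dog", "chien"), ("cat", "chat")],
    ("en", "de"): [("dog", "Hund")],
}
cluster_of = build_clusters(dictionaries)
print(rewrite_corpus("fr", ["le", "chien"], cluster_of))  # ['fr:le', 'c1']
print(rewrite_corpus("de", ["der", "Hund"], cluster_of))  # ['de:der', 'c1']
```

Because `chien` and `Hund` both translate `dog`, all three land in the same component and share one cluster ID, so a skip-gram model trained on the concatenated rewritten corpora would learn a single vector for them. The embedding-training step itself is deliberately left out of this sketch.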
    201 Citations
    Massively Multilingual Sparse Word Representations
    Multilingual Training of Crosslingual Word Embeddings
    • 41
    • Highly Influenced
    Unsupervised Multilingual Word Embeddings
    • 77
    • Highly Influenced
    A survey of cross-lingual embedding models
    • 94
    Cross-lingual Models of Word Embeddings: An Empirical Comparison
    • 166
    NORMA: Neighborhood Sensitive Maps for Multilingual Word Embeddings
    • 19
    A Universal Semantic Space
    • 1
    • Highly Influenced
    Automated Generation of Multilingual Clusters for the Evaluation of Distributed Representations
    • 12
    • Highly Influenced

