Corpus ID: 43251551

Low-resource bilingual lexicon extraction using graph based word embeddings

  • Ximena Gutierrez-Vasques, Victor Mijangos
  • Published 2017
  • Computer Science
  • ArXiv
  • In this work we focus on automatically extracting a bilingual lexicon for the Spanish-Nahuatl language pair. This is a low-resource setting where only a small parallel corpus is available. Most downstream methods do not work well under low-resource conditions. This is especially true for approaches that use vector representations such as Word2Vec. Our proposal is to construct bilingual word vectors from a graph. This graph is generated using translation pairs…

    A Simple Hybrid Aligner for Generating Lexical Correspondences in Parallel Texts
    • 80
    A statistical view on bilingual lexicon extraction
    • 44
    A Strong Baseline for Learning Cross-Lingual Word Embeddings from Sentence Alignments
    • 48
    Exploiting Similarities among Languages for Machine Translation
    • 1,108
    • Highly Influential
    Multilingual Distributed Representations without Word Alignment
    • 139
    Making Sense of Word Embeddings
    • 94
    Sampling-based Multilingual Alignment
    • 59
    Efficient Estimation of Word Representations in Vector Space
    • 16,407
    • Highly Influential
    Multilingual Models for Compositional Distributed Semantics
    • 281