A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings
This work proposes an alternative approach based on a fully unsupervised initialization that explicitly exploits the structural similarity of the embeddings, and a robust self-learning algorithm that iteratively improves this solution.
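The structural-similarity initialization lends itself to a compact illustration. Below is a minimal numpy sketch of that idea, assuming row-normalized embedding matrices X and Z; the function and variable names are illustrative, and the paper's additional normalization steps and the subsequent robust self-learning refinements are omitted.

```python
import numpy as np

def unsupervised_seed_dictionary(X, Z):
    """Induce an initial seed dictionary with no bilingual signal at all.

    X, Z: (n, d) and (m, d) row-normalized word embedding matrices for the
    source and target languages. Each word is represented by the *sorted*
    vector of its intra-lingual similarities; under approximate isometry of
    the two spaces, translation pairs should have similar sorted profiles.
    """
    sim_x = np.sort(X @ X.T, axis=1)           # (n, n) similarity profiles, sorted per row
    sim_z = np.sort(Z @ Z.T, axis=1)           # (m, m)
    # Row-normalize the profiles and match them across languages by nearest neighbor.
    sim_x /= np.linalg.norm(sim_x, axis=1, keepdims=True)
    sim_z /= np.linalg.norm(sim_z, axis=1, keepdims=True)
    # Profiles differ in length if n != m; keep the top-k entries for this sketch.
    k = min(sim_x.shape[1], sim_z.shape[1])
    scores = sim_x[:, -k:] @ sim_z[:, -k:].T   # (n, m) profile similarities
    return scores.argmax(axis=1)               # seed: source word i -> target word seed[i]
```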
Unsupervised Neural Machine Translation
This work proposes a novel method to train an NMT system in a completely unsupervised manner, relying on nothing but monolingual corpora: a slightly modified attentional encoder-decoder model that can be trained on monolingual corpora alone using a combination of denoising and backtranslation.
Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond
- Mikel Artetxe, Holger Schwenk
- Computer Science, Linguistics · Transactions of the Association for Computational…
- 26 December 2018
An architecture to learn joint multilingual sentence representations for 93 languages, belonging to more than 30 different families and written in 28 different scripts, using a single BiLSTM encoder with a shared byte-pair encoding vocabulary for all languages, coupled with an auxiliary decoder and trained on publicly available parallel corpora.
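A minimal PyTorch sketch of such a shared encoder is given below: one BiLSTM over a joint BPE vocabulary, with max-pooling over the hidden states to produce a fixed-size sentence embedding. The dimensions and layer count are illustrative defaults, and the auxiliary decoder and training objective are omitted; this is not the released LASER configuration.

```python
import torch
import torch.nn as nn

class BiLSTMSentenceEncoder(nn.Module):
    """Language-agnostic sentence encoder sketch: a single BiLSTM shared across
    languages over a joint BPE vocabulary, with max-pooling over its outputs."""

    def __init__(self, vocab_size, embed_dim=320, hidden_dim=512, num_layers=1):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=num_layers,
                            batch_first=True, bidirectional=True)

    def forward(self, token_ids, lengths):
        # token_ids: (batch, max(lengths)) BPE ids from the shared vocabulary, 0 = padding
        x = self.embed(token_ids)
        packed = nn.utils.rnn.pack_padded_sequence(
            x, lengths.cpu(), batch_first=True, enforce_sorted=False)
        out, _ = self.lstm(packed)
        out, _ = nn.utils.rnn.pad_packed_sequence(out, batch_first=True)
        # Mask padding positions before max-pooling over time.
        mask = (token_ids == 0).unsqueeze(-1)
        out = out.masked_fill(mask, float("-inf"))
        return out.max(dim=1).values  # fixed-size, language-agnostic sentence embedding
```

The auxiliary decoder is only needed during training; at inference the pooled encoder output is used directly as the multilingual sentence representation.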
On the Cross-lingual Transferability of Monolingual Representations
This work designs an alternative approach that transfers a monolingual model to new languages at the lexical level and shows that it is competitive with multilingual BERT on standard cross-lingual classification benchmarks and on a new Cross-lingual Question Answering Dataset (XQuAD).
Learning bilingual word embeddings with (almost) no bilingual data
This work further reduces the need for bilingual resources using a very simple self-learning approach that can be combined with any dictionary-based mapping technique, and works with as little bilingual evidence as a 25-word dictionary or even an automatically generated list of numerals.
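A minimal sketch of such a self-learning loop, assuming row-normalized embedding matrices and a tiny seed dictionary of index pairs; a plain least-squares mapping stands in here for "any dictionary-based mapping technique", and the paper's refinements (frequency cutoffs, convergence criteria, etc.) are omitted.

```python
import numpy as np

def self_learning(X, Z, seed_pairs, n_iters=10):
    """Alternate between fitting a mapping on the current dictionary and
    re-inducing the dictionary by nearest-neighbor retrieval.

    X: (n, d) source embeddings, Z: (m, d) target embeddings, row-normalized.
    seed_pairs: iterable of (src_idx, trg_idx) pairs, e.g. from a 25-word
    dictionary or an automatically generated list of shared numerals.
    """
    pairs = np.array(list(seed_pairs))
    for _ in range(n_iters):
        # 1) Fit a linear map W from the current dictionary (least squares here;
        #    any dictionary-based mapping method could be plugged in instead).
        W, *_ = np.linalg.lstsq(X[pairs[:, 0]], Z[pairs[:, 1]], rcond=None)
        # 2) Re-induce the dictionary: nearest target neighbor of every mapped source word.
        trg = ((X @ W) @ Z.T).argmax(axis=1)
        pairs = np.column_stack([np.arange(X.shape[0]), trg])
    return W, pairs
```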
Learning principled bilingual mappings of word embeddings while preserving monolingual invariance
This paper proposes a framework that generalizes previous work, provides an efficient exact method to learn the optimal linear transformation and yields the best bilingual results in translation induction while preserving monolingual performance in an analogy task.
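The exact method for the optimal orthogonal mapping has a simple closed form via SVD (orthogonal Procrustes). A numpy sketch, assuming length-normalized embeddings and a given bilingual dictionary of index pairs; the framework's optional preprocessing steps such as mean centering are omitted.

```python
import numpy as np

def orthogonal_mapping(X, Z, pairs):
    """Closed-form optimal orthogonal map W with X[i] @ W ~ Z[j] for (i, j) in pairs.

    Because W is orthogonal, monolingual distances in the mapped source space
    are preserved exactly, which is what keeps analogy performance intact.
    """
    A = X[pairs[:, 0]]                 # aligned source vectors
    B = Z[pairs[:, 1]]                 # aligned target vectors
    U, _, Vt = np.linalg.svd(A.T @ B)  # SVD of the cross-covariance matrix
    return U @ Vt                      # orthogonal Procrustes solution
```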
Generalizing and Improving Bilingual Word Embedding Mappings with a Multi-Step Framework of Linear Transformations
A multi-step framework of linear transformations is proposed that generalizes a substantial body of previous work, allows new insights into the behavior of existing methods, including the effectiveness of inverse regression, and yields a novel variant that obtains the best published results in zero-shot bilingual lexicon extraction.
Unsupervised Statistical Machine Translation
This paper proposes an alternative approach based on phrase-based Statistical Machine Translation (SMT) that significantly closes the gap with supervised systems, and profits from the modular architecture of SMT.
An Effective Approach to Unsupervised Machine Translation
This paper identifies and addresses several deficiencies of existing unsupervised SMT approaches by exploiting subword information, developing a theoretically well-founded unsupervised tuning method, and incorporating a joint refinement procedure.
Margin-based Parallel Corpus Mining with Multilingual Sentence Embeddings
This paper proposes a new method for this task based on multilingual sentence embeddings, replacing the usual hard threshold over cosine similarity with a margin-based score over nearest neighbors that accounts for the scale inconsistencies of this measure.
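A numpy sketch of the margin criterion, assuming row-normalized sentence embeddings so that dot products are cosine similarities; the brute-force nearest-neighbor search stands in for the approximate search used at scale, and the "ratio" margin shown is one of the variants discussed.

```python
import numpy as np

def margin_scores(X, Y, k=4):
    """Score candidate pairs (x_i, y_j) by cosine similarity divided by the
    average similarity of each side to its k nearest neighbors on the other side.

    X: (n, d) source-language sentence embeddings, Y: (m, d) target-language
    sentence embeddings, both row-normalized. Returns an (n, m) score matrix;
    high-scoring mutual matches are kept as mined parallel sentences.
    """
    cos = X @ Y.T                                      # (n, m) cosine similarities
    # Average similarity to the k nearest neighbors, in both directions.
    nn_x = np.sort(cos, axis=1)[:, -k:].mean(axis=1)   # (n,) per source sentence
    nn_y = np.sort(cos, axis=0)[-k:, :].mean(axis=0)   # (m,) per target sentence
    denom = (nn_x[:, None] + nn_y[None, :]) / 2.0      # neighborhood-based normalizer
    return cos / denom                                 # ratio-margin score
```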