Publications
SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation
TLDR
The STS Benchmark is introduced as a new shared training and evaluation set carefully selected from the corpus of English STS shared task data (2012-2017), providing insight into the limitations of existing models.
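Systems on this benchmark are scored by Pearson correlation between predicted similarity scores and gold annotations on a 0-5 scale. A minimal sketch of that evaluation; the file names are hypothetical placeholders, one score per line:

    # Minimal STS evaluation sketch: Pearson correlation between system and gold scores.
    # File names are hypothetical; gold similarity labels are on a 0-5 scale.
    import numpy as np

    def pearson(x, y):
        x, y = np.asarray(x, float), np.asarray(y, float)
        xc, yc = x - x.mean(), y - y.mean()
        return float((xc @ yc) / (np.linalg.norm(xc) * np.linalg.norm(yc)))

    gold = [float(line) for line in open("gold_scores.txt")]   # one gold score per line
    pred = [float(line) for line in open("sys_scores.txt")]    # one system score per line
    print(f"Pearson r = {pearson(gold, pred):.4f}")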
A Study on Similarity and Relatedness Using Distributional and WordNet-based Approaches
TLDR
This paper presents and compares WordNet-based and distributional similarity approaches, and pioneers cross-lingual similarity, showing that the methods are easily adapted to a cross-lingual task with minor losses.
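To illustrate the two families of methods compared here, a small sketch contrasting a WordNet path-based score with a distributional cosine score; the toy word vectors are placeholders, not the paper's actual setup:

    # Sketch: WordNet-based vs. distributional word similarity (illustrative only).
    import numpy as np
    from nltk.corpus import wordnet as wn  # requires the NLTK WordNet data

    def wordnet_sim(w1, w2):
        # Maximum path similarity over all synset pairs (a common WordNet-based score).
        scores = [s1.path_similarity(s2) or 0.0
                  for s1 in wn.synsets(w1) for s2 in wn.synsets(w2)]
        return max(scores, default=0.0)

    def distributional_sim(v1, v2):
        # Cosine similarity between distributional (co-occurrence or embedding) vectors.
        return float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))

    print(wordnet_sim("car", "automobile"))
    print(distributional_sim(np.array([0.2, 0.9]), np.array([0.3, 0.8])))  # toy vectors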
A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings
TLDR
This work proposes an alternative approach based on a fully unsupervised initialization that explicitly exploits the structural similarity of the embeddings, and a robust self-learning algorithm that iteratively improves this solution.
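The initialization rests on the observation that translation-equivalent words tend to have similar intra-lingual similarity distributions. A simplified sketch of that idea, assuming both vocabularies are truncated to the same size so the signatures are comparable; this is an illustration, not a faithful reimplementation of the published method:

    # Sketch of the unsupervised initialization idea: the sorted rows of each language's
    # intra-lingual similarity matrix act as language-independent word signatures,
    # which can seed an initial dictionary by nearest-neighbor matching.
    import numpy as np

    def signatures(E):
        E = E / np.linalg.norm(E, axis=1, keepdims=True)   # length-normalize embeddings
        M = E @ E.T                                        # intra-lingual similarity matrix
        return np.sort(M, axis=1)                          # sorted rows as signatures

    def initial_dictionary(X, Z):
        # Assumes X and Z cover equally sized vocabularies so signatures align.
        sim = signatures(X) @ signatures(Z).T              # match signatures across languages
        return np.argmax(sim, axis=1)                      # source word i -> target word index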
Unsupervised Neural Machine Translation
TLDR
This work proposes a novel method to train an NMT system in a completely unsupervised manner, relying on nothing but monolingual corpora; the system consists of a slightly modified attentional encoder-decoder model that can be trained on monolingual corpora alone using a combination of denoising and backtranslation.
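The denoising component trains the model to reconstruct a sentence from a corrupted copy of itself. A minimal sketch of a typical corruption function (word dropout plus local shuffling); the rates and window size are illustrative values, not the paper's settings:

    # Sketch of sentence corruption for denoising training: drop some words and
    # locally shuffle the rest. Parameters are illustrative, not the paper's settings.
    import random

    def add_noise(tokens, drop_prob=0.1, shuffle_window=3):
        kept = [t for t in tokens if random.random() > drop_prob]   # word dropout
        # Local shuffle: each surviving token moves at most shuffle_window - 1 positions.
        keys = [i + random.uniform(0, shuffle_window) for i in range(len(kept))]
        return [t for _, t in sorted(zip(keys, kept))]

    print(add_noise("an example sentence to corrupt for denoising".split()))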
Personalizing PageRank for Word Sense Disambiguation
TLDR
This paper proposes a new graph-based method that uses the knowledge in an LKB (based on WordNet) to perform unsupervised Word Sense Disambiguation, performing better than previous approaches on English all-words datasets.
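The core operation is a PageRank run over the WordNet graph with the reset distribution concentrated on the nodes activated by the context words. A small sketch using networkx; the toy graph and node names are hypothetical stand-ins for the real LKB:

    # Sketch of Personalized PageRank for WSD: rank synset nodes in a WordNet-like graph,
    # restarting the random walk at the nodes activated by the context words.
    import networkx as nx

    G = nx.Graph()
    G.add_edges_from([("bank.n.01", "river.n.01"), ("bank.n.02", "money.n.01"),
                      ("money.n.01", "deposit.v.01"), ("river.n.01", "water.n.01")])

    # Context words activate their nodes; the reset distribution is uniform over them.
    context_nodes = ["money.n.01", "deposit.v.01"]
    personalization = {n: (1.0 if n in context_nodes else 0.0) for n in G.nodes}

    ranks = nx.pagerank(G, alpha=0.85, personalization=personalization)
    # Disambiguate "bank" by picking its highest-ranked candidate synset.
    best = max(["bank.n.01", "bank.n.02"], key=ranks.get)
    print(best)  # picks bank.n.02 (the financial sense) on this toy graph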
SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity
TLDR
The results of the STS pilot task in SemEval open an exciting way ahead, although there are still open issues, especially the evaluation metric.
Learning bilingual word embeddings with (almost) no bilingual data
TLDR
This work further reduces the need for bilingual resources using a very simple self-learning approach that can be combined with any dictionary-based mapping technique, and works with as little bilingual evidence as a 25-word dictionary or even an automatically generated list of numerals.
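A minimal sketch of the two ingredients mentioned above: a seed dictionary built from numerals shared by both vocabularies, and a self-learning loop that alternates between solving the mapping and re-inducing the dictionary. Here solve_mapping is a placeholder for any dictionary-based mapping method (for instance, an orthogonal Procrustes solution), and the vocabularies and embedding matrices are assumed to be row-aligned:

    # Sketch of (almost) unsupervised self-learning for bilingual embedding mappings.
    # `solve_mapping` stands in for any dictionary-based method; inputs are hypothetical.
    import numpy as np

    def numeral_seed(src_vocab, trg_vocab):
        # Seed dictionary: identical numeral strings appearing in both vocabularies.
        shared = [w for w in set(src_vocab) & set(trg_vocab) if w.isdigit()]
        return [(src_vocab.index(w), trg_vocab.index(w)) for w in shared]

    def self_learning(X, Z, seed, solve_mapping, iters=10):
        dico = list(seed)
        for _ in range(iters):
            W = solve_mapping(X, Z, dico)                  # map source into target space
            sims = (X @ W) @ Z.T                           # cross-lingual similarities
            dico = list(enumerate(sims.argmax(axis=1)))    # re-induce dictionary (nearest neighbor)
        return W, dico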
Learning principled bilingual mappings of word embeddings while preserving monolingual invariance
TLDR
This paper proposes a framework that generalizes previous work, provides an efficient exact method to learn the optimal linear transformation and yields the best bilingual results in translation induction while preserving monolingual performance in an analogy task.
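The optimal orthogonal mapping for a given dictionary has a closed form via SVD. A minimal numpy sketch, assuming X and Z hold the length-normalized source and target embeddings of the dictionary pairs in corresponding rows:

    # Closed-form orthogonal mapping (Procrustes): W = U V^T from the SVD of X^T Z.
    # Assumes X[i] and Z[i] are the embeddings of a dictionary pair, length-normalized.
    import numpy as np

    def orthogonal_mapping(X, Z):
        U, _, Vt = np.linalg.svd(X.T @ Z)
        return U @ Vt          # orthogonal W minimizing ||X W - Z||_F

    # Usage: map the full source space with X_full @ orthogonal_mapping(X, Z).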
Generalizing and Improving Bilingual Word Embedding Mappings with a Multi-Step Framework of Linear Transformations
TLDR
A multi-step framework of linear transformations that generalizes a substantial body of previous work is proposed; it yields new insights into the behavior of existing methods, including the effectiveness of inverse regression, and leads to a novel variant that obtains the best published results in zero-shot bilingual lexicon extraction.
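The framework decomposes the mapping into a sequence of linear steps (normalization, whitening, orthogonal alignment, re-weighting, de-whitening, dimensionality reduction). A compressed numpy sketch of the first of those steps under simplifying assumptions; the de-whitening and dimensionality-reduction steps are omitted, the matrices are assumed full-rank and pre-normalized, and the re-weighting exponent is a free parameter:

    # Partial sketch of the multi-step mapping: whiten both spaces, orthogonally align
    # them, then re-weight by the singular values. Simplified and illustrative only.
    import numpy as np

    def whiten(E):
        # Whitening transform: after E @ W the embeddings have identity covariance.
        _, S, Vt = np.linalg.svd(E, full_matrices=False)
        return Vt.T @ np.diag(1.0 / S) @ Vt

    def multistep_map(X, Z, p=1.0):
        # X, Z: row-aligned dictionary embeddings (length-normalized, mean-centered).
        Wx, Wz = whiten(X), whiten(Z)                    # step 1: whitening
        U, S, Vt = np.linalg.svd((X @ Wx).T @ (Z @ Wz))  # step 2: orthogonal alignment
        Xm = X @ Wx @ U @ np.diag(S ** p)                # step 3: re-weighting (exponent p)
        Zm = Z @ Wz @ Vt.T
        return Xm, Zm   # de-whitening / dimensionality reduction omitted in this sketch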
Unsupervised Statistical Machine Translation
TLDR
This paper proposes an alternative approach based on phrase-based Statistical Machine Translation (SMT) that significantly closes the gap with supervised systems, and profits from the modular architecture of SMT.
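One way to bootstrap such a system is to induce an initial phrase table from cross-lingual phrase embeddings before iterating with backtranslation. A toy sketch of that induction step, scoring nearest-neighbor candidates with a softmax over cosine similarities; the phrase lists and embeddings are hypothetical placeholders rather than the paper's actual components:

    # Toy sketch of inducing an initial phrase table from cross-lingual phrase embeddings:
    # translation candidates are nearest neighbors, scored with a softmax over similarities.
    import numpy as np

    def induce_phrase_table(src_phrases, trg_phrases, S, T, k=5, temperature=0.1):
        S = S / np.linalg.norm(S, axis=1, keepdims=True)   # source phrase embeddings (mapped)
        T = T / np.linalg.norm(T, axis=1, keepdims=True)   # target phrase embeddings
        table = {}
        for i, phrase in enumerate(src_phrases):
            sims = T @ S[i]
            top = np.argsort(-sims)[:k]                    # k nearest target phrases
            probs = np.exp(sims[top] / temperature)
            probs /= probs.sum()                           # softmax translation scores
            table[phrase] = [(trg_phrases[j], float(p)) for j, p in zip(top, probs)]
        return table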
...