SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation
- Daniel Matthew Cer, Mona T. Diab, Eneko Agirre, I. Lopez-Gazpio, Lucia Specia
- Computer Science, Psychology
- International Workshop on Semantic Evaluation
- 31 July 2017
The STS Benchmark is introduced as a new shared training and evaluation set carefully selected from the corpus of English STS shared task data (2012-2017), providing insight into the limitations of existing models.
A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings
- Mikel Artetxe, Gorka Labaka, Eneko Agirre
- Computer Science
- Annual Meeting of the Association for…
- 16 May 2018
This work proposes an alternative approach based on a fully unsupervised initialization that explicitly exploits the structural similarity of the embeddings, and a robust self-learning algorithm that iteratively improves this solution.
A Study on Similarity and Relatedness Using Distributional and WordNet-based Approaches
- Eneko Agirre, Enrique Alfonseca, Keith B. Hall, Jana Kravalova, Marius Pasca, Aitor Soroa Etxabe
- Computer Science
- North American Chapter of the Association for…
- 31 May 2009
This paper presents and compares WordNet-based and distributional similarity approaches, and pioneers cross-lingual similarity, showing that the methods are easily adapted to a cross-lingual task with minor losses.
Unsupervised Neural Machine Translation
- Mikel Artetxe, Gorka Labaka, Eneko Agirre, Kyunghyun Cho
- Computer Science
- International Conference on Learning…
- 30 October 2017
This work proposes a novel method to train an NMT system in a completely unsupervised manner, relying on nothing but monolingual corpora. The method consists of a slightly modified attentional encoder-decoder model that can be trained on monolingual corpora alone using a combination of denoising and backtranslation.
Personalizing PageRank for Word Sense Disambiguation
- Eneko Agirre, Aitor Soroa Etxabe
- Computer Science
- Conference of the European Chapter of the…
- 30 March 2009
This paper proposes a new graph-based method that uses the knowledge in an LKB (based on WordNet) to perform unsupervised Word Sense Disambiguation, performing better than previous approaches on English all-words datasets.
SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity
- Eneko Agirre, Daniel Matthew Cer, Mona T. Diab, A. Gonzalez-Agirre
- Computer Science, Psychology
- International Workshop on Semantic Evaluation
- 7 June 2012
The results of the STS pilot task in SemEval open an exciting way ahead, although there are still open issues, especially the evaluation metric.
Learning bilingual word embeddings with (almost) no bilingual data
- Mikel Artetxe, Gorka Labaka, Eneko Agirre
- Computer Science
- Annual Meeting of the Association for…
- 1 July 2017
This work further reduces the need for bilingual resources using a very simple self-learning approach that can be combined with any dictionary-based mapping technique, and works with as little bilingual evidence as a 25-word dictionary or even an automatically generated list of numerals.
Learning principled bilingual mappings of word embeddings while preserving monolingual invariance
- Mikel Artetxe, Gorka Labaka, Eneko Agirre
- Computer Science
- Conference on Empirical Methods in Natural…
- 1 November 2016
This paper proposes a framework that generalizes previous work, provides an efficient exact method to learn the optimal linear transformation, and yields the best bilingual results in translation induction while preserving monolingual performance in an analogy task.
Generalizing and Improving Bilingual Word Embedding Mappings with a Multi-Step Framework of Linear Transformations
- Mikel Artetxe, Gorka Labaka, Eneko Agirre
- Computer Science
- AAAI Conference on Artificial Intelligence
- 27 April 2018
This work proposes a multi-step framework of linear transformations that generalizes a substantial body of previous work. The framework allows new insights into the behavior of existing methods, including the effectiveness of inverse regression, and is used to design a novel variant that obtains the best published results in zero-shot bilingual lexicon extraction.
Unsupervised Statistical Machine Translation
- Mikel Artetxe, Gorka Labaka, Eneko Agirre
- Computer Science
- Conference on Empirical Methods in Natural…
- 1 September 2018
This paper proposes an alternative approach based on phrase-based Statistical Machine Translation (SMT) that significantly closes the gap with supervised systems, and profits from the modular architecture of SMT.