On the Limitations of Unsupervised Bilingual Dictionary Induction
TLDR
We show that a simple trick, exploiting a weak supervision signal from identical words, enables more robust induction, and establish a near-perfect correlation between unsupervised bilingual dictionary induction performance and a previously unexplored graph similarity metric.
  • 141
  • 32
SimVerb-3500: A Large-Scale Evaluation Set of Verb Similarity
TLDR
We introduce SimVerb-3500, an evaluation resource, unprecedented in both size and coverage, that provides human ratings for the similarity of 3,500 verb pairs from the USF free-association database.
  • 171
  • 31
Monolingual and Cross-Lingual Information Retrieval Models Based on (Bilingual) Word Embeddings
TLDR
We propose a new unified framework for monolingual (MoIR) and cross-lingual (CLIR) information retrieval, which relies on the induction of dense real-valued word vectors, known as word embeddings, from comparable data.
  • 215
  • 28
Semantic Specialization of Distributional Word Vector Spaces using Monolingual and Cross-Lingual Constraints
TLDR
We present Attract-Repel, an algorithm for improving the semantic quality of word vectors by injecting constraints extracted from lexical resources, tuning word vector spaces with linguistic information that is difficult to capture with conventional distributional training.
  • 109
  • 25
HyperLex: A Large-Scale Evaluation of Graded Lexical Entailment
TLDR
We introduce HyperLex, a data set and evaluation resource that quantifies the extent of semantic category membership, that is, the type-of relation, also known as the hyponymy–hypernymy or lexical entailment (LE) relation, between 2,616 concept pairs.
  • 66
  • 20
A Survey of Cross-lingual Word Embedding Models
TLDR
In this survey, we provide a comprehensive typology of cross-lingual word embedding models.
  • 215
  • 19
Skip N-grams and Ranking Functions for Predicting Script Events
TLDR
We design, evaluate, and compare different methods for constructing event prediction models: given a partial chain of events in a script, predict other events that are likely to belong to the script.
  • 72
  • 19
Semantic Specialization of Distributional Word Vector Spaces using Monolingual and Cross-Lingual Constraints
TLDR
We present Attract-Repel, an algorithm for improving the semantic quality of word vectors by injecting constraints extracted from lexical resources.
  • 52
  • 13
How to (Properly) Evaluate Cross-Lingual Word Embeddings: On Strong Baselines, Comparative Analyses, and Some Misconceptions
TLDR
We thoroughly evaluate both supervised and unsupervised CLE models on a large number of language pairs in the BLI task and three downstream tasks, providing new insights concerning the ability of cutting-edge CLE models to support cross-lingual NLP.
  • 90
  • 12
Identifying Word Translations from Comparable Corpora Using Latent Topic Models
TLDR
We investigate the value of bilingual topic models, i.e., a bilingual Latent Dirichlet Allocation model, for finding translations of terms in comparable corpora without using linguistic resources.
  • 97
  • 12