TakeLab: Systems for Measuring Semantic Text Similarity
TLDR: We propose several sentence similarity measures built upon knowledge-based and corpus-based similarity of individual words, as well as similarity of dependency parses.
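A minimal sketch of one ingredient of such measures: aggregating word-level (corpus-based) similarities into a sentence-level score via greedy word alignment. The toy vectors and the alignment scheme below are illustrative assumptions, not the actual TakeLab features, which also include knowledge-based word similarity and dependency-parse overlap.

```python
import numpy as np

# Toy corpus-based word vectors (illustrative placeholders, not real embeddings).
VECS = {
    "a":       np.array([0.1, 0.2, 0.1]),
    "dog":     np.array([0.9, 0.1, 0.3]),
    "puppy":   np.array([0.8, 0.2, 0.35]),
    "is":      np.array([0.2, 0.7, 0.1]),
    "runs":    np.array([0.3, 0.1, 0.9]),
    "running": np.array([0.35, 0.15, 0.85]),
}

def cos(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def directional_sim(s1, s2):
    # For each word in s1, take its best match in s2 and average the scores.
    return sum(max(cos(VECS[w1], VECS[w2]) for w2 in s2) for w1 in s1) / len(s1)

def sentence_similarity(s1, s2):
    # Symmetrize the greedy word-alignment similarity.
    return 0.5 * (directional_sim(s1, s2) + directional_sim(s2, s1))

print(sentence_similarity(["a", "dog", "runs"], ["a", "puppy", "is", "running"]))
```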
Simplifying Lexical Simplification: Do We Need Simplified Corpora?
TLDR: We present an unsupervised approach to lexical simplification that makes use of the most recent word vector representations and requires only regular corpora.
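The basic recipe of embedding-based lexical simplification can be illustrated as follows: generate substitution candidates from nearest neighbours in the vector space, then rank them by a simplicity proxy. The toy vectors, the frequency table, and the single-ranker setup are assumptions for illustration; the actual system combines several ranking features.

```python
import numpy as np

# Toy embeddings and corpus frequencies (stand-ins for vectors trained on
# regular, non-simplified corpora).
VECS = {
    "terminate": np.array([0.9, 0.1, 0.2]),
    "end":       np.array([0.85, 0.15, 0.25]),
    "finish":    np.array([0.8, 0.2, 0.3]),
    "cancel":    np.array([0.7, 0.4, 0.1]),
    "banana":    np.array([0.1, 0.9, 0.6]),
}
FREQ = {"terminate": 1_200, "end": 95_000, "finish": 40_000,
        "cancel": 18_000, "banana": 7_000}

def cos(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def simplify(target, top_k=3):
    # 1) Candidate generation: nearest neighbours of the target in vector space.
    cands = sorted((w for w in VECS if w != target),
                   key=lambda w: cos(VECS[target], VECS[w]), reverse=True)[:top_k]
    # 2) Candidate ranking: prefer more frequent (roughly "simpler") words.
    return max(cands, key=lambda w: FREQ[w])

print(simplify("terminate"))  # -> 'end' with these toy numbers
```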
How to (Properly) Evaluate Cross-Lingual Word Embeddings: On Strong Baselines, Comparative Analyses, and Some Misconceptions
TLDR: We thoroughly evaluate both supervised and unsupervised CLE models on a large number of language pairs in the BLI task and three downstream tasks, providing new insights concerning the ability of cutting-edge CLE models to support cross-lingual NLP.
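The BLI evaluation protocol referenced here can be sketched with a toy example: learn an orthogonal mapping between two embedding spaces from a seed dictionary via Procrustes, then measure Precision@1 on held-out pairs with nearest-neighbour retrieval. The synthetic matrices and the plain cosine retrieval (no CSLS or other refinements) are assumptions, not the paper's full setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy source / target embedding matrices (rows are word vectors).
dim, n_words = 16, 50
X = rng.normal(size=(n_words, dim))
R_true, _ = np.linalg.qr(rng.normal(size=(dim, dim)))    # hidden "true" rotation
Y = X @ R_true + 0.01 * rng.normal(size=(n_words, dim))  # noisy target space

train_ids, test_ids = np.arange(0, 30), np.arange(30, 50)

# Procrustes: W = argmin ||X_train W - Y_train||_F subject to W orthogonal.
U, _, Vt = np.linalg.svd(X[train_ids].T @ Y[train_ids])
W = U @ Vt

def precision_at_1(X, Y, W, test_ids):
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    hits = 0
    for i in test_ids:
        q = X[i] @ W
        q = q / np.linalg.norm(q)
        pred = int(np.argmax(Yn @ q))   # nearest target word by cosine
        hits += (pred == i)             # gold translation shares the index
    return hits / len(test_ids)

print("BLI P@1:", precision_at_1(X, Y, W, test_ids))
```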
Explicit Retrofitting of Distributional Word Vectors
TLDR: We propose a novel framework for semantic specialization of distributional word vectors that uses external lexical knowledge in order to better embed a given semantic relation.
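The paper learns an explicit specialization function from lexical constraints; as a much simpler relative of that idea, the sketch below shows classic retrofitting-style updates that pull synonym vectors together, purely to illustrate injecting external lexical knowledge into pre-trained vectors. The constraint list and toy vectors are assumptions, and this is not the paper's architecture.

```python
import numpy as np

# Toy distributional vectors and external synonymy constraints (assumptions).
VECS = {
    "cheap":       np.array([0.9, 0.1]),
    "inexpensive": np.array([0.2, 0.8]),
    "pricey":      np.array([0.5, 0.5]),
}
SYNONYMS = [("cheap", "inexpensive")]

def retrofit(vecs, synonyms, iters=10, alpha=1.0, beta=1.0):
    # Each constrained word moves toward a weighted average of its original
    # vector (weight alpha) and its synonyms' current vectors (weight beta).
    orig = {w: v.copy() for w, v in vecs.items()}
    new = {w: v.copy() for w, v in vecs.items()}
    neighbours = {w: [] for w in vecs}
    for a, b in synonyms:
        neighbours[a].append(b)
        neighbours[b].append(a)
    for _ in range(iters):
        for w, nbrs in neighbours.items():
            if not nbrs:
                continue
            num = alpha * orig[w] + beta * sum(new[n] for n in nbrs)
            new[w] = num / (alpha + beta * len(nbrs))
    return new

specialised = retrofit(VECS, SYNONYMS)
print(specialised["cheap"], specialised["inexpensive"])
```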
Unsupervised Text Segmentation Using Semantic Relatedness Graphs
TLDR: We present a novel unsupervised algorithm for linear text segmentation (TS) that exploits word embeddings and a measure of semantic relatedness of short texts to construct a semantic relatedness graph of the document.
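The paper's graph-based algorithm is more involved, but its core signal, embedding-based relatedness between nearby sentences, can be illustrated with a simplified sketch that places segment boundaries wherever adjacent sentences fall below a relatedness threshold. The toy word vectors and the threshold value are assumptions.

```python
import numpy as np

def cos(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def embed(sentence, vecs):
    # Represent a sentence as the average of its known word embeddings.
    words = [vecs[w] for w in sentence.lower().split() if w in vecs]
    return np.mean(words, axis=0)

def segment(sentences, vecs, threshold=0.6):
    # Boundary wherever adjacent sentences are not related strongly enough.
    embs = [embed(s, vecs) for s in sentences]
    segments, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        if cos(embs[i - 1], embs[i]) < threshold:
            segments.append(current)
            current = []
        current.append(sentences[i])
    segments.append(current)
    return segments

# Toy word vectors spanning two "topics" (sports vs. cooking), as an assumption.
VECS = {
    "match": np.array([0.9, 0.1]), "goal": np.array([0.85, 0.2]),
    "team":  np.array([0.8, 0.15]),
    "recipe": np.array([0.1, 0.9]), "oven": np.array([0.2, 0.85]),
    "flour":  np.array([0.15, 0.8]),
}
doc = ["The team scored a goal", "The match was close",
       "Mix the flour for the recipe", "Preheat the oven"]
print(segment(doc, VECS, threshold=0.6))
```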
Unsupervised Cross-Lingual Information Retrieval Using Monolingual Data Only
TLDR: We propose a fully unsupervised framework for ad-hoc cross-lingual information retrieval (CLIR) which requires no bilingual data at all.
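Assuming a shared cross-lingual embedding space has already been induced from monolingual data, the retrieval step can be sketched as ranking target-language documents by the cosine similarity between aggregated query and document embeddings. The toy shared space and document collection below are illustrative assumptions, not the paper's models.

```python
import numpy as np

def cos(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def text_embedding(tokens, vecs):
    # Aggregate a text into one vector by averaging the embeddings of the
    # tokens covered by the (shared cross-lingual) vector space.
    known = [vecs[t] for t in tokens if t in vecs]
    return np.mean(known, axis=0)

def rank_documents(query_tokens, docs, vecs):
    q = text_embedding(query_tokens, vecs)
    scored = [(doc_id, cos(q, text_embedding(toks, vecs)))
              for doc_id, toks in docs.items()]
    return sorted(scored, key=lambda x: x[1], reverse=True)

# Toy shared space: English query terms and German document terms embedded
# in the same 2-d space (purely illustrative).
VECS = {
    "economy": np.array([0.9, 0.1]), "wirtschaft": np.array([0.88, 0.12]),
    "inflation": np.array([0.8, 0.2]),
    "football": np.array([0.1, 0.9]), "fussball": np.array([0.12, 0.88]),
    "tor": np.array([0.2, 0.8]),
}
docs = {
    "doc_de_1": ["wirtschaft", "inflation"],
    "doc_de_2": ["fussball", "tor"],
}
print(rank_documents(["economy"], docs, VECS))
```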
Event graphs for information retrieval and multi-document summarization
TLDR: We present a novel event-based document representation model that filters and structures the information about events described in text.
Do We Really Need Fully Unsupervised Cross-Lingual Embeddings?
TLDR: A series of bilingual lexicon induction (BLI) experiments with 15 diverse languages (210 language pairs) shows that fully unsupervised CLWE methods still fail for 87 of the 210 pairs.
Post-Specialisation: Retrofitting Vectors of Words Unseen in Lexical Resources
TLDR: We propose a novel post-specialisation method that preserves the useful linguistic knowledge for seen words while propagating this external signal to unseen words in order to improve their vector representations as well.
HiEve: A Corpus for Extracting Event Hierarchies from News Stories
TLDR: In news stories, event mentions denote real-world events of different spatial and temporal granularity; we present HiEve, a corpus of news stories annotated with hierarchical relations between event mentions.