Publications
TakeLab: Systems for Measuring Semantic Text Similarity
TLDR
We propose several sentence similarity measures built upon knowledge-based and corpus-based similarity of individual words as well as similarity of dependency parses.
  • Citations: 211
  • Influence: 29
  • PDF
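
As a rough illustration of the corpus-based component above, here is a minimal sketch that scores sentence similarity by greedily aligning each word to its most similar counterpart in the other sentence and averaging. The toy vectors are hypothetical stand-ins for real word embeddings; the actual TakeLab systems combine many more features, including knowledge-based word similarity and dependency-parse overlap.

    import numpy as np

    # Hypothetical toy vectors standing in for real word embeddings.
    vecs = {w: np.random.RandomState(i).randn(50)
            for i, w in enumerate("a cat sat on the mat dog lay rug".split())}

    def cos(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    def directed_sim(s1, s2):
        # For each word of s1, take its best match in s2, then average.
        return float(np.mean([max(cos(vecs[a], vecs[b]) for b in s2) for a in s1]))

    def sentence_sim(s1, s2):
        # Symmetrise the two directed scores.
        return 0.5 * (directed_sim(s1, s2) + directed_sim(s2, s1))

    print(sentence_sim("a cat sat on the mat".split(), "a dog lay on the rug".split()))
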
Simplifying Lexical Simplification: Do We Need Simplified Corpora?
TLDR
We present an unsupervised approach to lexical simplification that makes use of the most recent word vector representations and requires only regular corpora.
  • Citations: 106
  • Influence: 17
  • PDF
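
A minimal sketch of the underlying idea, under toy assumptions: substitution candidates come from embedding nearest neighbours, and the "simplest" candidate is approximated by corpus frequency. The vocabulary, vectors, and counts below are made up; the paper's candidate generation and ranking are more careful.

    import numpy as np

    rng = np.random.RandomState(0)
    vocab = ["terminate", "end", "conclude", "finish", "cat"]
    E = rng.randn(len(vocab), 50)                  # toy embedding matrix
    E /= np.linalg.norm(E, axis=1, keepdims=True)  # unit-normalise rows
    freq = {"terminate": 5, "end": 900, "conclude": 40, "finish": 300, "cat": 80}

    def simplify(word, k=3):
        i = vocab.index(word)
        sims = E @ E[i]
        neighbours = [vocab[j] for j in np.argsort(-sims) if j != i][:k]
        # Among near-synonym candidates, pick the most frequent (simplest) one.
        return max(neighbours, key=lambda w: freq[w])

    print(simplify("terminate"))
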
How to (Properly) Evaluate Cross-Lingual Word Embeddings: On Strong Baselines, Comparative Analyses, and Some Misconceptions
TLDR
We thoroughly evaluate both supervised and unsupervised cross-lingual embedding (CLE) models on a large number of language pairs in the bilingual lexicon induction (BLI) task and three downstream tasks, providing new insights concerning the ability of cutting-edge CLE models to support cross-lingual NLP.
  • Citations: 90
  • Influence: 12
  • PDF
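
The BLI evaluation protocol mentioned above reduces to nearest-neighbour retrieval in a shared embedding space. A self-contained sketch with synthetic data follows; real evaluations use trained CLE models and a gold translation dictionary.

    import numpy as np

    rng = np.random.RandomState(1)
    X = rng.randn(100, 40)   # source-language embeddings (shared space)
    Y = rng.randn(200, 40)   # target-language embeddings (same shared space)
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    Y /= np.linalg.norm(Y, axis=1, keepdims=True)
    gold = {i: rng.randint(200) for i in range(100)}  # toy translation dictionary

    def bli_p_at_1(X, Y, gold):
        hits = 0
        for src, tgt in gold.items():
            # Cosine nearest neighbour via dot product on unit vectors.
            nn = int(np.argmax(Y @ X[src]))
            hits += (nn == tgt)
        return hits / len(gold)

    print(bli_p_at_1(X, Y, gold))
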
Unsupervised Text Segmentation Using Semantic Relatedness Graphs
TLDR
We present a novel unsupervised algorithm for linear text segmentation (TS) that exploits word embeddings and a measure of semantic relatedness of short texts to construct a semantic relatedness graph of the document.
  • Citations: 39
  • Influence: 10
  • PDF
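
A toy reading of the graph idea: connect sentences whose embedding similarity clears a threshold and hypothesise a segment boundary wherever consecutive sentences are disconnected. The paper's relatedness measure, graph construction, and segmentation objective are considerably more sophisticated than this sketch.

    import numpy as np

    rng = np.random.RandomState(2)
    sents = rng.randn(8, 30)  # toy sentence embeddings
    sents /= np.linalg.norm(sents, axis=1, keepdims=True)

    def segment(sents, threshold=0.1):
        adj = (sents @ sents.T) > threshold  # semantic relatedness graph
        # Boundary wherever adjacent sentences share no edge.
        return [i + 1 for i in range(len(sents) - 1) if not adj[i, i + 1]]

    print(segment(sents))
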
Explicit Retrofitting of Distributional Word Vectors
TLDR
We propose a novel framework for semantic specialization of distributional word vectors, using external lexical knowledge in order to better capture a target semantic relation such as synonymy.
  • Citations: 52
  • Influence: 10
  • PDF
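
The key point of explicit retrofitting is that the specialization is a learned mapping over the whole space rather than a per-word update. A heavily simplified sketch, assuming a linear map trained by plain gradient descent on toy synonym constraints only:

    import numpy as np

    rng = np.random.RandomState(3)
    E = rng.randn(20, 10)                 # toy distributional vectors
    syn_pairs = [(0, 1), (2, 3), (4, 5)]  # indices of synonym constraints
    W = np.eye(10)                        # start from the identity mapping

    for _ in range(200):
        grad = np.zeros_like(W)
        for i, j in syn_pairs:
            diff = W @ (E[i] - E[j])      # residual distance under the mapping
            grad += np.outer(diff, E[i] - E[j])
        W -= 0.01 * grad

    E_specialised = E @ W.T               # retrofit the full vocabulary at once
    print(np.linalg.norm(E_specialised[0] - E_specialised[1]))
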
Unsupervised Cross-Lingual Information Retrieval Using Monolingual Data Only
TLDR
We propose a fully unsupervised framework for ad-hoc cross-lingual information retrieval (CLIR) which requires no bilingual data at all.
  • Citations: 36
  • Influence: 5
  • PDF
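
A bare-bones sketch of embedding-based CLIR ranking, assuming a shared cross-lingual word vector space already exists. In the paper that space is induced from monolingual data only; here it is random toy data.

    import numpy as np

    rng = np.random.RandomState(4)
    shared = {w: rng.randn(25) for w in ["hund", "katze", "dog", "cat", "haus", "house"]}

    def embed(tokens):
        # Represent a text as the mean of its word vectors, unit-normalised.
        v = np.mean([shared[t] for t in tokens], axis=0)
        return v / np.linalg.norm(v)

    def rank(query, docs):
        q = embed(query)
        scores = [(float(embed(d) @ q), i) for i, d in enumerate(docs)]
        return sorted(scores, reverse=True)  # best-matching document first

    print(rank(["dog", "house"], [["hund", "haus"], ["katze"]]))
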
Event graphs for information retrieval and multi-document summarization
TLDR
We present a novel event-based document representation model that filters and structures the information about events described in text.
  • Citations: 55
  • Influence: 4
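
A toy sketch of what an event-graph representation might look like as a data structure: events (an anchor word plus its arguments) become nodes, and edges encode temporal relations between them. The extraction machinery of the paper is stubbed out entirely, and all names below are made up.

    # Nodes: events with an anchor and argument list (hypothetical examples).
    events = [
        {"id": 0, "anchor": "elected", "args": ["parliament", "president"]},
        {"id": 1, "anchor": "resigned", "args": ["president"]},
    ]
    # Edges: temporal relations between events.
    edges = [(0, 1, "before")]  # event 0 temporally precedes event 1

    def matches(query_anchor, query_arg):
        # A query event matches if some node shares its anchor and argument.
        return [e["id"] for e in events
                if e["anchor"] == query_anchor and query_arg in e["args"]]

    print(matches("resigned", "president"))
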
Post-Specialisation: Retrofitting Vectors of Words Unseen in Lexical Resources
TLDR
We propose a novel post-specialisation method that a) preserves the useful linguistic knowledge for seen words and b) propagates this external signal to unseen words in order to improve their vector representations as well.
  • Citations: 29
  • Influence: 4
  • PDF
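
The post-specialisation idea can be caricatured as function learning: fit a map from original to specialised vectors on words seen in the lexical resource, then apply it to unseen words. A closed-form least-squares sketch on synthetic data; the paper learns a non-linear mapping.

    import numpy as np

    rng = np.random.RandomState(5)
    X_seen = rng.randn(50, 20)                  # original vectors of seen words
    Y_seen = X_seen @ rng.randn(20, 20) * 0.5   # their (toy) specialised vectors
    X_unseen = rng.randn(10, 20)                # words missing from the resource

    # Fit W minimising ||X_seen W - Y_seen||^2 in closed form.
    W, *_ = np.linalg.lstsq(X_seen, Y_seen, rcond=None)
    Y_unseen = X_unseen @ W                     # propagate the specialisation signal

    print(Y_unseen.shape)
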
Adversarial Propagation and Zero-Shot Cross-Lingual Transfer of Word Vector Specialization
TLDR
We propose a novel adversarial approach to specializing word vectors for the full distributional vocabulary.
  • Citations: 28
  • Influence: 3
  • PDF
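
Purely as an illustration of the adversarial flavour, the sketch below alternates updates between a linear generator that maps distributional vectors and a logistic discriminator that tries to tell mapped vectors from (toy) specialised ones. This is a generic GAN-style toy, not the paper's model.

    import numpy as np

    rng = np.random.RandomState(6)
    X = rng.randn(100, 8)          # distributional vectors (full vocabulary)
    S = rng.randn(100, 8) + 1.0    # toy "specialised" vectors of seen words
    G = np.eye(8)                  # generator: linear map over the space
    d = rng.randn(8) * 0.01        # discriminator: logistic-regression weights
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    for _ in range(200):
        fake = X @ G.T
        # Discriminator step: push D(specialised) -> 1 and D(mapped) -> 0.
        gd = S.T @ (1 - sigmoid(S @ d)) - fake.T @ sigmoid(fake @ d)
        d += 0.05 * gd / len(X)
        # Generator step: update G so mapped vectors look specialised to D.
        s = sigmoid(fake @ d)
        G += 0.05 * np.outer(d, (1 - s) @ X) / len(X)

    # D's average belief that mapped vectors are "specialised".
    print(sigmoid(X @ G.T @ d).mean())
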
Unsupervised Cross-Lingual Scaling of Political Texts
TLDR
Political text scaling aims to linearly order parties and politicians across political dimensions (e.g., left-to-right ideology) based on textual content; we propose a fully unsupervised approach that scales texts across languages.
  • Citations: 12
  • Influence: 3
  • PDF
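
As a hedged illustration of producing a one-dimensional scale from text, the sketch below reads party positions off the first principal component of (toy) document embeddings. The paper's actual cross-lingual scaling method differs; this only shows the "linearly order texts" step.

    import numpy as np

    rng = np.random.RandomState(7)
    docs = rng.randn(6, 30)                   # toy manifesto embeddings
    centred = docs - docs.mean(axis=0)
    _, _, Vt = np.linalg.svd(centred, full_matrices=False)
    positions = centred @ Vt[0]               # scores along the principal axis
    print(np.argsort(positions))              # induced ordering of the parties
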