Publications
TakeLab: Systems for Measuring Semantic Text Similarity
TLDR
The two systems for determining the semantic similarity of short texts, submitted to SemEval 2012 Task 6, ranked in the top 5 for all three overall evaluation metrics used.
Simplifying Lexical Simplification: Do We Need Simplified Corpora?
TLDR
This work presents an unsupervised approach to lexical simplification that makes use of the most recent word vector representations and requires only regular corpora, and is as effective as systems that rely on simplified corpora.
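The core idea of embedding-based lexical simplification can be illustrated with a minimal sketch: find substitution candidates that are close to the target word in vector space, then prefer the most frequent one as a proxy for simplicity. The toy vectors and frequency counts below are hypothetical, not the paper's data or exact method.

```python
import numpy as np

# Toy embedding table standing in for pretrained word vectors (hypothetical values).
vectors = {
    "intricate": np.array([0.9, 0.1, 0.3]),
    "complex":   np.array([0.8, 0.2, 0.3]),
    "simple":    np.array([0.1, 0.9, 0.2]),
    "ornate":    np.array([0.7, 0.1, 0.4]),
}
# Corpus frequencies (hypothetical); higher frequency is used as a proxy for simplicity.
frequency = {"intricate": 25, "complex": 900, "simple": 2000, "ornate": 40}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def simplify(word, candidates, min_sim=0.8):
    # Keep only candidates semantically close to the target word,
    # then pick the most frequent (i.e., presumably simplest) one.
    close = [c for c in candidates if cosine(vectors[word], vectors[c]) > min_sim]
    return max(close, key=lambda c: frequency[c]) if close else word

print(simplify("intricate", ["complex", "simple", "ornate"]))  # → complex
```

"simple" is filtered out despite its high frequency because it is not a meaning-preserving substitute for "intricate" in vector space; "complex" wins among the close candidates.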
Probing Pretrained Language Models for Lexical Semantics
TLDR
A systematic empirical analysis across six typologically diverse languages and five different lexical tasks reveals patterns and best practices that hold universally, but also points to prominent variations across languages and tasks.
How to (Properly) Evaluate Cross-Lingual Word Embeddings: On Strong Baselines, Comparative Analyses, and Some Misconceptions
TLDR
It is empirically demonstrated that the performance of CLE models largely depends on the task at hand and that optimizing CLE models for BLI may hurt downstream performance; the most robust supervised and unsupervised CLE models are also identified.
Unsupervised Text Segmentation Using Semantic Relatedness Graphs
TLDR
A novel unsupervised algorithm for linear text segmentation (TS) that exploits word embeddings and a measure of semantic relatedness of short texts to construct a semantic-relatedness graph of the document.
Explicit Retrofitting of Distributional Word Vectors
TLDR
This work transforms external lexico-semantic relations into training examples used to learn an explicit retrofitting model (ER); this makes it possible to learn a global specialization function and to specialize the vectors of words unobserved in the training data as well.
From Zero to Hero: On the Limitations of Zero-Shot Language Transfer with Multilingual Transformers
TLDR
It is demonstrated that the inexpensive few-shot transfer (i.e., additional fine-tuning on a few target-language instances) is surprisingly effective across the board, warranting more research efforts reaching beyond the limiting zero-shot conditions.
HiEve: A Corpus for Extracting Event Hierarchies from News Stories
TLDR
This work presents HiEve, a corpus in which news narratives are represented as hierarchies of events based on relations of spatiotemporal containment.
Unsupervised Cross-Lingual Information Retrieval Using Monolingual Data Only
TLDR
This work proposes a fully unsupervised framework for ad-hoc cross-lingual information retrieval (CLIR) that requires no bilingual data at all; the authors argue that the proposed framework is a first step toward developing effective CLIR models for language pairs and domains where parallel data are scarce or non-existent.
...