TakeLab: Systems for Measuring Semantic Text Similarity
- F. Šarić, Goran Glavaš, M. Karan, J. Šnajder, B. D. Bašić
- Computer Science
- 7 June 2012
We propose several sentence similarity measures built upon knowledge-based and corpus-based similarity of individual words as well as similarity of dependency parses.
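A minimal sketch of one way word-level similarities can be aggregated into a sentence-level score, in the spirit of the corpus-based measures the abstract mentions. The function names and the symmetrized best-match aggregation are illustrative assumptions, not the paper's exact formulation.

```python
def sentence_similarity(s1, s2, word_sim):
    """Symmetrized average best-match word similarity between two
    tokenized sentences, given any word-to-word similarity function."""
    def directed(a, b):
        # For each word in a, take its best match in b, then average.
        return sum(max(word_sim(w, v) for v in b) for w in a) / len(a)
    return 0.5 * (directed(s1, s2) + directed(s2, s1))

# Toy word similarity: 1.0 for identical words, 0.0 otherwise.
toy_sim = lambda w, v: 1.0 if w == v else 0.0
score = sentence_similarity(["a", "b"], ["a", "c"], toy_sim)  # → 0.5
```

In practice `word_sim` would be backed by a lexical resource (knowledge-based) or distributional statistics (corpus-based), and the parse-based component would be combined separately.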
Simplifying Lexical Simplification: Do We Need Simplified Corpora?
We present an unsupervised approach to lexical simplification that makes use of the most recent word vector representations and requires only regular corpora.
How to (Properly) Evaluate Cross-Lingual Word Embeddings: On Strong Baselines, Comparative Analyses, and Some Misconceptions
We thoroughly evaluate both supervised and unsupervised CLE models on a large number of language pairs in the BLI task and three downstream tasks, providing new insights concerning the ability of cutting-edge CLE models to support cross-lingual NLP.
Explicit Retrofitting of Distributional Word Vectors
We propose a novel framework for semantic specialization of distributional word vectors using external lexical knowledge in order to better embed some semantic relation.
Unsupervised Text Segmentation Using Semantic Relatedness Graphs
We present a novel unsupervised algorithm for linear text segmentation (TS) that exploits word embeddings and a measure of semantic relatedness of short texts to construct a semantic relatedness graph of the document.
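A minimal sketch of how a relatedness signal between adjacent sentences could drive linear segmentation: sentences are nodes, and a boundary falls where consecutive sentences are not related above a threshold. The function names, the threshold, and the adjacent-pair simplification are illustrative assumptions, not the paper's actual graph algorithm.

```python
def segment(sentences, relatedness, threshold=0.5):
    """Split a sentence list into contiguous segments, placing a
    boundary between adjacent sentences whose relatedness is low."""
    segments, current = [], [sentences[0]]
    for prev, cur in zip(sentences, sentences[1:]):
        if relatedness(prev, cur) >= threshold:
            current.append(cur)       # stays in the same segment
        else:
            segments.append(current)  # boundary: close segment
            current = [cur]
    segments.append(current)
    return segments

# Toy relatedness: sentences are "related" if they share a first letter.
toy_rel = lambda a, b: 1.0 if a[0] == b[0] else 0.0
result = segment(["apple", "ant", "bee", "bat"], toy_rel)
# → [['apple', 'ant'], ['bee', 'bat']]
```

The full method would compute relatedness between all nearby sentence pairs from embeddings and segment the resulting graph, rather than thresholding adjacent pairs only.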
Unsupervised Cross-Lingual Information Retrieval Using Monolingual Data Only
We propose a fully unsupervised framework for ad-hoc cross-lingual information retrieval (CLIR) which requires no bilingual data at all.
Event graphs for information retrieval and multi-document summarization
We present a novel event-based document representation model that filters and structures the information about events described in text.
Do We Really Need Fully Unsupervised Cross-Lingual Embeddings?
A series of bilingual lexicon induction (BLI) experiments with 15 diverse languages (210 language pairs) show that fully unsupervised CLWE methods still fail for 87/210 pairs.
Post-Specialisation: Retrofitting Vectors of Words Unseen in Lexical Resources
We propose a novel post-specialisation method that a) preserves the useful linguistic knowledge for seen words, while b) propagating this external signal to unseen words in order to improve their vector representations as well.
HiEve: A Corpus for Extracting Event Hierarchies from News Stories
In news stories, event mentions denote real-world events of different spatial and temporal granularity.