Publications
Word Sense Disambiguation: A Unified Evaluation Framework and Empirical Comparison
TLDR
A unified evaluation framework is developed and the results show that supervised systems clearly outperform knowledge-based models in Word Sense Disambiguation, and a linear classifier trained on conventional local features still proves to be a hard baseline to beat.
Nasari: Integrating explicit knowledge and corpus statistics for a multilingual representation of concepts and entities
TLDR
A novel multilingual vector representation, called Nasari, is put forward. It not only enables accurate representation of word senses in different languages but also provides two main advantages over existing approaches: high coverage, and comparability across languages and linguistic levels.
SemEval-2017 Task 2: Multilingual and Cross-lingual Semantic Word Similarity
TLDR
Results show that systems combining statistical knowledge from text corpora (in the form of word embeddings) with external knowledge from lexical resources are the best performers in both subtasks.
NASARI: a Novel Approach to a Semantically-Aware Representation of Items
TLDR
This paper presents a vector representation technique that combines the complementary knowledge of lexicographic and encyclopedic resources, such as Wikipedia, and attains state-of-the-art performance on multiple datasets in two standard benchmarks: word similarity and sense clustering.
TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification
TLDR
This paper proposes a new evaluation framework (TweetEval) consisting of seven heterogeneous Twitter-specific classification tasks, and shows the effectiveness of starting from existing pre-trained generic language models and continuing to train them on Twitter corpora.
Embedding Words and Senses Together via Joint Knowledge-Enhanced Training
TLDR
This work proposes a new model that learns word and sense embeddings jointly, exploiting large corpora and knowledge from semantic networks to produce a unified vector space of words and senses.
SemEval 2018 Task 2: Multilingual Emoji Prediction
TLDR
This paper describes the results of the first Shared Task on Multilingual Emoji Prediction, organized as part of SemEval 2018. Given a tweet, the task consists of predicting the emoji most likely to be used with it.
Inducing Relational Knowledge from BERT
TLDR
This work proposes a methodology for distilling relational knowledge from a pre-trained language model: the language model is fine-tuned to predict whether a given word pair is likely to be an instance of some relation when given an instantiated template for that relation as input.
From Word to Sense Embeddings: A Survey on Vector Representations of Meaning
TLDR
This survey presents a comprehensive overview of the wide range of techniques in the two main branches of sense representation, i.e., unsupervised and knowledge-based, and provides an analysis of four of its important aspects: interpretability, sense granularity, adaptability to different domains, and compositionality.
WiC: the Word-in-Context Dataset for Evaluating Context-Sensitive Meaning Representations
TLDR
This paper presents WiC, a large-scale Word-in-Context dataset based on expert-curated annotations, for the generic evaluation of context-sensitive representations, and shows that existing models have surpassed the performance ceiling of the standard evaluation dataset for this purpose.