This paper introduces a new task on Multilingual and Cross-lingual Semantic Word Similarity, which measures the semantic similarity of word pairs within and across five languages: English, Farsi, German, Italian and Spanish. High-quality datasets were manually curated for the five languages, with high inter-annotator agreement (consistently in the 0.9…
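The abstract stops before describing how systems are scored. Word similarity benchmarks are commonly evaluated by correlating a system's similarity scores with human ratings. The sketch below computes Pearson and Spearman correlation and, as one possible way of combining the two, their harmonic mean. The function names, the combination, and the toy scores are illustrative assumptions, not the task's official scorer.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))


def harmonic_mean(a, b):
    """Harmonic mean of two positive scores."""
    return 2 * a * b / (a + b)


# Hypothetical human ratings (0-4 scale) and system scores for four pairs.
gold = [4.0, 3.5, 1.0, 0.5]
system = [0.9, 0.7, 0.3, 0.1]
rho = spearman(gold, system)  # identical ranking, so 1.0 up to rounding
r = pearson(gold, system)
print(round(rho, 3), round(r, 3), round(harmonic_mean(r, rho), 3))
```

Spearman's rho only looks at rank order, so a system that orders the pairs exactly as the annotators do scores 1.0 even though its raw values sit on a different scale from the ratings; Pearson also rewards getting the relative distances right.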
The semantic representation of individual word senses and concepts is of fundamental importance to several applications in Natural Language Processing. To date, concept modeling techniques have mainly based their representations either on lexicographic resources, such as WordNet, or on encyclopedic resources, such as Wikipedia. We propose a vector…
Semantic representation lies at the core of several applications in Natural Language Processing. However, most existing semantic representation techniques cannot be used effectively to represent individual word senses. We put forward a novel multilingual concept representation, called MUFFIN, which not only enables accurate representation of…
We present a new framework for the intrinsic evaluation of word vector representations based on the outlier detection task. This task is intended to test the ability of vector space models to form semantic clusters in the space. We carried out a pilot study, building a gold-standard dataset, and the results revealed two important features: human…
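To make the outlier detection setup concrete, the sketch below scores a small word set with one plausible rule: for each candidate, measure how compact the remaining words are (average pairwise cosine similarity), and flag the word whose removal leaves the most compact cluster. The abstract is truncated before it defines the evaluation measure, so this scoring rule, the helper names, and the 2-D toy vectors are assumptions for illustration only.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def compactness_without(word, vectors):
    """Average pairwise cosine similarity of the set once `word` is removed."""
    rest = [vec for w, vec in vectors.items() if w != word]
    pairs = list(combinations(rest, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def detect_outlier(vectors):
    """Return the word whose removal leaves the most compact cluster.

    Assumes at least three words, so at least one pair remains after removal.
    """
    return max(vectors, key=lambda w: compactness_without(w, vectors))


# Hypothetical 2-D vectors: three fruits and one unrelated word.
toy = {
    "apple": [0.9, 0.1],
    "banana": [0.8, 0.2],
    "cherry": [0.85, 0.15],
    "car": [0.1, 0.95],
}
print(detect_outlier(toy))  # car
```

A model passes an item like this when the intruder it flags matches the one in the gold set; real embeddings would replace the toy vectors.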
Despite being one of the most popular tasks in lexical semantics, word similarity has often been limited to the English language. Other languages, even widely spoken ones such as Spanish, do not have a reliable word similarity evaluation framework. We put forward robust methodologies for extending existing English datasets to other…
Lexical taxonomies are graph-like hierarchical structures that provide a formal representation of knowledge. Most knowledge graphs to date rely on is-a (hypernymic) relations as the backbone of their semantic structure. In this paper, we propose a supervised distributional framework for hypernym discovery which operates at the sense level, enabling…
Word Sense Disambiguation is a long-standing task in Natural Language Processing, lying at the core of human language understanding. However, the evaluation of automatic systems has been problematic, mainly due to the lack of a reliable evaluation framework. In this paper we develop a unified evaluation framework and analyze the performance of various…
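The abstract is cut off before the framework's scoring details. A common convention in WSD evaluation, sketched below as an assumption rather than as this framework's scorer, is to compute precision over the instances a system actually answered, recall over all gold instances, and F1 as their harmonic mean, counting an answer as correct if it matches any of the gold senses. The instance ids and sense labels are hypothetical.

```python
def score_wsd(gold, predictions):
    """Return (precision, recall, F1) for word sense predictions.

    gold: instance id -> set of acceptable sense labels.
    predictions: instance id -> single predicted sense label; instances the
    system left unanswered are simply absent.
    """
    answered = [i for i in predictions if i in gold]
    correct = sum(1 for i in answered if predictions[i] in gold[i])
    precision = correct / len(answered) if answered else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical instances and sense labels; the system skips "t3".
gold = {
    "t0": {"bank_n1"},
    "t1": {"run_v2", "run_v3"},
    "t2": {"light_n1"},
    "t3": {"plant_n2"},
}
predictions = {"t0": "bank_n1", "t1": "run_v3", "t2": "light_n4"}
p, r, f1 = score_wsd(gold, predictions)
print(f"P={p:.3f} R={r:.3f} F1={f1:.3f}")  # P=0.667 R=0.500 F1=0.571
```

Separating precision from recall this way means a system is not penalized in precision for abstaining, but abstentions still lower its recall and therefore its F1.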
Semantic annotation and terminology validation in full-text SHS (humanities and social sciences) documents. Abstract. Our work focuses on validating occurrences of candidate terms in context. The occurrence contexts come from scientific articles in the language sciences, taken from the SCIENTEXT corpus. The candidate terms are identified by the automatic extractor…
In this paper we present BabelDomains, a unified resource which provides lexical items with information about domains of knowledge. We propose an automatic method that uses knowledge from various lexical resources, exploiting both distributional and graph-based clues, to accurately propagate domain information. We evaluate our methodology intrinsically on…
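The abstract does not detail how domain information is propagated. Purely as a generic illustration of graph-based propagation, the sketch below spreads domain labels from a few seed nodes to their neighbours by majority vote, one round at a time. The graph, the seed labels, and the voting rule are all hypothetical and should not be read as the BabelDomains method.

```python
from collections import Counter


def propagate_domains(graph, seeds, max_rounds=10):
    """Spread domain labels from seed nodes to unlabeled neighbours.

    graph: node -> list of neighbouring nodes (list undirected edges both ways).
    seeds: node -> domain label; seed labels are never overwritten.
    In each round, every unlabeled node takes the most frequent label among
    neighbours that were labeled at the start of that round (ties go to the
    alphabetically first label). Stops early once a round changes nothing.
    """
    labels = dict(seeds)
    for _ in range(max_rounds):
        updates = {}
        for node, neighbours in graph.items():
            if node in labels:
                continue
            votes = Counter(labels[n] for n in neighbours if n in labels)
            if votes:
                updates[node] = min(votes, key=lambda d: (-votes[d], d))
        if not updates:
            break
        labels.update(updates)
    return labels


# Hypothetical toy graph with one seed per domain.
graph = {
    "football": ["goal"],
    "goal": ["football", "striker"],
    "striker": ["goal", "penalty"],
    "penalty": ["striker"],
    "market": ["stock"],
    "stock": ["market", "bond"],
    "bond": ["stock"],
}
labels = propagate_domains(graph, {"football": "Sport", "market": "Business"})
print(labels["penalty"], labels["bond"])  # Sport Business
```

Updates are collected before being applied, so each round sees a consistent snapshot and the result does not depend on dictionary iteration order.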