• Corpus ID: 15181045

DISCO: A Multilingual Database of Distributionally Similar Words

Peter Kolb
This paper presents DISCO, a tool for retrieving the distributional similarity between two given words, and for retrieving the distributionally most similar words for a given word. Pre-computed word spaces are freely available for a number of languages including English, German, French and Italian, so DISCO can be used off the shelf. The tool is implemented in Java, provides a Java API, and can also be called from the command line. The performance of DISCO is evaluated by measuring the…
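The two operations named in the abstract (similarity between two words, and the most similar words for a given word) can be illustrated with a small sketch. This is not DISCO's API or its similarity measure: the function names `build_vectors`, `similarity`, and `most_similar`, the toy corpus, the window size, and the use of raw co-occurrence counts with cosine similarity are all assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def build_vectors(sentences, window=2):
    """Count, for each word, the words occurring within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[feat] for feat, count in u.items() if feat in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity(vectors, w1, w2):
    """Distributional similarity of two words (0.0 if either is unknown)."""
    return cosine(vectors.get(w1, Counter()), vectors.get(w2, Counter()))


def most_similar(vectors, word, n=3):
    """The n words whose co-occurrence vectors are closest to that of `word`."""
    scores = [(other, similarity(vectors, word, other))
              for other in vectors if other != word]
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))[:n]


# Hypothetical toy corpus; a real word space is built from a large corpus.
corpus = [
    "the cat drinks milk".split(),
    "the dog drinks water".split(),
    "the cat chases the mouse".split(),
    "the dog chases the cat".split(),
]
vectors = build_vectors(corpus)
print(similarity(vectors, "cat", "dog"))
print(most_similar(vectors, "cat"))
```

A pre-computed word space plays the role of `vectors` here: the counting is done once offline, so that lookups at query time only need the stored vectors.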


sranjans : Semantic Textual Similarity using Maximal Weighted Bipartite Graph Matching

The paper aims to come up with a system that examines the degree of semantic equivalence between two sentences by finding the maximal weighted bipartite match between the tokens of the two sentences.

Evaluating Distributional Semantic Models with Russian Noun-Adjective Compositions

In the paper, vector-space semantic models based on the Word2Vec word embedding algorithm and a count-based association-oriented algorithm are evaluated and compared by measuring association strength…

Estimating Syntagmatic Association Strength Using Distributional Word Representations

In the paper, we present distributed vector space models based on word embeddings and a specific association-oriented count-based distributional algorithm, which have been applied to measuring…

Semantic Lexicon Induction from Twitter with Pattern Relatedness and Flexible Term Length

This work presents a novel semantic lexicon induction approach that is able to learn new vocabulary from social media and can achieve accuracy as high as 92% in the top 100 learned category members.

Syntactic and semantic classification of verb arguments using dependency-based and rich semantic features

  • F. Elia
  • Computer Science, Linguistics
  • 2016
It is shown that this approach performs well, even with the data sparsity issues that characterize the dataset, and can obtain better results than other systems by a margin of about 4% F-score.

Automatic Measurement of Semantic Similarity among Arabic Short Texts

A new method is introduced to measure the semantic similarity between short texts by combining semantic distribution and lexical similarity measures to determine the degree of similarity between two words.

Namelette: a tasteful supporter for creative naming

A system that supports the naming process by exploiting natural language processing and linguistic creativity techniques in a completely unsupervised fashion and generates two types of neologisms based on the category of the service to be named and the properties to be underlined.

Keyphrase-Based Hierarchical Clustering for Arabic Documents

A domain-independent approach, which builds a meaningful hierarchical clustering tree that overcomes the problem of high dimensionality of the feature vector by representing each document with its keyphrases, and introduces a new similarity measure by taking the common lemma form keyphrases among feature vectors of documents.

Novel Approach towards Arabic Question Similarity Detection

  • Mohammad Daoud
  • Computer Science
  • 2019 2nd International Conference on New Trends in Computing Sciences (ICTCS)
  • 2019
A rule-based approach that relies on lexical and semantic similarity between questions with the utilization of supervised learning algorithms for automatic detection of Arabic question similarity is proposed and tested.

A survey of semantic relatedness evaluation datasets and procedures

This article gives a comprehensive overview of the evaluation protocols and datasets for semantic relatedness covering both intrinsic and extrinsic approaches.



A Freely Available Automatically Generated Thesaurus of Related Words

A freely available English thesaurus of related words is presented that has been automatically compiled by analyzing the distributional similarities of words in the British National Corpus, which does not require syntactic parsing and therefore can be more easily adapted to other languages.

Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL

This paper presents a simple unsupervised learning algorithm for recognizing synonyms, based on statistical data acquired by querying a Web search engine. The algorithm, called PMI-IR, uses Pointwise…

Automatic Identification of Word Translations from Unrelated English and German Corpora

The current study, based on the assumption that there is a correlation between the patterns of word co-occurrences in corpora of different languages, makes a significant improvement to about 72% of word translations identified correctly.

Co-occurrence Retrieval: A Flexible Framework for Lexical Distributional Similarity

A flexible, parameterized framework for calculating distributional similarity is proposed and the problem of finding distributionally similar words is cast as one of co-occurrence retrieval (CR) for which precision and recall can be measured by analogy with the way they are measured in document retrieval.

Evaluating WordNet-based Measures of Lexical Semantic Relatedness

An information-content-based measure proposed by Jiang and Conrath is found superior to those proposed by Hirst and St-Onge, Leacock and Chodorow, Lin, and Resnik, and it is explained why distributional similarity is not an adequate proxy for lexical semantic relatedness.

Integrating Semantic Knowledge into Text Similarity and Information Retrieval

It is found that integrating lexical semantic knowledge improves performance for both tasks: ad-hoc information retrieval and text similarity.

Automatic Retrieval and Clustering of Similar Words

A word similarity measure based on the distributional pattern of words allows the automatically constructed thesaurus to be significantly closer to WordNet than Roget Thesaurus is.

UMND1: Unsupervised Word Sense Disambiguation Using Contextual Semantic Relatedness

An unsupervised WordNet-based Word Sense Disambiguation system, which participated (as UMND1) in the SemEval-2007 Coarse-grained English Lexical Sample task, is described.

What's in a Thesaurus?

The experiment shows that pairs of ‘lexicographically close’ meanings are frequently found in different parts of the hierarchy of WordNet 1.5 and a mapping between WordNet senses and the senses of another dictionary.

Learning Concept Hierarchies from Text Corpora using Formal Concept Analysis

A novel approach to the automatic acquisition of taxonomies or concept hierarchies from a text corpus based on Formal Concept Analysis, which models the context of a certain term as a vector representing syntactic dependencies that are automatically acquired from the text corpus with a linguistic parser.