Corpus ID: 15181045

DISCO: A Multilingual Database of Distributionally Similar Words

@inproceedings{Kolb2008DISCOAM,
  title={DISCO: A Multilingual Database of Distributionally Similar Words},
  author={Peter Kolb},
  year={2008}
}
This paper presents DISCO, a tool for retrieving the distributional similarity between two given words, and for retrieving the distributionally most similar words for a given word. Pre-computed word spaces are freely available for a number of languages including English, German, French and Italian, so DISCO can be used off the shelf. The tool is implemented in Java, provides a Java API, and can also be called from the command line. The performance of DISCO is evaluated by measuring the…
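The two operations named in the abstract (similarity between two words, and the most similar words for a given word) can be illustrated with a minimal sketch. This is not DISCO's Java API or its actual similarity measure: the function names `build_word_space`, `similarity`, and `most_similar`, the toy corpus, the window size, and the use of cosine similarity over raw co-occurrence counts are all assumptions made for illustration.

```python
import math
from collections import Counter


def build_word_space(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it.

    Hypothetical helper: a stand-in for a pre-computed word space.
    """
    space = {}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            context = space.setdefault(word, Counter())
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    context[tokens[j]] += 1
    return space


def similarity(space, w1, w2):
    """Cosine similarity of two co-occurrence vectors (assumed measure).

    Returns 0.0 if either word is missing from the word space.
    """
    v1, v2 = space.get(w1), space.get(w2)
    if not v1 or not v2:
        return 0.0
    dot = sum(count * v2[ctx] for ctx, count in v1.items())
    norm1 = math.sqrt(sum(c * c for c in v1.values()))
    norm2 = math.sqrt(sum(c * c for c in v2.values()))
    return dot / (norm1 * norm2)


def most_similar(space, word, n=3):
    """Return the n words with the highest similarity to `word`, best first."""
    scores = [(other, similarity(space, word, other))
              for other in space if other != word]
    # Sort by descending score, breaking ties alphabetically.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:n]


# Toy corpus, invented for this example only.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the ball",
]
space = build_word_space(corpus)
```

Calling `similarity(space, "cat", "dog")` compares the contexts the two words share, and `most_similar(space, "cat")` ranks every other word in the toy space by that score. A real word space would be built from a large corpus and would typically weight the counts rather than use them raw.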
sranjans: Semantic Textual Similarity using Maximal Weighted Bipartite Graph Matching
TLDR
The paper aims to come up with a system that examines the degree of semantic equivalence between two sentences by finding the maximal weighted bipartite match between the tokens of the two sentences.
Evaluating Distributional Semantic Models with Russian Noun-Adjective Compositions
In the paper, vector-space semantic models based on the Word2Vec word embedding algorithm and a count-based association-oriented algorithm are evaluated and compared by measuring association strength…
Estimating Syntagmatic Association Strength Using Distributional Word Representations
In the paper, we present distributed vector space models based on word embeddings and a specific association-oriented count-based distributional algorithm which have been applied to measuring…
Semantic Lexicon Induction from Twitter with Pattern Relatedness and Flexible Term Length
TLDR
This work presents a novel semantic lexicon induction approach that is able to learn new vocabulary from social media and can achieve accuracy as high as 92% in the top 100 learned category members.
Syntactic and semantic classification of verb arguments using dependency-based and rich semantic features
TLDR
It is shown that this approach performs well, even with the data sparsity issues that characterize the dataset, and can obtain better results than other systems by a margin of about 4% f-score.
Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation
TLDR
A method for extracting and clustering relevant parts of EULA documents, including permissions, obligations, and prohibitions, based on semantic similarity employing a distributional semantics approach on a large word embeddings database is described.
Automatic Measurement of Semantic Similarity among Arabic Short Texts
TLDR
A new method to measure the semantic similarity between short texts by combining semantic distribution and lexical similarity measures to determine the degree of similarity between two words is introduced.
Namelette: a tasteful supporter for creative naming
TLDR
A system that supports the naming process by exploiting natural language processing and linguistic creativity techniques in a completely unsupervised fashion and generates two types of neologisms based on the category of the service to be named and the properties to be underlined.
Keyphrase-Based Hierarchical Clustering for Arabic Documents
TLDR
A domain-independent approach, which builds a hierarchical meaningful clustering tree that overcomes the problem of high dimensionality of feature vector by representing each document with its keyphrases, and introduced a new similarity measure by taking the common lemma form keyphrases among feature vectors of documents.
Novel Approach towards Arabic Question Similarity Detection
  • Mohammad Daoud
  • Computer Science
  • 2019 2nd International Conference on new Trends in Computing Sciences (ICTCS)
  • 2019
TLDR
A rule-based approach that relies on lexical and semantic similarity between questions with the utilization of supervised learning algorithms for automatic detection of Arabic question similarity is proposed and tested.

References

SHOWING 1-10 OF 21 REFERENCES
A Freely Available Automatically Generated Thesaurus of Related Words
TLDR
A freely available English thesaurus of related words is presented that has been automatically compiled by analyzing the distributional similarities of words in the British National Corpus, which does not require syntactic parsing and therefore can be more easily adapted to other languages.
Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL
This paper presents a simple unsupervised learning algorithm for recognizing synonyms, based on statistical data acquired by querying a Web search engine. The algorithm, called PMI-IR, uses Pointwise…
Automatic Identification of Word Translations from Unrelated English and German Corpora
TLDR
The current study, based on the assumption that there is a correlation between the patterns of word co-occurrences in corpora of different languages, makes a significant improvement to about 72% of word translations identified correctly.
Co-occurrence Retrieval: A Flexible Framework for Lexical Distributional Similarity
TLDR
A flexible, parameterized framework for calculating distributional similarity is proposed and the problem of finding distributionally similar words is cast as one of co-occurrence retrieval (CR) for which precision and recall can be measured by analogy with the way they are measured in document retrieval.
Evaluating WordNet-based Measures of Lexical Semantic Relatedness
TLDR
An information-content-based measure proposed by Jiang and Conrath is found superior to those proposed by Hirst and St-Onge, Leacock and Chodorow, Lin, and Resnik, and why distributional similarity is not an adequate proxy for lexical semantic relatedness.
Integrating Semantic Knowledge into Text Similarity and Information Retrieval
TLDR
It is found that integrating lexical semantic knowledge improves performance for both tasks: ad-hoc information retrieval and text similarity.
Automatic Retrieval and Clustering of Similar Words
TLDR
A word similarity measure based on the distributional pattern of words allows the automatically constructed thesaurus to be significantly closer to WordNet than Roget Thesaurus is.
UMND1: Unsupervised Word Sense Disambiguation Using Contextual Semantic Relatedness
TLDR
An unsupervised WordNet-based Word Sense Disambiguation system, which participated (as UMND1) in the SemEval-2007 Coarse-grained English Lexical Sample task, is described.
What's in a Thesaurus?
TLDR
The experiment shows that pairs of ‘lexicographically close’ meanings are frequently found in different parts of the hierarchy of WordNet 1.5 and a mapping between WordNet senses and the senses of another dictionary.
Learning Concept Hierarchies from Text Corpora using Formal Concept Analysis
TLDR
A novel approach to the automatic acquisition of taxonomies or concept hierarchies from a text corpus based on Formal Concept Analysis, which models the context of a certain term as a vector representing syntactic dependencies which are automatically acquired from the text corpus with a linguistic parser.