Corpus ID: 1359050

Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy

  • Jay J. Jiang, David W. Conrath
This paper presents a new approach for measuring semantic similarity/distance between words and concepts. Specifically, the proposed measure is a combined approach that inherits the edge-based scheme of edge counting, enhanced by the node-based calculation of information content. When tested on a common data set of word pair similarity ratings, the proposed approach outperforms other computational models. It gives the highest correlation value (r = 0.828…
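The combined measure described above can be sketched minimally. The sketch below uses a toy IS-A taxonomy with hypothetical corpus frequency counts (the concept names and numbers are illustrative assumptions, not from the paper); the Jiang–Conrath distance between two concepts is IC(c1) + IC(c2) − 2·IC(lso(c1, c2)), where IC(c) = −log p(c) and lso is the lowest common subsumer:

```python
import math

# Toy IS-A taxonomy (child -> parent) and corpus frequency counts.
# All names and counts here are hypothetical, for illustration only.
parent = {
    "dog": "canine", "wolf": "canine", "canine": "carnivore",
    "cat": "feline", "feline": "carnivore",
    "carnivore": "animal", "animal": None,
}
freq = {"dog": 30, "wolf": 25, "canine": 2, "cat": 25,
        "feline": 3, "carnivore": 10, "animal": 25}

def ancestors(c):
    # chain from c up to the root, including c itself
    chain = [c]
    while parent[c] is not None:
        c = parent[c]
        chain.append(c)
    return chain

def subtree_count(c):
    # a concept's count includes every concept below it in the taxonomy
    return freq[c] + sum(subtree_count(ch)
                         for ch, p in parent.items() if p == c)

TOTAL = subtree_count("animal")

def ic(c):
    # information content: -log p(c)
    return -math.log(subtree_count(c) / TOTAL)

def lso(c1, c2):
    # lowest common subsumer: first ancestor of c2 shared with c1
    a1 = ancestors(c1)
    return next(a for a in ancestors(c2) if a in a1)

def jc_distance(c1, c2):
    # Jiang-Conrath distance: IC(c1) + IC(c2) - 2 * IC(lso(c1, c2))
    return ic(c1) + ic(c2) - 2 * ic(lso(c1, c2))
```

With these toy counts, identical concepts get distance 0, and "dog" ends up closer to "wolf" (shared subsumer "canine") than to "cat" (shared subsumer "carnivore"), illustrating how the information content of the shared ancestor drives the distance.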


A New Measure of Word Semantic Similarity Based on WordNet Hierarchy and DAG Theory
This paper presents a new approach to measure the semantic similarity between words in the hierarchy of WordNet that considers not only the semantic distance between two words but also the feature information of the DAG (Directed Acyclic Graph).
Measuring Semantic Textual Similarity of Sentences Using Modified Information Content and Lexical Taxonomy
A new method for measuring semantic similarity between sentences, which exploits the advantages of taxonomy methods, merges this information into a language model, and generates a similarity score by considering the maximum weight and shortest distance of the graph.
An Improved Semantic Similarity Measure for Word Pairs
  • Songmei Cai, Zhao Lu
  • Computer Science
    2010 International Conference on e-Education, e-Business, e-Management and e-Learning
  • 2010
The correlation between the results of the proposed semantic similarity measure and the human ratings reported by Miller and Charles on the dataset of 30 noun pairs is higher than that of other reported measures on the same dataset.
An Effective Algorithm for Semantic Similarity Metric of Word Pairs
Experiments demonstrate that the proposed algorithm's correlation with human judgment significantly outperforms others; unlike previous work, the new algorithm takes into account not only path length but also IC values.
Calculating the similarity between words and sentences using a lexical database and corpus statistics
The proposed method follows an edge-based approach using a lexical database and gives the highest correlation value for both word and sentence similarity, outperforming other similar models.
A Semantic Similarity Measure between Nouns based on the Structure of Wordnet
This paper proposes a new semantic similarity measure between two nodes, concentrating on nouns and their hypernym/hyponym relationships based on the structure of WordNet, that outperforms edge-counting methods.
Measuring Semantic Similarity in the Taxonomy of WordNet
A new model to measure semantic similarity in the WordNet taxonomy using edge-counting techniques achieves a much improved result compared with other methods: the correlation with average human judgment on a standard 28-word-pair dataset is better than anything reported in the literature.
Multi-word complex concept retrieval via lexical semantic similarity
  • J. Jiang, D. Conrath
  • Computer Science
    Proceedings 1999 International Conference on Information Intelligence and Systems (Cat. No.PR00446)
  • 1999
A simple computational means of measuring universal object similarity, based on classical feature-based similarity models, is presented, extended, and applied to a higher-level, practical information retrieval task: retrieving multi-word complex concepts.
Semantic Measures based on Wordnet using Multiple Information Sources
This paper investigates a new approach for measuring semantic similarity that combines methods of existing approaches using different information sources in their similarity calculations: shortest path length between compared words, depth in the taxonomy hierarchy, information content, semantic density of compared words, and the gloss of words.
An Efficient Computational Method for Measuring Similarity between Two Conceptual Entities
This paper proposes a method for computerized conceptual similarity calculation in WordNet space that provides a degree of conceptual dissimilarity between two concepts and gives a higher correlation value with a criterion based on human similarity judgment.


Using Information Content to Evaluate Semantic Similarity in a Taxonomy
This paper presents a new measure of semantic similarity in an IS-A taxonomy, based on the notion of information content, which performs encouragingly well and is significantly better than the traditional edge counting approach.
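Resnik's information-content measure summarized above can also be sketched briefly. Here the concept probabilities are hypothetical placeholders (a real system would estimate them from corpus counts propagated up the IS-A hierarchy); similarity is the information content of the most informative common subsumer:

```python
import math

# Hypothetical concept probabilities p(c), illustrative only.
p = {"animal": 1.0, "carnivore": 0.75, "canine": 0.45,
     "dog": 0.25, "wolf": 0.2}

# Toy IS-A links: child -> parent (None marks the root).
parent = {"dog": "canine", "wolf": "canine", "canine": "carnivore",
          "carnivore": "animal", "animal": None}

def ic(c):
    # information content: -log p(c)
    return -math.log(p[c])

def common_subsumers(c1, c2):
    def up(c):
        # walk from c to the root, yielding c itself first
        while c is not None:
            yield c
            c = parent[c]
    return set(up(c1)) & set(up(c2))

def resnik(c1, c2):
    # similarity = IC of the most informative common subsumer
    return max(ic(c) for c in common_subsumers(c1, c2))
```

Note how the measure depends only on the shared ancestor: resnik("dog", "wolf") equals IC("canine"), regardless of how rare "dog" or "wolf" themselves are, which is the property Jiang and Conrath's combined measure refines.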
Development and application of a metric on semantic nets
Experiments in which distance is applied to pairs of concepts and to sets of concepts in a hierarchical knowledge base show the power of hierarchical relations in representing information about the conceptual distance between concepts.
Similarity between Words Computed by Spreading Activation on an English Dictionary
  • H. Kozima
  • Linguistics, Computer Science
  • 1993
A method for measuring semantic similarity between words is presented as a new tool for text analysis, operating on a semantic network constructed systematically from a subset of the English dictionary LDOCE (Longman Dictionary of Contemporary English).
WordNet and Distributional Analysis: A Class-based Approach to Lexical Discovery
An estimate of mutual information is used to calculate what nouns a verb can take as its subjects and objects, based on distributions found within a large corpus of naturally occurring text.
Contextual correlates of semantic similarity
Abstract The relationship between semantic and contextual similarity is investigated for pairs of nouns that vary from high to low semantic similarity. Semantic similarity is estimated by subjective
A Proposal for Word Sense Disambiguation using Conceptual Distance
The method relies on the use of the wide-coverage noun taxonomy of WordNet and the notion of conceptual distance among concepts, captured by a Conceptual Density formula developed for this purpose, for the resolution of lexical ambiguity.
Using WordNet in a Knowledge-Based Approach to Information Retrieval
This paper introduces an approach to IR based on computing a semantic distance measurement between concepts or words and using this word distance to compute a similarity between a query and a document.
Word sense disambiguation for free-text indexing using a massive semantic network
This work investigates using the massive WordNet semantic network for disambiguation during document indexing, aiming to improve precision, and reports an improvement in disambiguation compared with chance.
Use of syntactic context to produce term association lists for text retrieval
When the closest related terms were used in query expansion on a standard information retrieval testbed, the results were much better than those given by document co-occurrence techniques, and slightly better than using unexpanded queries, supporting the contention that semantically similar words were indeed extracted by this technique.
A Semantic Concordance
A semantic concordance is a textual corpus and a lexicon so combined that every substantive word in the text is linked to its appropriate sense in the lexicon. Thus it can be viewed either as a