Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy

This paper presents a new approach to measuring semantic similarity (or distance) between words and concepts. The proposed measure is a combined one: it takes the edge-based approach of the edge-counting scheme as its basis and enhances it with the node-based approach of the information content calculation. When tested on a common data set of word-pair similarity ratings, the proposed approach outperforms other computational models. It gives the highest correlation value (r = 0.828

