Corpus ID: 2785490

Corpus-based and Knowledge-based Measures of Text Semantic Similarity

@inproceedings{Mihalcea2006CorpusbasedAK,
  title={Corpus-based and Knowledge-based Measures of Text Semantic Similarity},
  author={Rada Mihalcea and Courtney Corley and Carlo Strapparava},
  booktitle={AAAI},
  year={2006}
}
This paper presents a method for measuring the semantic similarity of texts, using corpus-based and knowledge-based measures of similarity. Previous work on this problem has focused mainly on either large documents (e.g. text classification, information retrieval) or individual words (e.g. synonymy tests). Given that a large fraction of the information available today, on the Web and elsewhere, consists of short text snippets (e.g. abstracts of scientific documents, image captions, product…
Comparable Evaluation of Contemporary Corpus-Based and Knowledge-Based Semantic Similarity Measures of Short Texts
TLDR
It is demonstrated that some of the proposed approaches can improve the semantic similarity measurement of short texts by extending existing measures with information from the ConceptNet knowledge base.
Corpus-Based methods for Short Text Similarity
TLDR
A new method is presented, based on the Vector Space Model, to capture the contextual behavior, senses, and correlation of terms, and it is shown that this method performs better than the baseline method that uses a vector-based cosine similarity measure.
Texts semantic similarity detection based graph approach
TLDR
A graph-based algorithm with specific implementation for similarity identification that makes extensive use of word similarity information extracted from WordNet, and aims to contribute to the order of the words in the sentence.
Graph Based Measure of Text Semantic Similarity Using WordNet as a Knowledge Base
TLDR
A new approach to paraphrase identification for measuring the semantic similarity of texts is investigated, and a graph algorithm for similarity identification is presented that makes extensive use of word similarity information extracted from WordNet.
Semantic similarity of short texts
This paper presents a method for measuring the semantic similarity of texts using a corpus-based measure of semantic word similarity and a normalized and modified version of the Longest Common…
Semantic text similarity using corpus-based word similarity and string similarity
We present a method for measuring the semantic similarity of texts using a corpus-based measure of semantic word similarity and a normalized and modified version of the Longest Common Subsequence…
Comparison of the Baseline Knowledge-, Corpus-, and Web-based Similarity Measures for Semantic Relations Extraction
TLDR
This paper compares 21 baseline measures and concludes that existing similarity measures provide significantly different results, both in general performance and in relation distributions, and suggests developing a combined similarity measure.
Combining semantic and term frequency similarities for text clustering
TLDR
The Frequency Google Tri-gram Measure is proposed to assess similarity between documents based on the frequencies of terms in the compared documents, with the Google n-gram corpus as an additional semantic similarity source, and it is demonstrated that the proposed measure significantly improves the quality of document clustering.
Short Tamil sentence similarity calculation using knowledge-based and corpus-based similarity measures
Sentence similarity calculation plays an important role in text processing-related research. Many unsupervised techniques such as knowledge-based techniques, corpus-based techniques, string…
An Effective TF/IDF-Based Text-to-Text Semantic Similarity Measure for Text Classification
TLDR
A new measure is proposed for assessing semantic similarity between texts, based on TF/IDF with a new function that aggregates pairwise semantic similarities between the concepts representing the compared text documents.

References

SHOWING 1-10 OF 34 REFERENCES
Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy
This paper presents a new approach for measuring semantic similarity/distance between words and concepts. It combines a lexical taxonomy structure with corpus statistical information so that the…
Using Measures of Semantic Relatedness for Word Sense Disambiguation
TLDR
This paper generalizes the Adapted Lesk Algorithm to a method of word sense disambiguation based on semantic relatedness and finds that the gloss overlaps of Adapted Lesk and the semantic distance measure of Jiang and Conrath (1997) result in the highest accuracy.
Using WordNet to disambiguate word senses for text retrieval
TLDR
The IS-A links define a generalization/specialization hierarchy that is not sufficient to reliably select the correct sense of a noun from the set of fine sense distinctions in WordNet; and missing correct matches because of incorrect sense resolution has a much more deleterious effect on retrieval performance than does making spurious matches.
Automatic Text Structuring and Summarization
TLDR
This study applies the ideas from the automatic link generation research to attack another important problem in text processing, automatic text summarization, and generates intra-document links between passages of a document.
Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL
This paper presents a simple unsupervised learning algorithm for recognizing synonyms, based on statistical data acquired by querying a Web search engine. The algorithm, called PMI-IR, uses Pointwise…
Verb Semantics and Lexical Selection
TLDR
This paper will focus on the semantic representation of verbs in computer systems and its impact on lexical selection problems in machine translation (MT), and sees the approach as closely aligned with knowledge-based MT approaches (KBMT), and as a separate component that could be incorporated into existing systems.
Automatic Word Sense Discrimination
TLDR
This paper presents context-group discrimination, a disambiguation algorithm based on clustering, and demonstrates good performance of context-group discrimination for a sample of natural and artificial ambiguous words.
An introduction to latent semantic analysis
TLDR
The adequacy of LSA's reflection of human knowledge has been established in a variety of ways, for example, its scores overlap those of humans on standard vocabulary and subject matter tests; it mimics human word sorting and category judgments; it simulates word-word and passage-word lexical priming data.
WordNet : an electronic lexical database
TLDR
The lexical database: nouns in WordNet, Katherine J. Miller a semantic network of English verbs, and applications of WordNet: building semantic concordances are presented.
Using Information Content to Evaluate Semantic Similarity in a Taxonomy
TLDR
This paper presents a new measure of semantic similarity in an IS-A taxonomy, based on the notion of information content, which performs encouragingly well and is significantly better than the traditional edge counting approach.