A Survey of Text Similarity Approaches

@article{Gomaa2013ASO,
  title={A Survey of Text Similarity Approaches},
  author={Wael Hassan Gomaa and Aly A. Fahmy},
  journal={International Journal of Computer Applications},
  year={2013},
  volume={68},
  pages={13--18}
}
  • W. H. Gomaa, A. Fahmy
  • Published 18 April 2013
  • Computer Science
  • International Journal of Computer Applications
ABSTRACT Measuring the similarity between words, sentences, paragraphs and documents is an important component in various tasks such as information retrieval, document clustering, word-sense disambiguation, automatic essay scoring, short answer grading, machine translation and text summarization. This survey discusses the existing works on text similarity by partitioning them into three approaches: String-based, Corpus-based and Knowledge-based similarities. Furthermore, samples of…
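As a rough illustration of the String-based category named in the abstract, the sketch below implements one character-based measure (Levenshtein edit distance) and one term-based measure (Jaccard similarity over word sets). The function names and the whitespace tokenization are assumptions made for this example; they are not taken from the survey.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-based: minimum number of single-character insertions,
    deletions, or substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def jaccard(a: str, b: str) -> float:
    """Term-based: shared distinct words divided by all distinct words.
    Tokenization by lowercasing and whitespace splitting is an assumption."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0  # two empty texts are treated as identical
    return len(sa & sb) / len(sa | sb)
```

Both functions look only at surface form, so they cannot relate synonyms; that gap is what the Corpus-based and Knowledge-based categories are meant to address.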
Measurement of Text Similarity: A Survey
TLDR
This paper systematically reviews the state of research on similarity measurement, analyzes the advantages and disadvantages of current methods, develops a more comprehensive classification system for text similarity measurement algorithms, and summarizes future development directions.
Textual Similarity Measurement Approaches: A Survey
TLDR
An overview of textual similarity in the literature is provided, and many approaches for measuring textual similarity for Arabic text are reviewed and compared in this paper.
Text Similarity Based on Modified LSA Technique
TLDR
Two approaches focus on the problem of semantic similarity between English-language texts using the Latent Semantic Analysis (LSA) technique, trying to enhance the process of finding the semantic similarity distance between texts and to make it more adaptable to both long documents and short sentences.
Measures to Calculate Semantic Similarity: A Survey
TLDR
Text similarity techniques can be effectively employed for tasks such as text summarization, text classification, redundancy removal, document retrieval, question generation, question answering, etc. if semantic similarity measures are used to determine text similarity.
Short text similarity measurement methods: a review
TLDR
This paper reviews the research literature on short text similarity (STS) measurement methods to classify and give a broad overview of existing techniques, and to identify their strengths and weaknesses in terms of domain independence, language independence, requirement of semantic knowledge, corpus and training data, ability to identify semantic meaning, word order similarity and polysemy.
Measuring Sentences Similarity: A Survey
  • M. Farouk
  • Computer Science
    Indian Journal of Science and Technology
  • 2019
TLDR
Word-to-word-based, structure-based, and vector-based approaches are the most widely used for finding sentence similarity, but structure-based similarity, which measures similarity between sentence structures, needs more investigation.
Assessing semantic similarity of texts - Methods and algorithms
TLDR
The mathematical background of LSA for deriving the meaning of the words in a given text by exploring their co-occurrence is examined; LSA provides for reducing the dimensionality of the document vector space and better capturing the text semantics.
Measuring text similarity based on structure and word embedding
  • M. Farouk
  • Computer Science
    Cognitive Systems Research
  • 2020
TLDR
The proposed approach combines different similarity measures in the calculation of sentence similarity and exploits sentence semantic structure to improve the accuracy of the sentence similarity calculation.
Comparative Study of Techniques used for Word and Sentence Similarity
  • Farooq Ahmad, Mohd. Faisal
  • 2021 8th International Conference on Computing for Sustainable Global Development (INDIACom)
  • 2021
This study is intended to analyze the methods used to test resemblance of sentences. For many Natural Language Processing applications such as text grouping, information recovery, brief reaction
SEMANTIC TEXTUAL SIMILARITY USING MACHINE LEARNING ALGORITHMS
Sentence similarity measures play a key role in text-related research and applications in areas such as text mining, natural language processing, information extraction, etc. Semantic Textual

References

SHOWING 1-10 OF 49 REFERENCES
Sentence similarity based on semantic nets and corpus statistics
TLDR
Experiments demonstrate that the proposed method provides a similarity measure that shows a significant correlation to human intuition and can be used in a variety of applications that involve text knowledge representation and discovery.
Corpus-based and Knowledge-based Measures of Text Semantic Similarity
TLDR
This paper shows that the semantic similarity method outperforms methods based on simple lexical matching, resulting in up to 13% error rate reduction with respect to the traditional vector-based similarity metric.
Semantic text similarity using corpus-based word similarity and string similarity
We present a method for measuring the semantic similarity of texts using a corpus-based measure of semantic word similarity and a normalized and modified version of the Longest Common Subsequence
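The entry above mentions a normalized version of the Longest Common Subsequence (LCS). A minimal sketch of the idea follows, assuming one common normalization in which the squared LCS length is divided by the product of the two string lengths; the truncated abstract does not state the exact formula or the modifications the authors apply, so treat this as an illustration only. The function names are hypothetical.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b
    (characters in order, not necessarily contiguous)."""
    prev = [0] * (len(b) + 1)  # LCS lengths for the previous prefix of a
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)            # extend a common subsequence
            else:
                curr.append(max(prev[j], curr[j - 1]))  # carry over the best so far
        prev = curr
    return prev[-1]


def normalized_lcs(a: str, b: str) -> float:
    """Assumed normalization: LCS length squared over the product of lengths.
    Yields 1.0 for identical non-empty strings, 0.0 if nothing is shared."""
    if not a or not b:
        return 0.0
    n = lcs_length(a, b)
    return (n * n) / (len(a) * len(b))
```

Squaring the LCS length keeps the score in [0, 1] while penalizing pairs where one string is much longer than the other.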
DERI&UPM: Pushing Corpus Based Relatedness to Similarity: Shared Task System Description
TLDR
A significant improvement in calculating the semantic similarity between sentences is shown by fusing the knowledge-based similarity measure with the corpus-based relatedness measure, compared with the corpus-based measure taken alone.
UKP: Computing Semantic Textual Similarity by Combining Multiple Content Similarity Measures
TLDR
This work uses a simple log-linear regression model, trained on the training data, to combine multiple text similarity measures of varying complexity, which range from simple character and word n-grams and common subsequences to complex features such as Explicit Semantic Analysis vector comparisons and aggregation of word similarity based on lexical-semantic resources.
Experiments on the difference between semantic similarity and relatedness
  • Peter Kolb
  • Computer Science, Mathematics
    NODALIDA
  • 2009
TLDR
This paper experimentally investigates how the choice of context, corpus preprocessing and size, and dimension reduction techniques like singular value decomposition and frequency cutoffs influence the semantic properties of the resulting word spaces.
The Google Similarity Distance
TLDR
A new theory of similarity between words and phrases based on information distance and Kolmogorov complexity is presented, which is applied to construct a method to automatically extract similarity, the Google similarity distance, of words and phrases from the WWW using Google page counts.
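For orientation, the Google similarity distance is usually written as the normalized Google distance (NGD). The symbols here are introduced for this note and do not appear in the listing: f(x) and f(y) are the page counts for terms x and y, f(x, y) is the number of pages containing both, and N is the total number of pages indexed.

```latex
\mathrm{NGD}(x, y) =
  \frac{\max\{\log f(x), \log f(y)\} - \log f(x, y)}
       {\log N - \min\{\log f(x), \log f(y)\}}
```

Under this definition, terms that always co-occur get a distance of 0, and the distance grows as the joint count f(x, y) shrinks relative to the individual counts.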
WordNet : an electronic lexical database
TLDR
The lexical database: nouns in WordNet, Katherine J. Miller a semantic network of English verbs, and applications of WordNet: building semantic concordances are presented.
Second Order Co-occurrence PMI for Determining the Semantic Similarity of Words
TLDR
A new corpus-based method, called Second Order Co-occurrence PMI (SOC-PMI), uses Pointwise Mutual Information to sort lists of important neighbor words of the two target words to calculate the relative semantic similarity.
A Wikipedia-Based Multilingual Retrieval Model
TLDR
Results are presented of an extensive analysis that demonstrates the power of this new retrieval model: for a query document d the topically most similar documents from a corpus in another language are properly ranked.