A Survey of Text Similarity Approaches
@article{Gomaa2013ASO,
  title={A Survey of Text Similarity Approaches},
  author={Wael Hassan Gomaa and Aly A. Fahmy},
  journal={International Journal of Computer Applications},
  year={2013},
  volume={68},
  pages={13-18}
}
ABSTRACT Measuring the similarity between words, sentences, paragraphs and documents is an important component in various tasks such as information retrieval, document clustering, word-sense disambiguation, automatic essay scoring, short answer grading, machine translation and text summarization. This survey discusses the existing works on text similarity by partitioning them into three approaches: String-based, Corpus-based and Knowledge-based similarities. Furthermore, samples of…
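As a rough illustration of the String-based category named in the abstract, the sketch below computes two widely used string measures: Levenshtein edit distance and Jaccard similarity over character n-grams. The abstract excerpt does not say which measures the survey covers, so the choice of these two, and the function names `levenshtein`, `char_ngrams` and `jaccard`, are assumptions made for illustration only.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def char_ngrams(text: str, n: int = 2) -> set:
    """Set of character n-grams of text (illustrative helper)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard(a: str, b: str, n: int = 2) -> float:
    """Jaccard similarity of the character n-gram sets of a and b."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    if not ga and not gb:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(ga & gb) / len(ga | gb)


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(jaccard("night", "nacht"))
```

Edit distance is a dissimilarity (lower means closer), while Jaccard is a similarity in the range 0 to 1, so the two are not directly comparable without normalization.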
718 Citations
Textual Similarity Measurement Approaches: A Survey
- Computer Science
- 2020
An overview of textual similarity in the literature is provided, and many approaches for measuring textual similarity for Arabic text are reviewed and compared in this paper.
Text Similarity Based on Modified LSA Technique
- Computer Science
- 2015
Two approaches focus on the problem of semantic similarity between English-language texts using the Latent Semantic Analysis (LSA) technique, trying to enhance the process of finding the semantic similarity distance between texts and to make it more adaptable to both long documents and short sentences.
Short text similarity measurement methods: a review
- Computer Science
- Soft Comput.
- 2021
This paper reviews the research literature on short text similarity (STS) measurement methods to classify and give a broad overview of existing techniques and to identify their strengths and weaknesses in terms of domain independence, language independence, requirement of semantic knowledge, corpus and training data, ability to identify semantic meaning, word order similarity and polysemy.
Measuring Sentences Similarity: A Survey
- Computer Science
- Indian Journal of Science and Technology
- 2019
Word-to-word based, structure-based, and vector-based approaches are the most widely used for finding sentence similarity, but structure-based similarity, which measures similarity between sentence structures, needs more investigation.
Assessing semantic similarity of texts - Methods and algorithms
- Computer Science
- 2017
The mathematical background of LSA, which derives the meaning of the words in a given text by exploring their co-occurrence, is examined; LSA provides for reducing the dimensionality of the document vector space and better capturing the text semantics.
Comparative Study of Techniques used for Word and Sentence Similarity
- Computer Science
- 2021 8th International Conference on Computing for Sustainable Global Development (INDIACom)
- 2021
The approaches to measuring the resemblance of sentences based on the methods implemented are classified into three groups, with word-to-word based, structure-based, and vector-based methods the most frequently used.
SEMANTIC TEXTUAL SIMILARITY USING MACHINE LEARNING ALGORITHMS
- Computer Science
- 2017
Various supervised regression techniques are described and used to analyze the impact of syntactic and semantic features in calculating the degree of semantic equivalence between two text fragments, even when the sentence pair uses different words.
A German Corpus for Text Similarity Detection Tasks
- Computer Science
- ArXiv
- 2017
A textual German corpus for similarity detection is presented to automatically assess the similarity between a pair of texts and to evaluate different similarity measures, both for whole documents and for individual sentences.
A Comparison of Semantic Similarity Methods for Maximum Human Interpretability
- Computer Science
- 2019 Artificial Intelligence for Transforming Business and Society (AITB)
- 2019
Three different methods are presented that not only focus on a text's words but also incorporate semantic information of texts in their feature vectors and compute semantic similarities; these performed best in finding similarities between short news texts.
Performance evaluation of similarity measures on similar and dissimilar text retrieval
- Computer Science
- 2015 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K)
- 2015
This paper evaluated the performance of eight popular similarity measures on four levels (degrees) of textual similarity using a corpus of plagiarised texts, and showed that most of the measures performed equally on highly similar texts, with the exception of Euclidean distance and Jensen-Shannon divergence, which performed worse.
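To make the measures named in the last entry concrete, here is a minimal sketch of Euclidean distance and Jensen-Shannon divergence over term-frequency distributions, with cosine similarity added for contrast. The entry does not list all eight measures or describe its preprocessing, so the whitespace tokenization, the inclusion of cosine similarity, and the helper name `term_distribution` are assumptions for illustration, not the evaluated setup.

```python
import math
from collections import Counter


def term_distribution(text: str, vocab: list) -> list:
    """Relative term frequencies of text over a fixed vocabulary
    (naive lowercase whitespace tokenization; an assumption of this sketch)."""
    counts = Counter(text.lower().split())
    total = sum(counts[w] for w in vocab)
    return [counts[w] / total if total else 0.0 for w in vocab]


def euclidean(p: list, q: list) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def cosine(p: list, q: list) -> float:
    """Cosine similarity; defined as 0.0 here if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def _kl(p: list, m: list) -> float:
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(a * math.log2(a / b) for a, b in zip(p, m) if a > 0)


def jensen_shannon(p: list, q: list) -> float:
    """Jensen-Shannon divergence (base 2, so it lies between 0 and 1)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * _kl(p, m) + 0.5 * _kl(q, m)


if __name__ == "__main__":
    doc_a = "the cat sat on the mat"
    doc_b = "the cat lay on the rug"
    vocab = sorted(set(doc_a.split()) | set(doc_b.split()))
    p, q = term_distribution(doc_a, vocab), term_distribution(doc_b, vocab)
    print(euclidean(p, q), cosine(p, q), jensen_shannon(p, q))
```

Euclidean distance and Jensen-Shannon divergence are dissimilarities (0 for identical inputs), whereas cosine is a similarity (1 for identical directions), so rankings from them need to be inverted before comparison.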
49 References
Sentence similarity based on semantic nets and corpus statistics
- Computer Science
- IEEE Transactions on Knowledge and Data Engineering
- 2006
Experiments demonstrate that the proposed method provides a similarity measure that shows a significant correlation to human intuition and can be used in a variety of applications that involve text knowledge representation and discovery.
Corpus-based and Knowledge-based Measures of Text Semantic Similarity
- Computer Science
- AAAI
- 2006
This paper shows that the semantic similarity method outperforms methods based on simple lexical matching, resulting in up to 13% error rate reduction with respect to the traditional vector-based similarity metric.
Semantic text similarity using corpus-based word similarity and string similarity
- Computer Science
- ACM Trans. Knowl. Discov. Data
- 2008
We present a method for measuring the semantic similarity of texts using a corpus-based measure of semantic word similarity and a normalized and modified version of the Longest Common Subsequence…
WordNet : an electronic lexical database
- Computer Science
- 2000
The lexical database (nouns in WordNet, a semantic network of English verbs) and applications of WordNet (building semantic concordances) are presented.
Second Order Co-occurrence PMI for Determining the Semantic Similarity of Words
- Linguistics
- LREC
- 2006
A new corpus-based method, called Second Order Co-occurrence PMI (SOC-PMI), uses Pointwise Mutual Information to sort lists of important neighbor words of the two target words to calculate the relative semantic similarity.
A Wikipedia-Based Multilingual Retrieval Model
- Computer Science
- ECIR
- 2008
Results are presented of an extensive analysis that demonstrates the power of this new retrieval model: for a query document d the topically most similar documents from a corpus in another language are properly ranked.
Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL
- Computer Science
- ECML
- 2001
This paper presents a simple unsupervised learning algorithm for recognizing synonyms, based on statistical data acquired by querying a Web search engine. The algorithm, called PMI-IR, uses Pointwise…
Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy
- Computer Science
- ROCLING/IJCLCLP
- 1997
This paper presents a new approach for measuring semantic similarity/distance between words and concepts. It combines a lexical taxonomy structure with corpus statistical information so that the…
IRIT: Textual Similarity Combining Conceptual Similarity with an N-Gram Comparison Method
- Computer Science
- *SEMEVAL
- 2012
The participation of the IRIT team in SemEval 2012 Task 6 (Semantic Textual Similarity) consists of an n-gram based comparison method combined with a conceptual similarity measure that uses WordNet to calculate the similarity between a pair of concepts.
Term representation with Generalized Latent Semantic Analysis
- Computer Science
- 2007
This paper presents Generalized Latent Semantic Analysis as a framework for computing semantically motivated term and document vectors and demonstrates that GLSA term vectors efficiently capture semantic relations between terms and outperform related approaches on the synonymy test.