• Corpus ID: 24505795

Measuring the sentence level similarity

Ercan Canhasi
This article describes a method for calculating the similarity between short English texts, specifically texts of sentence length. The algorithm computes the semantic similarity and the word order similarity of two sentences, using a structured lexical knowledge base and statistical information from a corpus. The method works well in determining sentence similarity for most sentence pairs; consequently, the implemented method can be used in computer automated sentence…
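The combination of semantic and word order similarity described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the article's implementation: the `lexicon` dictionary is a hypothetical stand-in for the structured lexical knowledge base, corpus statistics are omitted, and the weight `delta=0.85`, the order threshold `0.4`, and all function names are illustrative choices. The combining formula (a weighted sum of a cosine over semantic vectors and a normalized difference of word order vectors) follows one commonly cited formulation and is assumed here.

```python
import math

def word_sim(a, b, lexicon):
    """Toy word similarity: 1.0 for identical words, otherwise a lookup in a
    hypothetical hand-made lexicon (stand-in for a lexical knowledge base)."""
    if a == b:
        return 1.0
    return lexicon.get(frozenset((a, b)), 0.0)

def semantic_vector(words, joint, lexicon):
    # For each word of the joint word set, keep the best match in the sentence.
    return [max(word_sim(w, j, lexicon) for w in words) for j in joint]

def order_vector(words, joint, lexicon, threshold=0.4):
    # For each joint word, record the 1-based position of its best match in
    # the sentence, or 0 if no match reaches the (assumed) threshold.
    vec = []
    for j in joint:
        best_idx, best = 0, 0.0
        for i, w in enumerate(words, start=1):
            s = word_sim(w, j, lexicon)
            if s > best:
                best_idx, best = i, s
        vec.append(best_idx if best >= threshold else 0)
    return vec

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def order_similarity(r1, r2):
    # 1 - ||r1 - r2|| / ||r1 + r2||
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(r1, r2)))
    total = math.sqrt(sum((a + b) ** 2 for a, b in zip(r1, r2)))
    return 1.0 - diff / total if total else 1.0

def sentence_similarity(s1, s2, lexicon=None, delta=0.85):
    """Weighted sum of semantic and word order similarity (delta is assumed)."""
    lexicon = lexicon or {}
    w1, w2 = s1.lower().split(), s2.lower().split()
    if not w1 or not w2:
        return 0.0
    joint = list(dict.fromkeys(w1 + w2))  # joint word set, order preserved
    ss = cosine(semantic_vector(w1, joint, lexicon),
                semantic_vector(w2, joint, lexicon))
    sr = order_similarity(order_vector(w1, joint, lexicon),
                          order_vector(w2, joint, lexicon))
    return delta * ss + (1 - delta) * sr
```

With this sketch, two sentences that contain the same words in a different order (for example "dog bites man" and "man bites dog") have a semantic component of 1 but a reduced word order component, so their overall score falls slightly below 1.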


Online Assessment of Similarity between Sentences in Question Analogous System: A Review Paper (International Journal of Advanced Research in Computer Science and Software Engineering)

This paper surveyed different research papers in order to find various techniques used for calculating similarity between two sentences.

A Novel Approach of Syntactic Similarity of Question Analogous System

The aim of this paper is to present an approach that can be used to find the similarity between questions and to remove duplicates with the help of syntactic similarity.

The Role of Surface, Semantic and Grammatical Features on Simplification of Spanish Medical Texts: A User Study

This work conducted a user study on 15 Spanish medical texts using Amazon Mechanical Turk and found that easy Spanish texts use more repeated words and adverbs, fewer negations, and more familiar words, similar to English, while difficult texts contain longer sentences and use more varied grammatical structures.


An improved framework for search with machine learning is proposed that handles the complexity of finding accurate matches; it describes a novel functional framework, based on a search algorithm with machine learning, both for differentiating query intent and for generating content semantically.

Commercial ERP Chatbots: Conversational Intelligence Agents' Performance Analysis, User Experience Benchmarks, and Quality Standards

This study examines several aspects of the function of embodied conversational agents, including their visual appearance, implementation of web sites, speech synthesis unit, built-in knowledge base (with general and specialized information), presentation of knowledge and additional functionality, emergency responses in unexpected situations, and user ratings.



Sentence similarity based on semantic nets and corpus statistics

Experiments demonstrate that the proposed method provides a similarity measure that shows a significant correlation to human intuition and can be used in a variety of applications that involve text knowledge representation and discovery.

Detecting Text Similarity over Short Passages: Exploring Linguistic Feature Combinations via Machine Learning

A new composite similarity metric is presented that combines information from multiple linguistic indicators to measure semantic distance between pairs of small textual units and is evaluated against standard information retrieval techniques, establishing that the new method is more effective in identifying closely related textual units.

WordNet: A Lexical Database for English

WordNet provides a more effective combination of traditional lexicographic information and modern computing, and is an online lexical database designed for use under program control.

Improving Similarity Measures for Short Segments of Text

A Web-relevance similarity measure is introduced and it is shown that one can further improve the accuracy of similarity measures by using a machine learning approach.

Producing high-dimensional semantic spaces from lexical co-occurrence

A procedure is described that processes a corpus of text and produces, for each word, a numeric vector containing information about its meaning; these vectors provide the basis for a representational model of semantic memory, the hyperspace analogue to language (HAL).

WordNet: An Electronic Lexical Database

Topics presented include the lexical database (nouns in WordNet; a semantic network of English verbs) and applications of WordNet (building semantic concordances).

Using Information Content to Evaluate Semantic Similarity in a Taxonomy

This paper presents a new measure of semantic similarity in an IS-A taxonomy, based on the notion of information content, which performs encouragingly well and is significantly better than the traditional edge counting approach.

Latent Semantic Analysis

An introduction to latent semantic analysis

The adequacy of LSA's reflection of human knowledge has been established in a variety of ways, for example, its scores overlap those of humans on standard vocabulary and subject matter tests; it mimics human word sorting and category judgments; it simulates word‐word and passage‐word lexical priming data.

Foundations of statistical natural language processing

This foundational text is the first comprehensive introduction to statistical natural language processing (NLP) to appear and provides broad but rigorous coverage of mathematical and linguistic foundations, as well as detailed discussion of statistical methods, allowing students and researchers to construct their own implementations.