A Web Search Engine-Based Approach to Measure Semantic Similarity between Words

  • Narendra Pradhan, Kamlesh Kumar Pandey, Rajesh Kumar Sahu
  • Published 30 October 2014
  • Computer Science
  • International Journal of Advanced Research in Computer and Communication Engineering
Abstract: A web search engine is software designed to search for information on the World Wide Web. The search results are generally presented in a list of results often referred to as search engine results pages (SERPs). The information may consist of web pages, images, and other types of files. Some search engines also mine data available in databases or open directories. Semantic similarity or semantic relatedness is a concept whereby a set of documents or… 

A technical study on Information Retrieval using web mining techniques

  • G. Srinaganya, J. Sathiaseelan
  • Computer Science
    2015 International Conference on Innovations in Information, Embedded and Communication Systems (ICIIECS)
  • 2015
This research paper studies various IR methods for classifying and indexing relevant web pages.

A predominant statistical approach to identify semantic similarity of textual documents

The tool, developed using a statistical approach, has been tested by checking the similarity of assignments submitted by students to verify their integrity. It may also be used to identify plagiarism in documents and to eliminate duplicates in a text repository.

Acquiring Evolving Semantic Relationships for WordNet to Enhance Information Retrieval

This paper takes a different perspective, automatically updating an existing lexical ontology using knowledge resources such as Wikipedia and a Web search engine, and achieves a correlation value of 0.87.

A Survey of Web Search from Web Documents Based On Semantic Ontology Technique

There exists a gap between Web mining and the effectiveness of using Web data. The main reason is that we cannot simply utilize and maintain the discovered knowledge using the traditional

Integrated Search System using Semantic Analysis

An integrated search system, formulated by accumulating multiple sources such as local storage, secondary storage and online repositories, analyzes the meaning of the query and provides search results according to the intention of the user through proper expansion of keywords.

Using a Search Engine-Based Mutually Reinforcing Approach to Assess the Semantic Relatedness of Biomedical Terms

This work proposes the Mutually Reinforcing Lexical Pattern Ranking (ReLPR) algorithm, a method for learning and exploring the lexical patterns of synonym pairs in biomedical text that can explore the correlation between two biomedical terms.

Improved Algorithm For Inferring User Search Goals With Feedback Sessions

A novel approach to infer user search goals by analyzing search engine query logs is proposed, and a new criterion, "Classified Average Precision (CAP)", is proposed to evaluate the performance of inferring user search goals.


An open-domain question answering approach that answers simple wh-questions using online search is proposed.

Clustering Techniques and the Similarity Measures used in Clustering: A Survey

This survey covers various clustering techniques and the current similarity measures used in distance-based clustering, explains the limitations of existing clustering techniques, and proposes that combining the advantages of existing systems can help overcome those limitations.



A Web Search Engine-Based Approach to Measure Semantic Similarity between Words

This work proposes an empirical method to estimate semantic similarity using page counts and text snippets retrieved from a web search engine for two words, and proposes a novel pattern extraction algorithm and a pattern clustering algorithm that significantly improves the accuracy in a community mining task.
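
As a rough illustration of how page counts alone can yield a similarity score, the sketch below computes a Jaccard-style coefficient from three hit counts: pages matching the first word, pages matching the second, and pages matching both. This is a minimal sketch under stated assumptions, not the method of the paper summarized above: the function name `web_jaccard`, the co-occurrence threshold, and the page counts are all hypothetical, and the snippet-based pattern extraction and clustering steps are not shown.

```python
def web_jaccard(count_p, count_q, count_pq, min_cooccurrence=5):
    """Jaccard-style similarity from search-engine page counts.

    count_p, count_q: page counts for each word queried alone.
    count_pq: page count for the conjunctive query (both words).
    min_cooccurrence: hypothetical noise threshold (an assumption);
    co-occurrence counts at or below it are treated as accidental.
    """
    if count_pq <= min_cooccurrence:
        return 0.0
    # |P and Q| / |P or Q|, with the union obtained by inclusion-exclusion.
    return count_pq / (count_p + count_q - count_pq)


# Hypothetical page counts; a real system would obtain these
# from a search engine for the two words and their conjunction.
score = web_jaccard(count_p=1000, count_q=500, count_pq=100)
print(f"similarity: {score:.4f}")
```

Raw page counts are noisy and ignore the context in which the two words co-occur, which is the kind of gap that snippet-based evidence is meant to address.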

Unsupervised Semantic Similarity Computation between Terms Using Web Documents

The proposed unsupervised context-based similarity computation algorithms are shown to be competitive with the state-of-the-art supervised semantic similarity algorithms that employ language-specific knowledge resources.

Combining statistical similarity measures for automatic induction of semantic classes

An unsupervised semantic class induction algorithm is proposed that is based on the principle that similarity of context implies similarity of meaning and is evaluated on two corpora: a semantically heterogeneous Web news domain and an application-specific travel reservation corpus.

An Approach for Measuring Semantic Similarity between Words Using Multiple Information Sources

This paper explores the determination of semantic similarity by a number of information sources, which consist of structural semantic information from a lexical taxonomy and information content from a corpus.

Document classification based on web search hit counts

A web mining method that classifies research documents automatically based on the results of k-means clustering, in which cosine similarity is used to calculate distance.

Using Information Content to Evaluate Semantic Similarity in a Taxonomy

This paper presents a new measure of semantic similarity in an IS-A taxonomy, based on the notion of information content, which performs encouragingly well and is significantly better than the traditional edge counting approach.
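
The idea of information content in an IS-A taxonomy can be sketched as follows: each concept has a probability of occurrence, its information content is the negative log of that probability, and two concepts are scored by the most informative concept that subsumes both. This is a toy sketch under stated assumptions, not the paper's implementation: the taxonomy, the probabilities, and the function names are all hypothetical.

```python
import math

# Hypothetical IS-A taxonomy, stored as child -> parent (root has None).
PARENT = {
    "dog": "mammal",
    "cat": "mammal",
    "mammal": "animal",
    "bird": "animal",
    "animal": None,
}

# Hypothetical concept probabilities. A concept's probability covers its
# descendants too, so values never decrease when moving up the taxonomy.
PROB = {"dog": 0.1, "cat": 0.1, "mammal": 0.25, "bird": 0.2, "animal": 1.0}


def ancestors(concept):
    """Return the concept and all of its ancestors up to the root."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def information_content(concept):
    """Information content as the negative log probability of a concept."""
    return -math.log(PROB[concept])


def ic_similarity(a, b):
    """Score two concepts by their most informative shared subsumer."""
    shared = set(ancestors(a)) & set(ancestors(b))
    return max(information_content(c) for c in shared)


print(ic_similarity("dog", "cat"))   # shared subsumers: mammal, animal
print(ic_similarity("dog", "bird"))  # only the root is shared
```

Unlike edge counting, this score does not depend on how many links separate the two concepts, only on how specific their shared subsumer is.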

Visual Classifier Training for Text Document Retrieval

This work compares three approaches for interactive classifier training in a user study and sees its work as a step towards introducing user controlled classification methods in addition to text search and filtering for increasing recall in analytics scenarios involving large corpora.

Content-Based Image Retrieval at the End of the Early Years

The working conditions of content-based retrieval: patterns of use, types of pictures, the role of semantics, and the sensory gap are discussed, as well as aspects of system engineering: databases, system architecture, and evaluation.

Semantic distance in WordNet: An experimental, application-oriented evaluation of five measures

Five different proposed measures of similarity or semantic distance in WordNet were experimentally compared by examining their performance in a real-word spelling correction system. It was found that

Exploiting phrasal lexica and additional morpho-syntactic language resources for statistical machine translation with scarce training data

The augmentation of the phrasal lexicon with the help of additional monolingual language resources containing morpho-syntactic information has been investigated for the translation with scarce training material.