• Corpus ID: 14638345

Using TF-IDF to Determine Word Relevance in Document Queries

Juan Enrique Ramos
In this paper, we examine the results of applying Term Frequency-Inverse Document Frequency (TF-IDF) to determine which words in a corpus of documents might be more favorable to use in a query. As the term implies, TF-IDF calculates a value for each word in a document that grows with the word's frequency in that document and shrinks with the percentage of documents the word appears in. Words with high TF-IDF numbers imply a strong relationship with the document they appear in…
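The weighting described in the abstract can be sketched in a few lines of Python. This is a minimal sketch, assuming raw counts for the term frequency f(w, d) and log(|D| / df(w)) for the inverse document frequency, which is one common formulation; the function name `tf_idf`, whitespace tokenization, and the three-sentence corpus are illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each word w in document d as f(w, d) * log(|D| / df(w))."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each word at least once.
    df = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw count of each word in this document
        weights.append({w: count * math.log(n_docs / df[w]) for w, count in tf.items()})
    return weights


corpus = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the dog barked".split(),
]
weights = tf_idf(corpus)
for doc, scores in zip(corpus, weights):
    # Highest-weighted word per document (first one wins on ties).
    best = max(scores, key=scores.get)
    print(" ".join(doc), "->", best, round(scores[best], 3))
```

A word such as "the" that occurs in every document receives a weight of 0 however often it appears, while words confined to a single document receive the largest IDF factor.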


Using Term Frequency - Inverse Document Frequency to find the Relevance of Words in Gujarati Language
  • Tripti Dodiya
  • International Journal for Research in Applied Science and Engineering Technology, 2021
The paper discusses an approach to computing TF-IDF for documents in Gujarati, a morphologically rich Indo-Aryan language, based on finding the frequency of the words in each document.
Comparative Analysis of IDF Methods to Determine Word Relevance in Web Document
The paper discusses and compares different derivations of inverse document frequency for weighting terms, including the best-known derivations, which follow from the Robertson-Spärck Jones relevance weight.
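Three widely used IDF formulations can be compared side by side. The sketch below assumes N documents in the collection and a term occurring in df of them; it is illustrative only and does not reproduce the specific set of derivations evaluated in the paper. The smoothed form is the one scikit-learn's `TfidfVectorizer` uses with `smooth_idf=True`, and the last function is the Robertson-Spärck Jones relevance weight when no relevance judgments are available.

```python
import math


def idf_classic(n_docs: int, df: int) -> float:
    # log(N / df): the textbook inverse document frequency.
    return math.log(n_docs / df)


def idf_smoothed(n_docs: int, df: int) -> float:
    # log((1 + N) / (1 + df)) + 1: add-one smoothing; at least 1 whenever df <= N.
    return math.log((1 + n_docs) / (1 + df)) + 1


def idf_rsj(n_docs: int, df: int) -> float:
    # Robertson-Spärck Jones weight with no relevance information:
    # log((N - df + 0.5) / (df + 0.5)).
    return math.log((n_docs - df + 0.5) / (df + 0.5))


N = 1000
print(f"{'df':>5} {'classic':>8} {'smoothed':>9} {'rsj':>8}")
for df in (1, 10, 100, 500, 900):
    print(f"{df:>5} {idf_classic(N, df):8.3f} {idf_smoothed(N, df):9.3f} {idf_rsj(N, df):8.3f}")
```

Unlike the other two, the RSJ-derived weight becomes negative once a term appears in more than half of the collection.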
Text Mining: Use of TF-IDF to Examine the Relevance of Words to Documents
This paper discusses the use of TF-IDF (term frequency-inverse document frequency) to examine the relevance of keywords to documents in a corpus, in order to verify the findings obtained by running the algorithm.
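Examining which keywords are most relevant to each document amounts to ranking each document's words by score. This is a minimal sketch, assuming length-normalized term frequency (count divided by document length) times log(N / df); the function `top_keywords`, the alphabetical tie-breaking rule, and the toy documents are illustrative choices rather than the paper's setup.

```python
import math
from collections import Counter


def top_keywords(docs: list[list[str]], k: int = 2) -> list[list[str]]:
    """Return the k highest-scoring words per document (ties broken alphabetically)."""
    n_docs = len(docs)
    df = Counter(w for doc in docs for w in set(doc))
    result = []
    for doc in docs:
        counts = Counter(doc)
        # Length-normalized term frequency times log-scaled inverse document frequency.
        scores = {w: (c / len(doc)) * math.log(n_docs / df[w]) for w, c in counts.items()}
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        result.append(ranked[:k])
    return result


docs = [
    "cats chase mice".split(),
    "dogs chase cats".split(),
    "dogs bark loudly".split(),
]
keywords = top_keywords(docs)
print(keywords)
```

Words unique to one document ("mice", "bark", "loudly") rise to the top; in the second document every word is shared with one other document, so the ranking there is decided only by the tie-breaking rule.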
Semantic Sensitive TF-IDF to Determine Word Relevance in Documents
A set of nearly four million documents from health-care social media was collected and used to train a semantic model that yields word embeddings. Features of the resulting semantic space were then used to readjust the original TF-IDF scores through an iterative procedure, improving the algorithm's otherwise moderate performance on informal texts.
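The paper's exact update rule is not given here, so the following is a hypothetical illustration of the general idea only: each word's TF-IDF score is repeatedly blended with the scores of semantically similar words, where similarity is the cosine between word embeddings. The function `semantic_rescore`, the mixing weight `alpha`, the fixed iteration count, the clipping of negative similarities to zero, and the two-dimensional toy embeddings are all assumptions made for the example.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def semantic_rescore(scores: dict[str, float], emb: dict[str, list[float]],
                     alpha: float = 0.5, iters: int = 10) -> dict[str, float]:
    """Hypothetical smoothing: mix each word's score with its neighbours' scores,
    weighted by positive cosine similarity, for a fixed number of rounds."""
    words = list(scores)
    # Similarity weights between distinct words; negative similarities are ignored.
    sim = {(a, b): max(cosine(emb[a], emb[b]), 0.0) for a in words for b in words if a != b}
    current = dict(scores)
    for _ in range(iters):
        updated = {}
        for w in words:
            total = sum(sim[(w, v)] for v in words if v != w)
            neighbour = (sum(sim[(w, v)] * current[v] for v in words if v != w) / total
                         if total > 0 else current[w])
            # Anchor to the original TF-IDF score so the iteration does not wash it out.
            updated[w] = (1 - alpha) * scores[w] + alpha * neighbour
        current = updated
    return current


scores = {"fever": 2.0, "temperature": 0.2, "invoice": 0.2}
emb = {"fever": [1.0, 0.1], "temperature": [0.9, 0.2], "invoice": [-0.5, 1.0]}
rescored = semantic_rescore(scores, emb)
print(rescored)
```

In this toy run "temperature" is pulled up toward "fever" because their embeddings are close, while "invoice", which has no positive similarity to either word, keeps its original score.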
Analysis of TF-IDF Model and its Variant for Document Retrieval
The study analyzes and evaluates the retrieval effectiveness of the vector-space model on the new FIRE 2011 dataset; the results show that the TF-IDF model gives the highest precision values on this corpus.
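Evaluating the vector-space model means ranking documents by the cosine similarity between the TF-IDF vectors of the query and of each document, then measuring precision over the top results. The sketch below is a generic, standard-library-only version under assumed choices (raw counts times log(N / df), cosine ranking, precision at k); it is not the FIRE 2011 experimental setup, and the four documents and the relevance set `{1, 3}` are invented for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs: list[list[str]]) -> tuple[list[dict[str, float]], dict[str, float]]:
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))
    idf = {w: math.log(n / c) for w, c in df.items()}
    vecs = [{w: c * idf[w] for w, c in Counter(d).items()} for d in docs]
    return vecs, idf


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def rank(query: list[str], docs: list[list[str]]) -> list[int]:
    vecs, idf = tfidf_vectors(docs)
    # Query terms unseen in the corpus get no weight.
    q = {w: c * idf.get(w, 0.0) for w, c in Counter(query).items()}
    return sorted(range(len(docs)), key=lambda i: -cosine(q, vecs[i]))


def precision_at_k(ranked: list[int], relevant: set[int], k: int) -> float:
    # Fraction of the top-k ranked documents that are judged relevant.
    return sum(1 for i in ranked[:k] if i in relevant) / k


docs = [
    "vector space model for retrieval".split(),
    "tf idf weighting in the vector space model".split(),
    "cooking pasta at home".split(),
    "evaluating retrieval precision".split(),
]
ranked = rank("tf idf retrieval".split(), docs)
print(ranked, precision_at_k(ranked, {1, 3}, k=2))
```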
POS weighted TF-IDF algorithm and its application for an MOOC search engine
  • Ruilin Xu
  • 2014 International Conference on Audio, Language and Image Processing
The paper proposes the POS Weighted TF-IDF algorithm, an improvement on the original TF-IDF algorithm that takes each query term's part of speech (POS) into account and gives each query term's frequency a different weight according to that POS.
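The POS-weighting idea can be illustrated as a scoring function over a query whose terms already carry part-of-speech tags. In this sketch the weight table `POS_WEIGHTS`, the fallback `DEFAULT_WEIGHT`, the Universal-POS-style tag names, and the choice to multiply POS weight, in-document term frequency, and IDF are all assumptions; the paper's actual weights are not reproduced, and no tagger is run.

```python
import math
from collections import Counter

# Hypothetical POS weights; not the values used in the paper.
POS_WEIGHTS = {"NOUN": 1.0, "VERB": 0.6, "ADJ": 0.5}
DEFAULT_WEIGHT = 0.2  # any other tag (determiners, prepositions, ...)


def pos_weighted_score(query: list[tuple[str, str]], doc: list[str],
                       idf: dict[str, float]) -> float:
    """Score a document against a POS-tagged query: sum over query terms of
    POS weight * term frequency in the document * IDF."""
    tf = Counter(doc)
    return sum(POS_WEIGHTS.get(tag, DEFAULT_WEIGHT) * tf[term] * idf.get(term, 0.0)
               for term, tag in query)


docs = [
    "python course for beginners learn python".split(),
    "learn free drawing free".split(),
]
df = Counter(w for d in docs for w in set(d))
idf = {w: math.log(len(docs) / c) for w, c in df.items()}
query = [("learn", "VERB"), ("python", "NOUN"), ("free", "ADJ")]
scores = [pos_weighted_score(query, d, idf) for d in docs]
print(scores)
```

With every POS weight set to 1, both documents would score 2·log 2 and tie; weighting the noun "python" above the adjective "free" ranks the first document higher.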
Determining Document Relevance using Keyword Extraction
The paper proposes a system that retrieves the documents a user wants based on the query provided, aiming to deliver accurate results for every query with reduced processing time.
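One common way to make keyword-based document fetching fast is an inverted index, which maps each keyword to the documents containing it, so a query only touches the entries for its own terms instead of scanning every document. This is a generic sketch, not the system proposed in the paper; the functions `build_index` and `search`, the document IDs, the lowercase whitespace tokenization, and the boolean-AND matching rule are assumptions.

```python
from collections import defaultdict


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase keyword to the set of document IDs containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the documents containing every query keyword (boolean AND)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    "d1": "TF-IDF keyword extraction",
    "d2": "keyword search engines",
    "d3": "image processing",
}
index = build_index(docs)
print(sorted(search(index, "keyword")))
print(sorted(search(index, "Keyword extraction")))
```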
The result shows that NP performance of text similarity measures as compared to the stan may offer the necessary insights related to the development of text similarity applications.
Summarization of financial documents with TF-IDF weighting of multi-word terms
The suggested solution first calculates the term frequency-inverse document frequency (TF-IDF) weights for all single-word and multi-word expressions in the corpus, then finds the sequence of words with the maximum total weight in each document.
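The second step can be sketched as a sliding-window search for the highest-weighted run of words. This assumes the sequence has a fixed length `k` and that multi-word expressions have already been merged into single tokens (written here with an underscore, as in `interest_rate`); the function `best_window` and the weight values are hypothetical stand-ins for TF-IDF scores computed on a real corpus, and the paper may define the sequence differently.

```python
def best_window(tokens: list[str], weights: dict[str, float], k: int) -> tuple[int, float]:
    """Return (start index, total weight) of the length-k token window with the
    largest summed weight, using an O(n) sliding window."""
    if not 0 < k <= len(tokens):
        raise ValueError("k must be between 1 and len(tokens)")
    w = [weights.get(t, 0.0) for t in tokens]  # unknown tokens weigh 0
    total = sum(w[:k])
    best_start, best_total = 0, total
    for start in range(1, len(tokens) - k + 1):
        # Slide right by one: add the new last token, drop the old first token.
        total += w[start + k - 1] - w[start - 1]
        if total > best_total:
            best_start, best_total = start, total
    return best_start, best_total


tokens = "the central bank raised the interest_rate sharply today".split()
weights = {"central": 1.2, "bank": 1.5, "raised": 0.8,
           "interest_rate": 3.0, "sharply": 0.4, "today": 0.1}
start, total = best_window(tokens, weights, k=3)
print(tokens[start:start + 3], round(total, 2))
```

Treating the multi-word term as one token lets its single TF-IDF weight count once toward the window, which is what allows the window containing `interest_rate` to win here.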
Modeling unstructured document using N-gram consecutive and wordnet dictionary
This study combines WordNet and N-grams to overcome both problems of TDC by changing document features from single terms into concepts that account for polysemy and synonymy, which improved VSM performance.
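Mapping words to shared concepts before building features lets synonyms match. The sketch below uses a small hand-written synonym table, `CONCEPTS`, as a hypothetical stand-in for WordNet synsets (real WordNet lookups would need a library such as NLTK and its data files) and then counts consecutive concept bigrams with the hypothetical helper `concept_ngrams`. It addresses synonymy only; handling polysemy would additionally require choosing the right sense for each word, which is not shown.

```python
from collections import Counter

# Hypothetical synonym table standing in for WordNet synsets.
CONCEPTS = {"car": "vehicle", "automobile": "vehicle", "auto": "vehicle",
            "buy": "purchase", "purchase": "purchase"}


def concept_ngrams(text: str, n: int = 2) -> Counter:
    """Map each word to its concept label, then count consecutive n-grams."""
    concepts = [CONCEPTS.get(w, w) for w in text.lower().split()]
    return Counter(tuple(concepts[i:i + n]) for i in range(len(concepts) - n + 1))


a = concept_ngrams("buy car today")
b = concept_ngrams("purchase automobile today")
shared = a & b  # bigrams present in both, with the minimum count
print(shared)
```

On the raw words these two phrases share no bigram at all; after concept mapping they produce identical feature sets.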

