Interpreting TF-IDF term weights as making relevance decisions

  • Ho Chung Wu, Robert Wing Pong Luk, Kam-Fai Wong, Kui-Lam Kwok
  • Published 1 June 2008
  • Computer Science
  • ACM Trans. Inf. Syst.
A novel probabilistic retrieval model is presented. […] Our novel retrieval model is simplified to a basic ranking formula that directly corresponds to the TF-IDF term weights. In general, we show that the term-frequency factor of the ranking formula can be rendered into different term-frequency factors of existing retrieval systems. In the basic ranking formula, the remaining quantity −log p(r̄ | t ∈ d) is interpreted as the probability of randomly picking a nonrelevant usage (denoted by…
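
For orientation, the classical TF-IDF weighting that the abstract refers to can be sketched as below. This is a minimal illustration of the textbook scheme (raw term frequency times log(N/df)), not the paper's probabilistic model; the function name `tf_idf_scores` and the toy documents are assumptions made for the example. The IDF factor log(N/df) is itself the negative log of a relative frequency, which is the kind of −log p(·) quantity the abstract interprets.

```python
import math
from collections import Counter

def tf_idf_scores(query, docs):
    """Score each tokenized document against a tokenized query with plain TF-IDF."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)  # raw term frequency within this document
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue  # term absent from the document: contributes nothing
            idf = math.log(n_docs / df[term])  # equals -log(df / N)
            score += tf[term] * idf
        scores.append(score)
    return scores

# Hypothetical toy collection, for illustration only.
docs = [
    "probabilistic retrieval model".split(),
    "term weights for retrieval".split(),
    "relevance decisions relevance".split(),
]
scores = tf_idf_scores(["relevance", "retrieval"], docs)
```

In this toy collection the third document ranks highest, because "relevance" occurs twice in it and in no other document, so it has both the largest term frequency and the largest IDF.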

Evaluating a Novel Kind of Retrieval Models Based on Relevance Decision Making in a Relevance Feedback Environment

Results from participation in the relevance feedback track, using novel retrieval models that simulate human relevance decision-making, are presented, and the Markov random field (MRF) model is found to produce better results than the initial retrieval system.

New document-context term weights and clustering for information retrieval

A novel ‘context-dependent’ term weight, which incorporates information based on the words found in the document-contexts of a term, is studied and can yield statistically significant improvements in retrieval compared with the traditional weights.

A context‐dependent relevance model

This article introduces an extension of RM in the setting of relevance feedback that makes use of the context information of known relevant and nonrelevant documents to obtain weighted counts of query terms for estimating the document language models.

Binary Independence Language Model in a Relevance Feedback Environment

A new (retrieval) language model, called bin, is proposed, which aims to improve the quality of retrieval models used in search engines.

Investigating Passage-level Relevance and Its Role in Document-level Relevance Judgment

This study helps to better understand how users perceive the relevance of a document and inspires the design of novel ranking models that leverage fine-grained, passage-level relevance signals.

Semantic Sensitive TF-IDF to Determine Word Relevance in Documents

A set of nearly four million documents from health-care social media was collected and used to train a semantic model and obtain word embeddings; features of the semantic space were then used to rearrange the original TF-IDF scores through an iterative solution, so as to improve the moderate performance of this algorithm on informal texts.

Building a framework for the probability ranking principle by a family of expected weighted rank

A new principled framework is presented for retrieval evaluation of ranked outputs; it shows that the Probability Ranking Principle (PRP) specifies the optimal ranking, which may be used to normalize the expected weighted rank of retrieval systems for (summary) performance comparison between systems.

A Match-Transformer Framework for Modeling Diverse Relevance Patterns in Ad-hoc Retrieval

This work proposes a Match-Transformer Framework (MTF) for modeling diverse relevance patterns from the perspective of the query and documents simultaneously, and demonstrates that this approach outperforms most well-known neural IR models in ad-hoc retrieval.



A retrospective study of a hybrid document-context based retrieval model

Relevance information: a loss of entropy but a gain for IDF?

The main result is a formal framework uncovering the close relationship between a generalised idf and the BIR model; in addition, a Poisson-based idf is superior to the classical idf, and the superiority is particularly evident for long queries.

On Relevance, Probabilistic Indexing and Information Retrieval

The paper suggests an interpretation of the whole library problem as one where the request is considered as a clue on the basis of which the library system makes a concatenated statistical inference in order to provide as an output an ordered list of those documents which most probably satisfy the information needs of the user.

Using Probabilistic Models of Document Retrieval without Relevance Information

Based on a probabilistic model, this paper considers the situation where no relevance information is available, that is, the start of the search, and proposes strategies for the initial search and an intermediate search.

Probabilistic document-context based relevance feedback with limited relevance judgments

This paper presents a novel relevance feedback (RF) algorithm that uses the probabilistic document-context based retrieval model with limited relevance judgments for document re-ranking to reduce the data scarcity problem and the negative weighting problem.

Probabilistic models of information retrieval based on measuring the divergence from randomness

A framework for deriving probabilistic models of Information Retrieval using term-weighting models obtained in the language model approach by measuring the divergence of the actual term distribution from that obtained under a random process is introduced.

A Linguistically Motivated Probabilistic Model of Information Retrieval

The paper shows that the new probabilistic interpretation of tf×idf term weighting might lead to better understanding of statistical ranking mechanisms, for example by explaining how they relate to coordination level ranking.

Term context models for information retrieval

A model is proposed which assesses the presence of a term in a document not by looking at the actual occurrence of that term, but by a set of non-independent supporting terms, i.e. context, which yields a weighting for terms in documents which is different from and complementary to tf-based methods, and is beneficial for retrieval.

Models for retrieval with probabilistic indexing

  • N. Fuhr
  • Computer Science
  • Inf. Process. Manag.
  • 1989

A network approach to probabilistic information retrieval

It is shown how probabilistic information retrieval based on document components may be implemented as a feedforward (feedbackward) artificial neural network. Performance with feedback improves substantially over no feedback, and further gains are obtained when queries are expanded with terms from the feedback documents.