A Linguistic Study on Relevance Modeling in Information Retrieval

Yixing Fan, Jiafeng Guo, Xinyu Ma, Ruqing Zhang, Yanyan Lan, Xueqi Cheng
Proceedings of the Web Conference 2021
Relevance plays a central role in information retrieval (IR) and has been studied extensively since the 20th century. The definition and the modeling of relevance have always been critical challenges in both information science and computer science research. Alongside the debate and exploration on relevance, IR has become a core task in many real-world applications, such as Web search engines, question answering systems, and conversational bots. While relevance…

NIR-Prompt: A Multi-task Generalized Neural Information Retrieval Training Framework

Experiments show that NIR-Prompt improves the generalization of PLMs in NIR over baselines for both the retrieval and reranking stages, under in-domain multi-task, out-of-domain multi-task, and new task adaptation settings.

B-PROP: Bootstrapped Pre-training with Representative Words Prediction for Ad-hoc Retrieval

A bootstrapped pre-training method that uses the powerful contextual language model BERT to replace the classical unigram language model for ROP task construction, and re-trains BERT itself towards the tailored objective for IR.

Explainable Information Retrieval: A Survey

This survey categorizes and discusses recent explainability methods developed for different application domains in information retrieval, providing a common framework and unifying perspectives. It also reflects on the common concern of evaluating explanations and highlights open challenges and opportunities.

Match-Prompt: Improving Multi-task Generalization Ability for Neural Text Matching via Prompt Learning

Experimental results on eighteen public datasets show that Match-Prompt can improve the multi-task generalization capability of PLMs in text matching and yield better in-domain multi-task, out-of-domain multi-task, and new task adaptation performance than multi-task and task-specific models trained with the previous fine-tuning paradigm.



Relevance: The Whole History

  • S. Mizzaro
  • Computer Science
    J. Am. Soc. Inf. Sci.
  • 1997
This article presents the history of relevance through an exhaustive review of the literature under seven different aspects, including methodological foundations, different kinds of relevance, beyond-topical criteria adopted by users, modes for expression of the relevance judgment, the dynamic nature of relevance, and types of document representation.

Semantic Matching in Search

This survey gives a systematic and detailed introduction to newly developed machine learning technologies for query-document matching (semantic matching) in search, particularly web search, focusing on the fundamental problems as well as the state-of-the-art solutions.

Ranking Relevance in Yahoo Search

This paper introduces three key techniques for base relevance -- ranking functions, semantic matching features and query rewriting, and describes solutions for recency sensitive relevance and location sensitive relevance.

A Comparative Study of Utilizing Topic Models for Information Retrieval

This work shows that topic models are effective for document smoothing, and generally, incorporating topics in the feedback documents for building relevance models can benefit the performance more for queries that have more relevant documents.

Latent Retrieval for Weakly Supervised Open Domain Question Answering

It is shown for the first time that it is possible to jointly learn the retriever and reader from question-answer string pairs without any IR system, outperforming BM25 by up to 19 points in exact match.

The notion of relevance (I)

It is found that certain very general difficulties rule out the possibility of defining concepts and their relatedness by the method proposed, and an alternative approach is proposed whose elaboration will form Part II of this article.

Analysis of a very large web search engine query log

It is shown that web users type in short queries, mostly look at the first 10 results only, and seldom modify the query, suggesting that traditional information retrieval techniques may not work well for answering web search requests.

Overview of the TREC 2003 Robust Retrieval Track

The robust retrieval track is a new track in TREC 2003 that aims to improve the consistency of retrieval technology by focusing on poorly performing topics. Two new effectiveness measures that focus on the least-well-performing topics are presented.

MS MARCO: A Human Generated MAchine Reading COmprehension Dataset

This new dataset aims to overcome a number of well-known weaknesses of previous publicly available datasets for the same task of reading comprehension and question answering, and is the most comprehensive real-world dataset of its kind in both quantity and quality.

LDA-based document models for ad-hoc retrieval

This paper proposes an LDA-based document model within the language modeling framework, evaluates it on several TREC collections, and shows that improvements over retrieval using cluster-based models can be obtained with reasonable efficiency.