Corpus ID: 245704525

Atomized Search Length: Beyond User Models

@article{Alex2022AtomizedSL,
  title={Atomized Search Length: Beyond User Models},
  author={John Alex and Keith B. Hall and Donald Metzler},
  journal={ArXiv},
  year={2022},
  volume={abs/2201.01745}
}
We argue that current IR metrics, modeled on optimizing user experience, measure too narrow a portion of the IR space. If IR systems are weak, these metrics undersample or completely filter out the deeper documents that need improvement. If IR systems are relatively strong, these metrics undersample deeper relevant documents that could underpin even stronger IR systems, ones that could present content from tens or hundreds of relevant documents in a user-digestible hierarchy or text summary. We… 


Offline Retrieval Evaluation Without Evaluation Metrics

This work proposes recall-paired preference (RPP), a metric-free evaluation method based on directly computing a preference between ranked lists that substantially improves discriminative power while correlating well with existing metrics and being equally robust to incomplete data.

References


Cumulated gain-based evaluation of IR techniques

This article proposes several novel measures that compute the cumulative gain the user obtains by examining the retrieval result up to a given ranked position, and test results indicate that the proposed measures credit IR methods for their ability to retrieve highly relevant documents and allow testing of statistical significance of effectiveness differences.
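The idea of accumulating gain down a ranked list can be sketched as follows. This is a minimal illustration that assumes the commonly cited formulation, in which ranks below the logarithm base are not discounted; the function names and the default `base=2` are hypothetical choices, not taken from the article.

```python
import math


def cumulated_gain(gains):
    """Running sum of graded relevance gains at each rank (CG)."""
    out, total = [], 0.0
    for g in gains:
        total += g
        out.append(total)
    return out


def discounted_cumulated_gain(gains, base=2):
    """Running sum where the gain at rank i is divided by log_base(i).

    Assumption: ranks smaller than `base` are left undiscounted, which
    also avoids dividing by log(1) = 0 at rank 1.
    """
    out, total = [], 0.0
    for rank, g in enumerate(gains, start=1):
        discount = math.log(rank, base) if rank >= base else 1.0
        total += g / discount
        out.append(total)
    return out


if __name__ == "__main__":
    gains = [3, 2, 3, 0, 1, 2]  # illustrative graded judgments, rank 1 first
    print(cumulated_gain(gains))
    print(discounted_cumulated_gain(gains))
```

A larger `base` corresponds to a more patient user, because discounting starts at a deeper rank and grows more slowly.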

Significant Improvements over the State of the Art? A Case Study of the MS MARCO Document Ranking Leaderboard

Empirical analysis of SOTA runs from the MS MARCO document ranking leaderboard reveals insights about how one run can be "significantly better" than another that are obscured by the current official evaluation metric (MRR@100).

Expected reciprocal rank for graded relevance

This work presents a new editorial metric for graded relevance, Expected Reciprocal Rank (ERR), which implicitly discounts documents shown below very relevant documents.
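The discounting described here can be sketched with a cascade-style computation. This is a minimal illustration that assumes the commonly cited mapping from a grade g to a satisfaction probability, (2^g - 1) / 2^g_max; the function name and the default `max_grade=4` are hypothetical.

```python
def expected_reciprocal_rank(grades, max_grade=4):
    """Cascade-style ERR over a ranked list of integer relevance grades.

    Assumption: a document with grade g satisfies the user with
    probability (2**g - 1) / 2**max_grade.
    """
    err = 0.0
    p_reach = 1.0  # probability that the user reaches the current rank
    for rank, g in enumerate(grades, start=1):
        p_satisfied = (2 ** g - 1) / 2 ** max_grade
        err += p_reach * p_satisfied / rank
        p_reach *= 1.0 - p_satisfied
    return err


if __name__ == "__main__":
    # A highly relevant document at rank 1 leaves little credit for rank 2.
    print(expected_reciprocal_rank([4, 4]))
    print(expected_reciprocal_rank([0, 4]))
```

Because `p_reach` shrinks after every relevant document, the same grade contributes less when it appears below a very relevant result.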

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.

TREC CAsT 2019: The Conversational Assistance Track Overview

The Conversational Assistance Track (CAsT) is a new track for TREC 2019 to facilitate Conversational Information Seeking (CIS) research and to create a large-scale reusable test collection for

Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering

Interestingly, it is observed that the performance of this method significantly improves when increasing the number of retrieved passages, evidence that sequence-to-sequence models offer a flexible framework to efficiently aggregate and combine evidence from multiple passages.

Rank-biased precision for measurement of retrieval effectiveness

A new effectiveness metric, rank-biased precision, is introduced that is derived from a simple model of user behavior, is robust if answer rankings are extended to greater depths, and allows accurate quantification of experimental uncertainty, even when only partial relevance judgments are available.
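The user model behind rank-biased precision can be sketched as below. This is a minimal illustration that assumes the commonly cited form, in which the user moves from one rank to the next with a fixed persistence probability p; the function names and the default `p=0.8` are hypothetical.

```python
def rank_biased_precision(relevance, p=0.8):
    """RBP = (1 - p) * sum_i r_i * p**(i - 1), with ranks starting at 1.

    `relevance` holds values in [0, 1], rank 1 first.
    """
    return (1.0 - p) * sum(r * p ** i for i, r in enumerate(relevance))


def rbp_residual(depth, p=0.8):
    """Upper bound on the score that unjudged documents below `depth` could add."""
    return p ** depth


if __name__ == "__main__":
    run = [1, 0, 1]  # illustrative binary judgments for the top three ranks
    print(rank_biased_precision(run, p=0.5))  # 0.625
    print(rbp_residual(len(run), p=0.5))      # 0.125
```

The residual is one way to read the claim about partial judgments: the true score lies between the computed value and that value plus the residual.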

An Introduction to Neural Information Retrieval

The monograph provides a complete picture of neural information retrieval techniques that culminate in supervised neural learning to rank models including deep neural network architectures that are trained end-to-end for ranking tasks.

Learning to rank for information retrieval

Three major approaches to learning to rank are introduced, i.e., the pointwise, pairwise, and listwise approaches; the relationship between the loss functions used in these approaches and the widely used IR evaluation measures is analyzed; and the performance of these approaches on the LETOR benchmark datasets is evaluated.

The TREC-8 Question Answering Track Report

This report summarizes the TREC-8 Question Answering track, the first large-scale evaluation of domain-independent question answering systems.