Corpus ID: 233296016

BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models

Nandan Thakur, Nils Reimers, Andreas Rücklé, Abhishek Srivastava, Iryna Gurevych

Existing neural information retrieval (IR) models have often been studied in homogeneous and narrow settings, which has considerably limited insights into their out-of-distribution (OOD) generalization capabilities. To address this, and to facilitate researchers to broadly evaluate the effectiveness of their models, we introduce Benchmarking-IR (BEIR), a robust and heterogeneous evaluation benchmark for information retrieval. We leverage a careful selection of 18 publicly available datasets… 

SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval

The pooling mechanism is modified, a model solely based on document expansion is benchmarked, and models trained with distillation are introduced, leading to state-of-the-art results on the BEIR benchmark.

MS-Shift: An Analysis of MS MARCO Distribution Shifts on Neural Retrieval

This study demonstrates that it is possible to design more controllable distribution shifts as a tool to better understand generalization of IR models, and releases the MS MARCO query subsets, which provide an additional resource to benchmark zero-shot transfer in Information Retrieval.

From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective

This work builds on SPLADE, a sparse expansion-based retriever, and shows to what extent it can benefit from the same training improvements as dense models by studying the effect of distillation, hard-negative mining, and pre-trained language model initialization.

Toward A Fine-Grained Analysis of Distribution Shifts in MSMARCO

This study demonstrates that it is possible to design distribution shift experiments within the MSMARCO collection, and that the query subsets selected constitute an additional benchmark to better study factors of generalization for various models.

COCO-DR: Combating the Distribution Shift in Zero-Shot Dense Retrieval with Contrastive and Distributionally Robust Learning

COCO-DR is a new zero-shot dense retrieval (ZeroDR) method that improves the generalization ability of dense retrieval by combating the distribution shifts between source training tasks and target scenarios, thereby improving zero-shot accuracy.

No Parameter Left Behind: How Distillation and Model Size Affect Zero-Shot Retrieval

It is shown that the number of parameters and early query-document interaction play a significant role in the generalization ability of retrieval models, and it is confirmed that in-domain effectiveness is not a good indicator of zero-shot effectiveness.

InPars: Unsupervised Dataset Generation for Information Retrieval

This work harnesses the few-shot capabilities of large pretrained language models as synthetic data generators for IR tasks and shows that models finetuned solely on these synthetic datasets outperform strong baselines such as BM25 as well as recently proposed self-supervised dense retrieval methods.

A Systematic Evaluation of Transfer Learning and Pseudo-labeling with BERT-based Ranking Models

A systematic evaluation of transferability of BERT-based neural ranking models across five English datasets finds that training on pseudo-labels can produce a competitive or better model compared to transfer learning, yet it is necessary to improve the stability and/or effectiveness of the few-shot training.

Towards Unsupervised Dense Information Retrieval with Contrastive Learning

This work explores the limits of contrastive learning as a way to train unsupervised dense retrievers, and shows that it leads to strong retrieval performance on the BEIR benchmark.

Domain Adaptation for Dense Retrieval through Self-Supervision by Pseudo-Relevance Labeling

This paper proposes to use a self-supervision approach in which pseudo-relevance labels are automatically generated on the target domain and combines this approach with knowledge distillation relying on an interaction-based teacher model trained on the source domain.



Mr. TyDi: A Multi-lingual Benchmark for Dense Retrieval

Experiments show that although the effectiveness of mDPR is much lower than BM25, dense representations nevertheless appear to provide valuable relevance signals, improving BM25 results in sparse–dense hybrids.

Embedding-based Zero-shot Retrieval through Query Generation

This work considers the embedding-based two-tower architecture as the neural retrieval model and proposes a novel method for generating synthetic training data for retrieval, which produces remarkable results, significantly outperforming BM25 on 5 out of 6 datasets tested.

Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation

This work proposes a cross-architecture training procedure with a margin focused loss (Margin-MSE), that adapts knowledge distillation to the varying score output distributions of different BERT and non-BERT ranking architectures, and shows that across evaluated architectures it significantly improves their effectiveness without compromising their efficiency.

Efficiently Teaching an Effective Dense Retriever with Balanced Topic Aware Sampling

This work introduces an efficient topic-aware query and balanced margin sampling technique, called TAS-Balanced, and produces the first dense retriever that outperforms every other method on recall at any cutoff on TREC-DL and allows more resource intensive re-ranking models to operate on fewer passages to improve results further.

ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT

ColBERT is presented, a novel ranking model that adapts deep LMs (in particular, BERT) for efficient retrieval that is competitive with existing BERT-based models (and outperforms every non-BERT baseline) and enables leveraging vector-similarity indexes for end-to-end retrieval directly from millions of documents.

Pretrained Transformers for Text Ranking: BERT and Beyond

This tutorial provides an overview of text ranking with neural network architectures known as transformers, of which BERT (Bidirectional Encoder Representations from Transformers) is the best-known example, and covers a wide range of techniques.

MultiCQA: Zero-Shot Transfer of Self-Supervised Text Matching Models on a Massive Scale

This work proposes combining self-supervised with supervised multi-task learning on all available source domains; the best zero-shot transfer model considerably outperforms in-domain BERT and the previous state of the art on six benchmarks.

Document Ranking with a Pretrained Sequence-to-Sequence Model

Surprisingly, it is found that the choice of target tokens impacts effectiveness, even for words that are closely related semantically, which sheds some light on why the sequence-to-sequence formulation for document ranking is effective.

SPARTA: Efficient Open-Domain Question Answering via Sparse Transformer Matching Retrieval

SPARTA achieves new state-of-the-art results across a variety of open-domain question answering tasks on both English and Chinese datasets, including open SQuAD and CMRC.

Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval

Approximate nearest neighbor Negative Contrastive Estimation (ANCE) is presented, a training mechanism that constructs negatives from an Approximate Nearest Neighbor (ANN) index of the corpus, which is updated in parallel with the learning process to select more realistic negative training instances.