Understanding Performance of Long-Document Ranking Models through Comprehensive Evaluation and Leaderboarding

Leonid Boytsov, Tianyi Lin, Fangwei Gao, Yutian Zhao, Jeffrey Huang, Eric Nyberg
We carry out a comprehensive evaluation of 13 recent models for ranking long documents using two popular collections (MS MARCO Documents and Robust04). Our model zoo includes two specialized Transformer models (e.g., Longformer) that can process long documents without splitting them. Along the way, we document several difficulties in training and comparing such models. Somewhat surprisingly, we find that the simple FirstP baseline (truncating documents to satisfy the input…

Intra-Document Cascading: Learning to Select Passages for Neural Document Ranking

The proposed Intra-Document Cascaded Ranking Model (IDCM) leads to over 4× lower query latency while providing essentially the same effectiveness as state-of-the-art BERT-based document ranking models.
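The cascade idea is that a cheap scorer first selects the most promising passages of a document, and only those reach the expensive scorer. A toy sketch under assumptions: `cheap` and `expensive` are placeholder scorers, not IDCM's actual lexical and BERT-based components.

```python
# Intra-document cascade sketch: a cheap scorer ranks passages; only the
# top-k survivors are scored by an expensive model, cutting its workload.

def cascade_score(query, passages, cheap, expensive, k=2):
    ranked = sorted(passages, key=lambda p: cheap(query, p), reverse=True)
    return max(expensive(query, p) for p in ranked[:k])

def cheap(q, p):
    qs = set(q)
    return sum(1 for t in p if t in qs)

def expensive(q, p):
    return 2.0 * cheap(q, p)  # placeholder for a BERT-scale scorer

passages = [["x", "y"], ["neural", "ranking", "z"], ["ranking"]]
print(cascade_score(["neural", "ranking"], passages, cheap, expensive))  # -> 4.0
```

With k passages instead of all of them, the expensive model's cost per document drops roughly by the ratio of document length to k passages, which is where the latency savings come from.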

CEDR: Contextualized Embeddings for Document Ranking

This work investigates how two pretrained contextualized language models (ELMo and BERT) can be utilized for ad-hoc document ranking and proposes a joint approach that incorporates BERT's classification vector into existing neural models, showing that it outperforms state-of-the-art ad-hoc ranking baselines.

PARADE: Passage Representation Aggregation for Document Reranking

We present PARADE, an end-to-end Transformer-based model that considers document-level context for document reranking. PARADE leverages passage-level relevance representations to predict a document relevance score.
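The common pattern behind passage-aggregation models like PARADE is to split a document into overlapping passages, score (or embed) each, and combine the results. A minimal sketch of the simplest variant, max-pooling over passage scores; the passage size, stride, and `overlap` scorer are illustrative choices, and PARADE's full model aggregates passage representations with a Transformer rather than pooling scores:

```python
# Passage-aggregation sketch: split a document into overlapping windows,
# score each window, and take the best passage score as the document score.

def split_passages(tokens, size=225, stride=200):
    return [tokens[i:i + size] for i in range(0, max(1, len(tokens)), stride)]

def parade_max(query, tokens, score_passage):
    scores = [score_passage(query, p) for p in split_passages(tokens)]
    return max(scores)

# Toy passage scorer: term-overlap count.
def overlap(q, p):
    qs = set(q)
    return sum(1 for t in p if t in qs)

doc = ["a"] * 300 + ["neural", "ranking"] + ["b"] * 300
print(parade_max(["neural", "ranking"], doc, overlap))  # -> 2
```

Unlike FirstP, the relevant span in the middle of this document is still found, because every region of the document falls inside some passage window.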

Fine-Grained Relevance Annotations for Multi-Task Document Ranking and Question Answering

This work extends the ranked retrieval annotations of the Deep Learning track of TREC 2019 with passage and word level graded relevance annotations for all relevant documents, and presents FiRA: a novel dataset of Fine-Grained Relevance Annotations.

A Systematic Evaluation of Transfer Learning and Pseudo-labeling with BERT-based Ranking Models

A systematic evaluation of the transferability of BERT-based neural ranking models across five English datasets finds that training on pseudo-labels can produce a model that is competitive with, or better than, transfer learning, although the stability and/or effectiveness of few-shot training still needs improvement.

CODER: An efficient framework for improving retrieval through COntextualized Document Embedding Reranking

Evaluating CODER in a large set of experiments on the MS MARCO and TripClick collections shows that contextual reranking of precomputed document embeddings leads to a significant improvement in retrieval performance.

ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT

ColBERT is presented, a novel ranking model that adapts deep LMs (in particular, BERT) for efficient retrieval that is competitive with existing BERT-based models (and outperforms every non-BERT baseline) and enables leveraging vector-similarity indexes for end-to-end retrieval directly from millions of documents.
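ColBERT's late interaction encodes query and document tokens independently and scores with MaxSim: for each query token embedding, take the maximum similarity over all document token embeddings, then sum. A toy sketch with hand-picked 2-d vectors in place of real BERT token embeddings:

```python
# Late-interaction (MaxSim) sketch: per query token, take the best-matching
# document token by dot product, then sum these maxima into one score.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def maxsim(query_embs, doc_embs):
    return sum(max(dot(q, d) for d in doc_embs) for q in query_embs)

q = [[1.0, 0.0], [0.0, 1.0]]                   # two query token embeddings
d = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]       # three document token embeddings
print(maxsim(q, d))  # ≈ 0.9 + 0.8 = 1.7
```

Because document embeddings never interact with the query during encoding, they can be precomputed and indexed, which is what enables the vector-similarity-based end-to-end retrieval the summary mentions.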

Pretrained Transformers for Text Ranking: BERT and Beyond

This tutorial provides an overview of text ranking with neural network architectures known as transformers, of which BERT (Bidirectional Encoder Representations from Transformers) is the best-known example, and covers a wide range of techniques.

Passage Re-ranking with BERT

A simple re-implementation of BERT for query-based passage re-ranking that achieves strong results on the TREC-CAR dataset and the top entry in the leaderboard of the MS MARCO passage retrieval task, outperforming the previous state of the art by 27% in MRR@10.

Pre-trained Language Model based Ranking in Baidu Search

A novel practice to cost-efficiently summarize web documents and contextualize the resulting summary with the query using a cheap yet powerful Pyramid-ERNIE architecture, together with a human-anchored fine-tuning strategy tailored for the online ranking system that aims to stabilize ranking signals across various online components.