Understanding Performance of Long-Document Ranking Models through Comprehensive Evaluation and Leaderboarding
@article{Boytsov2022UnderstandingPO,
  title   = {Understanding Performance of Long-Document Ranking Models through Comprehensive Evaluation and Leaderboarding},
  author  = {Leonid Boytsov and Tianyi Lin and Fangwei Gao and Yutian Zhao and Jeffrey Huang and Eric Nyberg},
  journal = {ArXiv},
  year    = {2022},
  volume  = {abs/2207.01262}
}
We carry out a comprehensive evaluation of 13 recent models for ranking long documents using two popular collections (MS MARCO documents and Robust04). Our model zoo includes two specialized Transformer models (such as Longformer) that can process long documents without the need to split them. Along the way, we document several difficulties regarding training and comparing such models. Somewhat surprisingly, we find the simple FirstP baseline (truncating documents to satisfy the input-length limit of the underlying Transformer model) to be quite effective.
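For context, the FirstP baseline scores a document by feeding only its leading tokens, together with the query, into a standard cross-encoder. Below is a minimal sketch of this idea, assuming the Hugging Face `transformers` library and an illustrative MS MARCO cross-encoder checkpoint; it is not necessarily the paper's exact setup.

```python
# Sketch of the FirstP baseline: score a (query, document) pair with a
# BERT-style cross-encoder, keeping only the document prefix that fits
# within the 512-token input limit. Model choice is illustrative.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "cross-encoder/ms-marco-MiniLM-L-6-v2"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()

def firstp_score(query: str, document: str) -> float:
    """Score a pair after truncating the document to the encoder's limit."""
    inputs = tokenizer(
        query,
        document,
        truncation="only_second",  # truncate the document, never the query
        max_length=512,
        return_tensors="pt",
    )
    with torch.no_grad():
        return model(**inputs).logits.squeeze().item()
```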
References
Showing the first 10 of 64 references.
Intra-Document Cascading: Learning to Select Passages for Neural Document Ranking
- SIGIR, 2021
The proposed Intra-Document Cascaded Ranking Model (IDCM) achieves more than a four-fold reduction in query latency while providing essentially the same effectiveness as state-of-the-art BERT-based document ranking models.
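The cascade idea is compact enough to sketch: a cheap passage scorer prunes each document's passages, and only the survivors reach the expensive BERT ranker. The two scoring functions below are hypothetical stand-ins for the cascade stages, not IDCM's actual components.

```python
# Sketch of an IDCM-style intra-document cascade: a fast scorer selects
# the top-k passages; only those are scored by the expensive model.
from typing import Callable

def cascade_document_score(
    query: str,
    passages: list[str],
    cheap_score: Callable[[str, str], float],  # hypothetical fast stage
    bert_score: Callable[[str, str], float],   # hypothetical expensive stage
    k: int = 3,
) -> float:
    """Prune passages cheaply, score the top-k survivors, max-pool the results."""
    survivors = sorted(passages, key=lambda p: cheap_score(query, p), reverse=True)[:k]
    return max(bert_score(query, p) for p in survivors)
```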
CEDR: Contextualized Embeddings for Document Ranking
- SIGIR, 2019
This work investigates how two pretrained contextualized language models (ELMo and BERT) can be utilized for ad-hoc document ranking and proposes a joint approach that incorporates BERT's classification vector into existing neural models, showing that it outperforms state-of-the-art ad-hoc ranking baselines.
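The core CEDR move, combining BERT's [CLS] vector with an existing neural ranker's features in a final scoring layer, can be sketched as follows; the dimensions and base-ranker features are illustrative assumptions, not the paper's exact architecture.

```python
# Sketch of a CEDR-style joint head: concatenate the feature vector of an
# existing neural ranker (e.g., KNRM-style) with BERT's [CLS] vector and
# score the pair with a linear layer.
import torch
import torch.nn as nn

class CedrStyleHead(nn.Module):
    def __init__(self, base_dim: int, cls_dim: int = 768):
        super().__init__()
        self.score = nn.Linear(base_dim + cls_dim, 1)

    def forward(self, base_feats: torch.Tensor, cls_vec: torch.Tensor) -> torch.Tensor:
        # base_feats: (batch, base_dim); cls_vec: (batch, cls_dim)
        return self.score(torch.cat([base_feats, cls_vec], dim=-1)).squeeze(-1)
```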
PARADE: Passage Representation Aggregation for Document Reranking
- ArXiv, 2020
PARADE is an end-to-end Transformer-based model that considers document-level context for document reranking, leveraging passage-level relevance representations to predict a document relevance score.
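A minimal sketch of the aggregation step PARADE describes: per-passage [CLS] vectors are combined by a small Transformer encoder, and a learned document-level [CLS] token yields the final score. Layer counts and dimensions here are assumptions for illustration.

```python
# Sketch of PARADE-style aggregation over per-passage [CLS] vectors.
import torch
import torch.nn as nn

class ParadeStyleAggregator(nn.Module):
    def __init__(self, hidden: int = 768, n_layers: int = 2, n_heads: int = 8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=hidden, nhead=n_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.doc_cls = nn.Parameter(torch.randn(1, 1, hidden))  # document [CLS]
        self.score = nn.Linear(hidden, 1)

    def forward(self, passage_reprs: torch.Tensor) -> torch.Tensor:
        # passage_reprs: (batch, n_passages, hidden) per-passage [CLS] vectors
        cls = self.doc_cls.expand(passage_reprs.size(0), -1, -1)
        x = self.encoder(torch.cat([cls, passage_reprs], dim=1))
        return self.score(x[:, 0]).squeeze(-1)  # score read off the document [CLS]
```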
Fine-Grained Relevance Annotations for Multi-Task Document Ranking and Question Answering
- CIKM, 2020
This work extends the ranked retrieval annotations of the Deep Learning track of TREC 2019 with passage and word level graded relevance annotations for all relevant documents, and presents FiRA: a novel dataset of Fine-Grained Relevance Annotations.
A Systematic Evaluation of Transfer Learning and Pseudo-labeling with BERT-based Ranking Models
- SIGIR, 2021
A systematic evaluation of transferability of BERT-based neural ranking models across five English datasets finds that training on pseudo-labels can produce a competitive or better model compared to transfer learning, yet it is necessary to improve the stability and/or effectiveness of the few-shot training.
CODER: An efficient framework for improving retrieval through COntextualized Document Embedding Reranking
- ArXiv, 2021
Evaluating CODER in a large set of experiments on the MS MARCO and TripClick collections shows that contextual reranking of precomputed document embeddings leads to a significant improvement in retrieval performance.
ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT
- SIGIR, 2020
ColBERT is a novel ranking model that adapts deep LMs (in particular, BERT) for efficient retrieval; it is competitive with existing BERT-based models, outperforms every non-BERT baseline, and enables leveraging vector-similarity indexes for end-to-end retrieval directly from millions of documents.
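ColBERT's late interaction reduces to the MaxSim operator over precomputed token embeddings. A minimal sketch, assuming L2-normalized embeddings; shapes are illustrative and this is not the official implementation.

```python
# Sketch of ColBERT's MaxSim late interaction: for each query token,
# take its best cosine similarity over all document tokens, then sum.
import torch

def maxsim_score(q_emb: torch.Tensor, d_emb: torch.Tensor) -> torch.Tensor:
    """q_emb: (n_q, dim) query token embeddings; d_emb: (n_d, dim) document
    token embeddings; both assumed L2-normalized. Returns a scalar score."""
    sim = q_emb @ d_emb.T               # (n_q, n_d) cosine similarities
    return sim.max(dim=1).values.sum()  # best match per query token, summed
```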
Pretrained Transformers for Text Ranking: BERT and Beyond
- NAACL, 2021
This tutorial provides an overview of text ranking with neural network architectures known as transformers, of which BERT (Bidirectional Encoder Representations from Transformers) is the best-known example, and covers a wide range of techniques.
Passage Re-ranking with BERT
- ArXiv, 2019
A simple re-implementation of BERT for query-based passage re-ranking achieves state-of-the-art results on the TREC-CAR dataset and the top leaderboard entry in the MS MARCO passage retrieval task, outperforming the previous state of the art by 27% in MRR@10.
Pre-trained Language Model based Ranking in Baidu Search
- KDD, 2021
This work presents a novel practice to cost-efficiently summarize web documents and contextualize the resultant summary content with the query using a cheap yet powerful Pyramid-ERNIE architecture, together with a human-anchored fine-tuning strategy tailored to the online ranking system, aiming to stabilize ranking signals across various online components.