CEDR: Contextualized Embeddings for Document Ranking

  • Sean MacAvaney, Andrew Yates, Arman Cohan, Nazli Goharian
  • Published 15 April 2019
  • Computer Science
  • Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval
Although considerable attention has been given to neural ranking architectures recently, far less attention has been paid to the term representations that are used as input to these models. […] Furthermore, we propose a joint approach that incorporates BERT's classification vector into existing neural models and show that it outperforms state-of-the-art ad-hoc ranking baselines. We call this joint approach CEDR (Contextualized Embeddings for Document Ranking). We also address practical challenges…

CoRT: Complementary Rankings from Transformers

It is shown that CoRT significantly increases the candidate recall by complementing BM25 with missing candidates, and it is demonstrated that passage retrieval using CoRT can be realized with surprisingly low latencies.

Diagnosing BERT with Retrieval Heuristics

This paper creates diagnostic datasets, each fulfilling a retrieval heuristic (both term-matching and semantic-based), to explore what BERT is able to learn. It finds that BERT, when applied to a recently released large-scale web corpus with ad-hoc topics, does not adhere to any of the explored axioms.

Evaluating Transformer-Kernel Models at TREC Deep Learning 2020

The TK model family sits between BERT and previous ranking models in terms of the efficiency-effectiveness trade-off, being faster than BERT albeit less effective, and the work confirms the path for new storage-saving methods for interpretable ranking models.

CEQE to SQET: A study of contextualized embeddings for query expansion

A new model, Supervised Contextualized Query Expansion with Transformers (SQET), is introduced; it performs expansion as a supervised classification task, leverages context in pseudo-relevant results, and improves over proven probabilistic pseudo-relevance feedback (PRF) models.

Interpretable & Time-Budget-Constrained Contextualization for Re-Ranking

TK (Transformer-Kernel) is proposed: a neural re-ranking model for ad-hoc search that uses an efficient contextualization mechanism and achieves the highest effectiveness in comparison to BERT and other re-ranking models.

Cross-Domain Sentence Modeling for Relevance Transfer with BERT

This work proposes adapting BERT as a neural re-ranker for document retrieval to achieve large improvements on news articles, and presents an end-to-end document retrieval system that integrates the open-source Anserini information retrieval toolkit.

Comparing Score Aggregation Approaches for Document Retrieval with Pretrained Transformers

This work reproduces three passage score aggregation approaches proposed by Dai and Callan for overcoming the maximum input length limitation of BERT and finds that these BERT variants are not more effective for document retrieval in isolation, but can lead to increased effectiveness when combined with “pre–fine-tuning” on the MS MARCO passage dataset.
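As a rough illustration of passage score aggregation, the sketch below splits a long document into overlapping passages, scores each passage, and reduces the passage scores to one document score. The names FirstP, MaxP, and SumP follow the usual description of Dai and Callan's approaches; the window and stride values, the helper names, and the term-overlap scorer (a hypothetical stand-in for a BERT relevance model) are assumptions made only for this example.

```python
def split_passages(words, window=150, stride=75):
    """Split a token list into overlapping passages (illustrative defaults)."""
    passages = []
    for start in range(0, len(words), stride):
        passages.append(words[start:start + window])
        if start + window >= len(words):
            break  # the last window already reaches the end of the document
    return passages


def toy_passage_score(query_terms, passage):
    """Hypothetical stand-in for a BERT scorer: fraction of query terms present."""
    if not query_terms:
        return 0.0
    present = sum(1 for term in query_terms if term in passage)
    return present / len(query_terms)


def aggregate(passage_scores, method="max"):
    """Reduce per-passage scores to a single document score."""
    if not passage_scores:
        raise ValueError("need at least one passage score")
    if method == "first":   # FirstP: score of the first passage only
        return passage_scores[0]
    if method == "max":     # MaxP: best-scoring passage
        return max(passage_scores)
    if method == "sum":     # SumP: sum over all passages
        return sum(passage_scores)
    raise ValueError(f"unknown method: {method}")


def score_document(query_terms, words, method="max", window=150, stride=75):
    """Score every passage of a document, then aggregate."""
    passages = split_passages(words, window, stride)
    scores = [toy_passage_score(query_terms, p) for p in passages]
    return aggregate(scores, method)
```

Only `aggregate` differs between the three variants; passage splitting and per-passage scoring are shared, so swapping in a real model would only change `toy_passage_score`.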

Document Ranking with a Pretrained Sequence-to-Sequence Model

Surprisingly, it is found that the choice of target tokens impacts effectiveness, even for words that are closely related semantically, which sheds some light on why the sequence-to-sequence formulation for document ranking is effective.

Effective and practical neural ranking

This dissertation argues that early attempts to improve Information Retrieval tasks were not more successful because they did not properly consider the unique characteristics of IR tasks when designing and training ranking models, and studies approaches for offloading computational cost to index-time, substantially reducing query-time latency.

Exploring Classic and Neural Lexical Translation Models for Information Retrieval: Interpretability, Effectiveness, and Efficiency Benefits

It is shown that adding an interpretable neural Model 1 layer on top of BERT-based contextualized embeddings does not decrease accuracy and/or efficiency, and may overcome the limitation on the maximum sequence length of existing BERT models.



Passage Re-ranking with BERT

A simple re-implementation of BERT for query-based passage re-ranking that achieves state-of-the-art results on the TREC-CAR dataset and is the top entry in the leaderboard of the MS MARCO passage retrieval task, outperforming the previous state of the art by 27% in MRR@10.
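For readers unfamiliar with the metric, MRR@10 is the mean over queries of the reciprocal rank of the first relevant result, counting only the top 10 positions. The following is a minimal sketch of that standard definition; the function name and the input layout (one ranked list and one set of relevant ids per query) are assumptions for illustration, not the official evaluation script.

```python
def mrr_at_k(rankings, relevant_sets, k=10):
    """Mean reciprocal rank of the first relevant item within the top k.

    rankings: one ranked list of document ids per query (best first).
    relevant_sets: one set of relevant document ids per query.
    """
    if len(rankings) != len(relevant_sets):
        raise ValueError("need one relevant set per ranking")
    if not rankings:
        return 0.0
    total = 0.0
    for ranked, relevant in zip(rankings, relevant_sets):
        for rank, doc_id in enumerate(ranked[:k], start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
        # a query with no relevant hit in the top k contributes 0
    return total / len(rankings)


# First query: relevant document at rank 2. Second query: no relevant hit.
example = mrr_at_k([["a", "b", "c"], ["x", "y", "z"]], [{"b"}, {"q"}])
```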

Deep Relevance Ranking Using Enhanced Document-Query Interactions

Several new models for document relevance ranking are explored, building upon the Deep Relevance Matching Model (DRMM) of Guo et al. (2016), and inspired by PACRR’s convolutional n-gram matching features, but extended in several ways including multiple views of query and document inputs.

End-to-End Neural Ad-hoc Ranking with Kernel Pooling

K-NRM uses a translation matrix that models word-level similarities via word embeddings, a new kernel-pooling technique that uses kernels to extract multi-level soft match features, and a learning-to-rank layer that combines those features into the final ranking score.
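The three components named above can be sketched in a few lines of plain Python: a cosine-similarity translation matrix, Gaussian (RBF) kernel pooling with log-sum over query terms, and a single linear layer with tanh as the ranking score. This follows the commonly cited K-NRM formulation, but the function names, the kernel means, the shared `sigma`, and the toy weights are assumptions for illustration; a real model learns the embeddings and weights end to end.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def translation_matrix(query_vecs, doc_vecs):
    """Entry [i][j] is the similarity of query term i and document term j."""
    return [[cosine(q, d) for d in doc_vecs] for q in query_vecs]


def kernel_pooling(matrix, mus, sigma=0.1, eps=1e-10):
    """One soft-match feature per kernel mean in `mus`."""
    features = []
    for mu in mus:
        total = 0.0
        for row in matrix:
            # Soft count of document terms whose similarity is near mu.
            soft_tf = sum(math.exp(-((s - mu) ** 2) / (2 * sigma ** 2)) for s in row)
            total += math.log(max(soft_tf, eps))  # clamp to avoid log(0)
        features.append(total)
    return features


def ranking_score(features, weights, bias=0.0):
    """Learning-to-rank layer: linear combination squashed by tanh."""
    return math.tanh(sum(w * f for w, f in zip(weights, features)) + bias)


# Toy 2-d embeddings: two query terms, three document terms.
query_vecs = [[1.0, 0.0], [0.0, 1.0]]
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
M = translation_matrix(query_vecs, doc_vecs)
feats = kernel_pooling(M, mus=[1.0, 0.0])
score = ranking_score(feats, weights=[1.0, -0.5])
```

Each kernel mean acts as a soft bin over similarity values, so a kernel centered at 1.0 responds mainly to exact or near-exact matches while lower means capture weaker matches.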

Co-PACRR: A Context-Aware Neural IR Model for Ad-hoc Retrieval

This work highlights three potential shortcomings caused by not considering context information and proposes three neural ingredients to address them: a disambiguation component, cascade k-max pooling, and a shuffling combination layer that yields Co-PACRR, a novel context-aware neural IR model.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

Deep Contextualized Word Representations

A new type of deep contextualized word representation is introduced that models both complex characteristics of word use and how these uses vary across linguistic contexts, allowing downstream models to mix different types of semi-supervision signals.

TREC 2004 Robust Track Experiments Using PIRCS

It is demonstrated in TREC 2003 that employing the WWW as an all-domain word-association resource with appropriate filtering can be successful for this Robust Track objective.

A Deep Relevance Matching Model for Ad-hoc Retrieval

A novel deep relevance matching model (DRMM) for ad-hoc retrieval that employs a joint deep architecture at the query term level for relevance matching and can significantly outperform some well-known retrieval models as well as state-of-the-art deep matching models.

GloVe: Global Vectors for Word Representation

A new global log-bilinear regression model that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.

Attention is All you Need

A new simple network architecture, the Transformer, is proposed; it is based solely on attention mechanisms, dispensing with recurrence and convolutions entirely, and generalizes well to other tasks, as shown by applying it successfully to English constituency parsing with both large and limited training data.