Document Ranking with a Pretrained Sequence-to-Sequence Model

@inproceedings{Nogueira2020DocumentRW,
  title={Document Ranking with a Pretrained Sequence-to-Sequence Model},
  author={Rodrigo Nogueira and Zhiying Jiang and Ronak Pradeep and Jimmy J. Lin},
  booktitle={Findings of the Association for Computational Linguistics: EMNLP 2020},
  year={2020}
}
This work proposes the use of a pretrained sequence-to-sequence model for document ranking. Our approach is fundamentally different from a commonly adopted classification-based formulation based on encoder-only pretrained transformer architectures such as BERT. We show how a sequence-to-sequence model can be trained to generate relevance labels as “target tokens”, and how the underlying logits of these target tokens can be interpreted as relevance probabilities for ranking. Experimental results…
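The scoring scheme described in the abstract can be illustrated roughly as follows. This is a minimal sketch, assuming the Hugging Face transformers T5 API; the public t5-base checkpoint is only a placeholder for a model fine-tuned with "true"/"false" target tokens as described, and the input template follows the abstract while everything else is illustrative.

```python
# Minimal sketch: the model reads "Query: ... Document: ... Relevant:" and the
# softmax over the logits of the "true"/"false" target tokens at the first
# decoding step is taken as the relevance probability.
# Assumes the Hugging Face `transformers` T5 API; "t5-base" is a placeholder
# for a checkpoint fine-tuned as described in the paper.
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")  # placeholder checkpoint
model.eval()

TRUE_ID = tokenizer.encode("true", add_special_tokens=False)[0]
FALSE_ID = tokenizer.encode("false", add_special_tokens=False)[0]

def relevance_score(query: str, document: str) -> float:
    """Probability of the 'true' token, used as the ranking score."""
    inputs = tokenizer(
        f"Query: {query} Document: {document} Relevant:",
        return_tensors="pt", truncation=True, max_length=512,
    )
    # Run a single decoding step starting from the decoder start token.
    decoder_input_ids = torch.full((1, 1), model.config.decoder_start_token_id)
    with torch.no_grad():
        logits = model(**inputs, decoder_input_ids=decoder_input_ids).logits[0, -1]
    # Softmax restricted to the two target tokens.
    probs = torch.softmax(logits[[TRUE_ID, FALSE_ID]], dim=0)
    return probs[0].item()

# Rank candidate documents by descending relevance probability.
docs = ["a passage about neural ranking", "an unrelated passage"]
ranked = sorted(docs, key=lambda d: relevance_score("what is document ranking?", d), reverse=True)
```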
The Expando-Mono-Duo Design Pattern for Text Ranking with Pretrained Sequence-to-Sequence Models
TLDR
A design pattern for tackling text ranking problems, dubbed “Expando-Mono-Duo”, is empirically validated on a number of ad hoc retrieval tasks in different domains and is open-sourced in the Pyserini IR toolkit and PyGaggle neural reranking library.
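The pattern named in the summary (document expansion, then pointwise "mono" re-ranking, then pairwise "duo" re-ranking) can be outlined as a pipeline. The sketch below is schematic: the expand, retrieve, mono_score, and duo_score callables are hypothetical placeholders supplied by the caller, not the Pyserini/PyGaggle API.

```python
# Schematic sketch of the Expando-Mono-Duo pattern. The callables passed in
# (expand, retrieve, mono_score, duo_score) are hypothetical placeholders,
# not the actual Pyserini/PyGaggle API.

def expando_mono_duo(query, corpus, expand, retrieve, mono_score, duo_score,
                     k_mono=1000, k_duo=50):
    # "Expando": enrich every document with predicted queries before indexing.
    expanded = {doc_id: expand(text) for doc_id, text in corpus.items()}

    # First stage: cheap term-matching retrieval (e.g. BM25) over the expanded corpus.
    candidates = retrieve(query, expanded, k=k_mono)

    # "Mono": pointwise re-ranking, scoring each (query, document) pair independently.
    candidates = sorted(candidates,
                        key=lambda doc_id: mono_score(query, expanded[doc_id]),
                        reverse=True)[:k_duo]

    # "Duo": pairwise re-ranking; aggregate each document's pairwise preferences.
    def duo_aggregate(doc_id):
        return sum(duo_score(query, expanded[doc_id], expanded[other])
                   for other in candidates if other != doc_id)

    return sorted(candidates, key=duo_aggregate, reverse=True)
```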
Modeling Relevance Ranking under the Pre-training and Fine-tuning Paradigm
TLDR
A novel ranking framework called Pre-Rank that takes both the user’s view and the system’s view into consideration, under the pre-training and fine-tuning paradigm, and can model relevance by incorporating relevant knowledge and signals from both real search users and IR experts.
Pretrained Transformers for Text Ranking: BERT and Beyond
TLDR
This tutorial provides an overview of text ranking with neural network architectures known as transformers, of which BERT is the best-known example, and lays out the foundations of pretrained transformers for text ranking.
Sequence-to-Sequence Learning on Keywords for Efficient FAQ Retrieval
TLDR
TI-S2S, a novel learning framework combining TF-IDF based keyword extraction and Word2Vec embeddings to train a Sequence-to-Sequence (Seq2Seq) architecture, achieves high precision for FAQ retrieval by better understanding the underlying intent of a user question, captured via representative keywords.
Pretrained Transformers for Text Ranking: BERT and Beyond
TLDR
This tutorial provides an overview of text ranking with neural network architectures known as transformers, of which BERT (Bidirectional Encoder Representations from Transformers) is the best-known example, and covers a wide range of techniques.
Beyond [CLS] through Ranking by Generation
TLDR
This work revisits the generative framework for information retrieval, shows that generative approaches are as effective as state-of-the-art semantic-similarity-based discriminative models for the answer selection task, and demonstrates the effectiveness of unlikelihood losses for IR.
Text-to-Text Multi-view Learning for Passage Re-ranking
TLDR
A text-to-text multi-view learning framework is proposed by incorporating an additional view---the text generation view---into a typical single-view passage ranking model, which improves ranking performance compared to its single-view counterpart.
TILDE: Term Independent Likelihood moDEl for Passage Re-ranking
TLDR
The novel, BERT-based Term Independent Likelihood moDEl (TILDE) ranks documents by both query and document likelihood, achieving competitive effectiveness coupled with low query latency.
A Modern Perspective on Query Likelihood with Deep Generative Retrieval Models
TLDR
This work introduces and formalizes the paradigm of deep generative retrieval models defined via the cumulative probabilities of generating query terms, and proposes a novel generative ranker (T-PGN), which combines the encoding capacity of Transformers with the Pointer Generator Network model.
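One way to read "cumulative probabilities of generating query terms" is query likelihood under a seq2seq model: score each document by the summed log-probability of the query tokens conditioned on the document. A sketch assuming the Hugging Face transformers API, with t5-base standing in for an actual generative ranker (the T-PGN model itself is not reproduced here):

```python
# Sketch of query-likelihood ranking with a seq2seq model: score each document
# by the cumulative log-probability of generating the query, conditioned on the
# document. "t5-base" is a stand-in for whichever generative ranker is used.
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")
model.eval()

def query_log_likelihood(query: str, document: str) -> float:
    enc = tokenizer(document, return_tensors="pt", truncation=True, max_length=512)
    labels = tokenizer(query, return_tensors="pt").input_ids
    with torch.no_grad():
        # The loss is the mean token-level negative log-likelihood of the query tokens.
        loss = model(**enc, labels=labels).loss
    return -loss.item() * labels.size(1)  # summed log-probability of the query
```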
Generalizing Discriminative Retrieval Models using Generative Tasks
TLDR
By targeting the training on the encoding layer in the transformer architecture, the proposed multi-task learning approach consistently improves retrieval effectiveness on the targeted collection and can be re-targeted to new ranking tasks.

References

Showing 1-10 of 40 references
Multi-Stage Document Ranking with BERT
TLDR
This work proposes two variants of BERT, called monoBERT and duoBERT, that formulate the ranking problem as pointwise and pairwise classification, respectively, arranged in a multi-stage ranking architecture to form an end-to-end search system.
CEDR: Contextualized Embeddings for Document Ranking
TLDR
This work investigates how two pretrained contextualized language models (ELMo and BERT) can be utilized for ad hoc document ranking and proposes a joint approach that incorporates BERT's classification vector into existing neural models, showing that it outperforms state-of-the-art ad hoc ranking baselines.
End-to-End Neural Ad-hoc Ranking with Kernel Pooling
TLDR
K-NRM uses a translation matrix that models word-level similarities via word embeddings, a new kernel-pooling technique that uses kernels to extract multi-level soft match features, and a learning-to-rank layer that combines those features into the final ranking score.
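The kernel-pooling step described above can be sketched in a few lines: build a cosine-similarity translation matrix from word embeddings, apply a bank of Gaussian kernels, and pool the result into soft-match features for a learning-to-rank layer. The kernel centers and width below are illustrative choices, not the paper's exact settings.

```python
# Sketch of kernel pooling: translation matrix -> Gaussian kernels -> pooled
# soft-match features -> linear learning-to-rank layer. Embeddings and kernel
# parameters here are illustrative, not the paper's exact configuration.
import torch

def kernel_pooling(query_emb, doc_emb, mus, sigma=0.1):
    # Cosine-similarity translation matrix: (len_q, len_d)
    q = torch.nn.functional.normalize(query_emb, dim=-1)
    d = torch.nn.functional.normalize(doc_emb, dim=-1)
    sim = q @ d.t()
    # Gaussian kernels over the similarity values: (len_q, len_d, n_kernels)
    k = torch.exp(-((sim.unsqueeze(-1) - mus) ** 2) / (2 * sigma ** 2))
    # Soft term frequency per query term, then log-sum over query terms -> (n_kernels,)
    soft_tf = k.sum(dim=1)
    return torch.log(soft_tf.clamp(min=1e-10)).sum(dim=0)

mus = torch.linspace(-1.0, 1.0, steps=11)            # kernel centers (illustrative)
features = kernel_pooling(torch.randn(5, 50), torch.randn(40, 50), mus)
score = torch.nn.Linear(11, 1)(features)             # learning-to-rank layer (untrained)
```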
Cross-Domain Modeling of Sentence-Level Evidence for Document Retrieval
TLDR
This work leverages passage-level relevance judgments fortuitously available in other domains to fine-tune BERT models that capture cross-domain notions of relevance and can be directly used for ranking news articles.
Document Expansion by Query Prediction
TLDR
A simple method is proposed that predicts which queries will be issued for a given document and then expands the document with those predictions, using a vanilla sequence-to-sequence model trained on datasets of query and relevant-document pairs.
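The expansion idea can be sketched as: generate a handful of likely queries for each document with a seq2seq model and append them to the document text before indexing. This is a sketch assuming the Hugging Face transformers API, with t5-base as a placeholder for a model actually trained on query/relevant-document pairs.

```python
# Sketch of document expansion by query prediction: sample a few queries for a
# document with a seq2seq model and append them before indexing. "t5-base" is a
# placeholder for a model trained on (query, relevant document) pairs.
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

def expand_document(document: str, n_queries: int = 3) -> str:
    inputs = tokenizer(document, return_tensors="pt", truncation=True, max_length=512)
    outputs = model.generate(
        **inputs,
        max_length=64,
        do_sample=True,            # decoding strategy is a choice; sampling shown here
        top_k=10,
        num_return_sequences=n_queries,
    )
    predicted = [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]
    # Append the predicted queries to the original document text.
    return document + " " + " ".join(predicted)
```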
Passage Re-ranking with BERT
TLDR
A simple re-implementation of BERT for query-based passage re-ranking achieves strong results on the TREC-CAR dataset and the top entry on the leaderboard of the MS MARCO passage retrieval task, outperforming the previous state of the art by 27% in MRR@10.
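The pointwise re-ranking setup can be sketched with a sequence-pair classifier: encode the query and passage together and use the classifier's "relevant" probability as the ranking score. A sketch assuming the Hugging Face transformers API; bert-base-uncased with a fresh classification head is a placeholder for a model fine-tuned on relevance-labelled pairs.

```python
# Sketch of pointwise passage re-ranking with BERT: score each (query, passage)
# pair with a sequence-pair classifier and sort by the "relevant" probability.
# The checkpoint is a placeholder; a fine-tuned relevance classifier is assumed.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
model.eval()

def rerank(query: str, passages: list[str]) -> list[str]:
    inputs = tokenizer([query] * len(passages), passages,
                       return_tensors="pt", padding=True, truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits
    scores = torch.softmax(logits, dim=-1)[:, 1]   # probability of the "relevant" class
    order = scores.argsort(descending=True).tolist()
    return [passages[i] for i in order]
```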
MASS: Masked Sequence to Sequence Pre-training for Language Generation
TLDR
This work proposes MAsked Sequence to Sequence pre-training (MASS) for encoder-decoder based language generation tasks, which achieves state-of-the-art accuracy on unsupervised English-French translation, even beating an early attention-based supervised model.
Zero-shot Text Classification With Generative Language Models
TLDR
This work investigates the use of natural language to enable zero-shot model adaptation to new tasks, using text and metadata from social commenting platforms as a source for a simple pretraining task and shows that natural language can serve as simple and powerful descriptors for task adaptation.
Attention is All you Need
TLDR
A new simple network architecture, the Transformer, based solely on attention mechanisms and dispensing with recurrence and convolutions entirely, is proposed; it generalizes well to other tasks, as shown by successful application to English constituency parsing with both large and limited training data.
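The core operation of the Transformer is scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V; a minimal sketch in PyTorch:

```python
# Minimal sketch of scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V,
# with an optional mask applied before the softmax.
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)     # (..., len_q, len_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)                # attention weights
    return weights @ v                                     # (..., len_q, d_v)

out = scaled_dot_product_attention(torch.randn(2, 4, 8), torch.randn(2, 6, 8), torch.randn(2, 6, 16))
```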
Unified Language Model Pre-training for Natural Language Understanding and Generation
TLDR
A new Unified pre-trained Language Model (UniLM) can be fine-tuned for both natural language understanding and generation tasks, and compares favorably with BERT on the GLUE benchmark and on the SQuAD 2.0 and CoQA question answering tasks.