P3 Ranker: Mitigating the Gaps between Pre-training and Ranking Fine-tuning with Prompt-based Learning and Pre-finetuning

@article{Hu2022P3RM,
  title={P3 Ranker: Mitigating the Gaps between Pre-training and Ranking Fine-tuning with Prompt-based Learning and Pre-finetuning},
  author={Xiaomeng Hu and Shih Yuan Yu and Chenyan Xiong and Zhenghao Liu and Zhiyuan Liu and Geoffrey X. Yu},
  journal={Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval},
  year={2022}
}
Compared to other language tasks, applying pre-trained language models (PLMs) to search ranking often requires more nuance and stronger training signals. In this paper, we identify and study two mismatches between pre-training and ranking fine-tuning: the training schema gap, arising from differences in training objectives and model architectures, and the task knowledge gap, arising from the discrepancy between the knowledge needed for ranking and the knowledge learned during pre-training. To mitigate these… 
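
The abstract points to prompt-based learning as a way to close the training schema gap. As a rough illustration of that idea (not the authors' released implementation; the template wording, label words, and checkpoint below are assumptions), a relevance judgment can be cast as masked-token prediction over a hand-written prompt:

```python
# Sketch: scoring a query-document pair with a masked-language-model prompt.
# Template, label words ("yes"/"no"), and the roberta-base checkpoint are
# illustrative assumptions, not the paper's exact configuration.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")
model.eval()

def relevance_score(query: str, document: str) -> float:
    """Score by the probability of the 'yes' label word at the masked position."""
    prompt = f"Query: {query} Document: {document} Relevant: {tokenizer.mask_token}."
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits                      # [1, seq_len, vocab]
    mask_pos = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero()[0].item()
    yes_id = tokenizer.encode(" yes", add_special_tokens=False)[0]
    no_id = tokenizer.encode(" no", add_special_tokens=False)[0]
    probs = torch.softmax(logits[0, mask_pos, [yes_id, no_id]], dim=-1)
    return probs[0].item()                                   # P("yes") as the score
```

In this formulation, ranking reuses the PLM's masked-language-modeling head instead of a freshly initialized classification layer, which is exactly the architectural mismatch the training schema gap describes.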


References

Showing 1-10 of 33 references

B-PROP: Bootstrapped Pre-training with Representative Words Prediction for Ad-hoc Retrieval

TLDR
A bootstrapped pre-training method that uses the powerful contextual language model BERT, rather than the classical unigram language model, to construct the Representative wOrds Prediction (ROP) task, and then re-trains BERT itself towards this objective tailored for IR.
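
The re-training objective referred to here is the pairwise ROP loss. The sketch below shows only that loss term, assuming a standard hinge formulation, and abstracts away how the two word sets are sampled (a unigram language model in PROP, BERT-derived term weights in B-PROP):

```python
# Simplified pairwise ROP loss: prefer the more representative word set.
# The hinge form and margin are assumptions for illustration.
import torch
import torch.nn.functional as F

def rop_pairwise_loss(score_pos: torch.Tensor, score_neg: torch.Tensor,
                      margin: float = 1.0) -> torch.Tensor:
    """score_pos / score_neg: encoder scores (e.g., from a [CLS] head) for the
    more / less representative word set paired with the same document."""
    # Train the encoder to rank the representative word set above the other one.
    return F.relu(margin - score_pos + score_neg).mean()

# Toy usage: a batch of 4 score pairs
print(rop_pairwise_loss(torch.randn(4), torch.randn(4)))
```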

Making Pre-trained Language Models Better Few-shot Learners

TLDR
The LM-BFF approach makes minimal assumptions on task resources and domain expertise, and hence constitutes a strong task-agnostic method for few-shot learning.

Document Ranking with a Pretrained Sequence-to-Sequence Model

TLDR
Surprisingly, it is found that the choice of target tokens impacts effectiveness, even for words that are closely related semantically, which sheds some light on why the sequence-to-sequence formulation for document ranking is effective.
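 
The scoring recipe behind this finding feeds a templated query-document string to an encoder-decoder model and reads off the first decoded token's probability for a designated target word. Below is a minimal sketch under assumed choices (the "Query:/Document:/Relevant:" template, "true"/"false" target tokens, and an off-the-shelf t5-base checkpoint rather than a ranking-fine-tuned one):

```python
# Sketch: sequence-to-sequence relevance scoring via target-token probability.
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")
model.eval()

def seq2seq_relevance(query: str, document: str) -> float:
    """Relevance = probability of the 'true' target token as the first decoded word."""
    text = f"Query: {query} Document: {document} Relevant:"
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    true_id = tokenizer.encode("true", add_special_tokens=False)[0]
    false_id = tokenizer.encode("false", add_special_tokens=False)[0]
    # Run a single decoding step starting from the decoder start token.
    decoder_input_ids = torch.tensor([[model.config.decoder_start_token_id]])
    with torch.no_grad():
        logits = model(**inputs, decoder_input_ids=decoder_input_ids).logits
    pair = torch.softmax(logits[0, -1, [true_id, false_id]], dim=-1)
    return pair[0].item()
```

The summary's point is that swapping "true"/"false" for other, semantically similar target words can change effectiveness even though the rest of the pipeline stays fixed.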

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TLDR
This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.

Muppet: Massive Multi-task Representations with Pre-Finetuning

TLDR
It is shown that pre-finetuning consistently improves performance for pretrained discriminators and generation models on a wide range of tasks while also significantly improving sample efficiency during fine-tuning, and that large-scale multi-tasking is crucial.

Intermediate-Task Transfer Learning with Pretrained Language Models: When and Why Does It Work?

TLDR
It is observed that intermediate tasks requiring high-level inference and reasoning abilities tend to work best, and that target-task performance is strongly correlated with higher-level abilities such as coreference resolution, but more granular correlations between probing and target-task performance were not observed.

PROP: Pre-training with Representative Words Prediction for Ad-hoc Retrieval

TLDR
This paper proposes Pre-training with Representative wOrds Prediction (PROP) for ad-hoc retrieval and shows that PROP can achieve exciting performance under both the zero- and low-resource IR settings.

MarkedBERT: Integrating Traditional IR Cues in Pre-trained Language Models for Passage Retrieval

TLDR
MarkedBERT, a modified version of BERT (one of the most popular models pre-trained via language modeling), is proposed; it integrates exact-match signals through a marking technique that locates and highlights exactly matched query-document terms with marker tokens.
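
To make the marking idea concrete, here is a small, self-contained simplification (my own, not the paper's code; the exact marker strings used by MarkedBERT may differ) that wraps exact-match query terms in a passage with marker tokens before the text is fed to the PLM:

```python
# Sketch: wrap exact-match query terms in a passage with marker tokens so the
# PLM sees traditional exact-match IR signals explicitly. Marker strings are
# assumptions for illustration.
import re

def mark_exact_matches(query: str, passage: str,
                       open_marker: str = "[e]", close_marker: str = "[\\e]") -> str:
    query_terms = {t.lower() for t in re.findall(r"\w+", query)}
    marked = []
    for token in passage.split():
        if re.sub(r"\W", "", token).lower() in query_terms:
            marked.append(f"{open_marker} {token} {close_marker}")
        else:
            marked.append(token)
    return " ".join(marked)

print(mark_exact_matches("neural ranking models",
                         "Neural models for ranking documents with BERT."))
# -> "[e] Neural [\e] [e] models [\e] for [e] ranking [\e] documents with BERT."
```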

End-to-End Neural Ad-hoc Ranking with Kernel Pooling

TLDR
K-NRM uses a translation matrix that models word-level similarities via word embeddings, a new kernel-pooling technique that uses kernels to extract multi-level soft match features, and a learning-to-rank layer that combines those features into the final ranking score.
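
Since the summary names the kernel-pooling technique, a compact sketch may help. The kernel means and widths below follow the commonly cited K-NRM settings, while the embedding layer and the final learning-to-rank layer are omitted:

```python
# Sketch: RBF kernel pooling over a query-document word-similarity matrix.
import torch

def kernel_pooling(sim_matrix: torch.Tensor,
                   mus=None, sigma: float = 0.1) -> torch.Tensor:
    """sim_matrix: [query_len, doc_len] cosine similarities between word embeddings.
    Returns one soft-match feature per kernel (log-sum pooled)."""
    if mus is None:
        # 11 kernels: one exact-match kernel at 1.0 plus soft-match kernels
        mus = [1.0, 0.9, 0.7, 0.5, 0.3, 0.1, -0.1, -0.3, -0.5, -0.7, -0.9]
    features = []
    for mu in mus:
        k_sigma = 1e-3 if mu == 1.0 else sigma    # narrow kernel for exact matches
        kernel = torch.exp(-((sim_matrix - mu) ** 2) / (2 * k_sigma ** 2))
        pooled = kernel.sum(dim=1)                # pool over document terms
        features.append(torch.log(pooled.clamp(min=1e-10)).sum())  # sum over query terms
    return torch.stack(features)                  # input to a learning-to-rank layer

# Example: random similarity matrix for a 3-term query and an 8-term document
print(kernel_pooling(torch.rand(3, 8) * 2 - 1))
```

Each kernel softly counts how many document terms fall near its similarity level, so the stacked features summarize the whole translation matrix for the ranking layer.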

The Power of Scale for Parameter-Efficient Prompt Tuning

TLDR
This work explores “prompt tuning”, a simple yet effective mechanism for learning “soft prompts” to condition frozen language models to perform specific downstream tasks, and shows that conditioning a frozen model with soft prompts confers benefits in robustness to domain transfer, as compared to full model tuning.
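
As a bare-bones illustration of the mechanism (sizes, initialization, and the wrapper interface are illustrative assumptions; the backbone is assumed to be a Hugging Face-style model that accepts inputs_embeds), trainable prompt embeddings are prepended to the token embeddings of a frozen model:

```python
# Sketch: soft prompt tuning with a frozen backbone.
import torch
import torch.nn as nn

class SoftPromptWrapper(nn.Module):
    def __init__(self, frozen_model: nn.Module, embed_dim: int, prompt_len: int = 20):
        super().__init__()
        self.frozen_model = frozen_model
        for p in self.frozen_model.parameters():
            p.requires_grad = False               # the backbone stays frozen
        self.soft_prompt = nn.Parameter(torch.randn(prompt_len, embed_dim) * 0.02)

    def forward(self, input_embeds: torch.Tensor, attention_mask: torch.Tensor):
        # input_embeds: [batch, seq_len, embed_dim] token embeddings
        batch = input_embeds.size(0)
        prompt = self.soft_prompt.unsqueeze(0).expand(batch, -1, -1)
        prompt_mask = torch.ones(batch, prompt.size(1), dtype=attention_mask.dtype,
                                 device=attention_mask.device)
        return self.frozen_model(
            inputs_embeds=torch.cat([prompt, input_embeds], dim=1),
            attention_mask=torch.cat([prompt_mask, attention_mask], dim=1),
        )
```

Only self.soft_prompt receives gradients, so the trainable parameter count is prompt_len x embed_dim regardless of backbone size, which is what makes the approach attractive at scale.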