HLATR: Enhance Multi-stage Text Retrieval with Hybrid List Aware Transformer Reranking

Yanzhao Zhang, Dingkun Long, Guangwei Xu, Pengjun Xie
Deep pre-trained language models (e.g., BERT) are effective at large-scale text retrieval tasks. Existing text retrieval systems with state-of-the-art performance usually adopt a retrieve-then-rerank architecture due to the high computational cost of pre-trained language models and the large corpus size. Under such a multi-stage architecture, previous studies mainly focused on optimizing a single stage of the framework to improve overall retrieval performance. However, how to directly… 
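The retrieve-then-rerank architecture described above can be sketched as follows. This is a minimal illustration, not the paper's method: a cheap term-overlap scorer stands in for the first-stage retriever, and a toy character-bigram scorer stands in for the expensive BERT-style cross-encoder reranker; all function names and the tiny corpus are hypothetical.

```python
def retrieve(query, corpus, k):
    """Stage 1: cheap scoring over the full corpus (here, word overlap)."""
    q_terms = set(query.lower().split())
    scored = [(len(q_terms & set(doc.lower().split())), doc) for doc in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]

def rerank(query, candidates):
    """Stage 2: expensive scoring on the small candidate set only.

    A character-bigram overlap is used as a stand-in for a
    cross-encoder; in a real system this is where the pre-trained
    language model's cost is paid, but only over k candidates.
    """
    def bigrams(s):
        s = s.lower()
        return {s[i:i + 2] for i in range(len(s) - 1)}
    q = bigrams(query)
    return sorted(candidates,
                  key=lambda doc: len(q & bigrams(doc)),
                  reverse=True)

corpus = [
    "BERT for passage reranking",
    "dense retrieval with dual encoders",
    "cooking pasta at home",
    "pre-trained language models for text retrieval",
]
candidates = retrieve("pre-trained models for retrieval", corpus, k=3)
ranked = rerank("pre-trained models for retrieval", candidates)
```

The design point is the asymmetry: stage 1 must be fast enough to scan the whole corpus, while stage 2 can afford a much heavier model because it only sees the top-k candidates.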



Pre-trained Language Model for Web-scale Retrieval in Baidu Search
The new retrieval system facilitated by pretrained language model (i.e., ERNIE) can largely improve the usability and applicability of the search engine.
Unsupervised Corpus Aware Language Model Pre-training for Dense Passage Retrieval
Recent research demonstrates the effectiveness of using fine-tuned language models (LM) for dense retrieval. However, dense retrievers are hard to train, typically requiring heavily engineered… 
Pre-training Methods in Information Retrieval
An overview is presented of PTMs applied in different components of an IR system, including the retrieval component, the re-ranking component, and other components; some open challenges are discussed and several promising directions are highlighted.
B-PROP: Bootstrapped Pre-training with Representative Words Prediction for Ad-hoc Retrieval
A bootstrapped pre-training method that uses the powerful contextual language model BERT to replace the classical unigram language model for ROP task construction, and re-trains BERT itself towards an objective tailored for IR.
RocketQAv2: A Joint Training Method for Dense Passage Retrieval and Passage Re-ranking
A novel joint training approach for dense passage retrieval and passage re-ranking is proposed, introducing dynamic listwise distillation with a unified listwise training approach designed for both the retriever and the re-ranker.
Condenser: a Pre-training Architecture for Dense Retrieval
This paper proposes pre-training a dense encoder with a novel Transformer architecture, Condenser, in which LM prediction CONditions on DENSE Representation; it improves over standard LMs by large margins on various text retrieval and similarity tasks.
Multi-View Cross-Lingual Structured Prediction with Minimum Supervision
This paper proposes a multi-view framework, by leveraging a small number of labeled target sentences, to effectively combine multiple source models into an aggregated source view at different granularity levels (language, sentence, or sub-structure), and transfer it to a target view based on a task-specific model.
A Deep Relevance Matching Model for Ad-hoc Retrieval
A novel deep relevance matching model (DRMM) for ad-hoc retrieval that employs a joint deep architecture at the query term level for relevance matching and can significantly outperform some well-known retrieval models as well as state-of-the-art deep matching models.
YES SIR! Optimizing Semantic Space of Negatives with Self-Involvement Ranker
Self-Involvement Ranker (SIR) is a lightweight and general framework for pretrained models, which simplifies the ranking process in industry practice, and can significantly improve the ranking performance of various pre-trained models.
Pyserini: A Python Toolkit for Reproducible Information Retrieval Research with Sparse and Dense Representations
An overview of toolkit features is provided and empirical results that illustrate its effectiveness on two popular ranking tasks are presented, as well as hybrid retrieval that integrates both approaches.