Corpus ID: 237503179

YES SIR! Optimizing Semantic Space of Negatives with Self-Involvement Ranker

Authors: Ruizhi Pu, Xinyu Zhang, Ruofei Lai, Zikai Guo, Yinxia Zhang, Hao Jiang, Yongkang Wu, Yantao Jia, Zhicheng Dou, Zhao Cao
Pre-trained models such as BERT have proved to be effective tools for dealing with Information Retrieval (IR) problems. Owing to their strong performance, they have been widely used to tackle real-world IR tasks such as document ranking. Recently, researchers have found that selecting "hard" rather than "random" negative samples is beneficial for fine-tuning pre-trained models on ranking tasks. However, it remains elusive how to leverage hard negative samples in a principled way…
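The "hard" vs. "random" negative distinction the abstract describes can be sketched as follows. This is a minimal illustration of mining hard negatives by model score under an embedding-retrieval setup, not the paper's actual Self-Involvement Ranker; the function name, toy vectors, and cosine scorer are all illustrative assumptions.

```python
import math

def cosine(u, v):
    # Cosine similarity between two dense vectors (illustrative scorer).
    num = sum(a * b for a, b in zip(u, v))
    du = math.sqrt(sum(a * a for a in u))
    dv = math.sqrt(sum(b * b for b in v))
    return num / (du * dv)

def mine_hard_negatives(query_vec, doc_vecs, positive_ids, k=2):
    """Rank the non-relevant documents by similarity to the query and
    keep the top-k. These high-scoring negatives are 'hard': the current
    model nearly confuses them with true positives, so they yield a
    stronger fine-tuning signal than randomly sampled negatives."""
    scored = [(cosine(query_vec, v), i)
              for i, v in enumerate(doc_vecs) if i not in positive_ids]
    scored.sort(reverse=True)            # highest-scoring negatives first
    return [i for _, i in scored[:k]]

# Toy usage: doc 0 is the labeled positive; docs 1 and 3 are the
# closest distractors, so they are selected as hard negatives.
query = [1.0, 0.0]
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.5, 0.5]]
print(mine_hard_negatives(query, docs, positive_ids={0}, k=2))  # → [1, 3]
```

A random sampler would instead draw uniformly from all non-positive documents, often returning easy negatives like doc 2 that the model already separates well.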



Pre-training Tasks for Embedding-based Large-scale Retrieval
It is shown that the key ingredient in learning a strong embedding-based Transformer model is the set of pre-training tasks, and that with adequately designed paragraph-level pre-training tasks, Transformer models can remarkably improve over the widely used BM25 as well as embedding models without Transformers.
Rethink Training of BERT Rerankers in Multi-Stage Retrieval Pipeline
A Localized Contrastive Estimation (LCE) method for training rerankers is proposed and demonstrated to significantly improve deep two-stage models.
PACRR: A Position-Aware Neural IR Model for Relevance Matching
This work proposes a novel neural IR model named PACRR that aims to better model position-dependent interactions between a query and a document, and yields better results on multiple benchmarks.
DeepRank: A New Deep Architecture for Relevance Ranking in Information Retrieval
Experiments on both the benchmark LETOR dataset and large-scale clickthrough data show that DeepRank can significantly outperform learning-to-rank methods as well as existing deep learning methods.
Latent Retrieval for Weakly Supervised Open Domain Question Answering
It is shown for the first time that it is possible to jointly learn the retriever and reader from question-answer string pairs, without any IR system, outperforming BM25 by up to 19 points in exact match.
Learning deep structured semantic models for web search using clickthrough data
A series of new latent semantic models with a deep structure is developed that project queries and documents into a common low-dimensional space, where the relevance of a document to a query is readily computed as the distance between them.
Context-Aware Sentence/Passage Term Importance Estimation For First Stage Retrieval
A Deep Contextualized Term Weighting framework is proposed that learns to map BERT's contextualized text representations to context-aware term weights for sentences and passages, improving the accuracy of first-stage retrieval algorithms.
Neural Ranking Models for Document Retrieval
This paper compares the models proposed in the literature along different dimensions in order to understand the major contributions and limitations of each, analyzes the promising neural components, and proposes future research directions.
A Deep Relevance Matching Model for Ad-hoc Retrieval
A novel deep relevance matching model (DRMM) for ad-hoc retrieval is proposed that employs a joint deep architecture at the query term level for relevance matching, and can significantly outperform well-known retrieval models as well as state-of-the-art deep matching models.
Convolutional Neural Networks for Soft-Matching N-Grams in Ad-hoc Search
Conv-KNRM uses Convolutional Neural Networks to represent n-grams of various lengths and soft-matches them in a unified embedding space; the resulting match signals are used by kernel pooling and learning-to-rank layers to generate the final ranking score.