Expansion via Prediction of Importance with Contextualization

@inproceedings{MacAvaney2020ExpansionVP,
  title={Expansion via Prediction of Importance with Contextualization},
  author={Sean MacAvaney and Franco Maria Nardini and Raffaele Perego and Nicola Tonellotto and Nazli Goharian and Ophir Frieder},
  booktitle={Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval},
  year={2020}
}
  • Sean MacAvaney, Franco Maria Nardini, Raffaele Perego, Nicola Tonellotto, Nazli Goharian, Ophir Frieder
  • Published 29 April 2020
  • Computer Science
  • Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval
The identification of relevance with little textual context is a primary challenge in passage retrieval. We address this problem with a representation-based ranking approach that: (1) explicitly models the importance of each term using a contextualized language model; (2) performs passage expansion by propagating the importance to similar terms; and (3) grounds the representations in the lexicon, making them interpretable. Passage representations can be pre-computed at index time to reduce… 
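The approach described in the abstract can be illustrated with a small, unofficial sketch (not the authors' released implementation). It assumes the contextualized language model has already produced an importance weight for every query term and, after expansion, for every lexicon term of a passage; ranking then reduces to a dot product over the shared vocabulary. All names and numbers below are hypothetical.

```python
from typing import Dict

def epic_style_score(query_weights: Dict[str, float],
                     passage_weights: Dict[str, float]) -> float:
    """Dot product of query- and passage-term importance over the lexicon.

    The passage weights would be pre-computed at index time; only the
    query-term weights need to be produced at query time.
    """
    return sum(w_q * passage_weights.get(term, 0.0)
               for term, w_q in query_weights.items())

# Hypothetical pre-computed passage representation: expansion has assigned a
# non-zero weight to "cardiac", a related term that never appears in the text.
passage = {"heart": 1.8, "attack": 1.5, "symptoms": 1.2, "cardiac": 0.9}
query = {"cardiac": 1.4, "arrest": 1.1, "signs": 0.7}

print(epic_style_score(query, passage))  # 1.26 -- only "cardiac" overlaps
```

Because the passage side is grounded in the lexicon, the non-zero entries can be read directly as term importances, which is what makes the representation interpretable.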

Citations

Fast Passage Re-ranking with Contextualized Exact Term Matching and Efficient Passage Expansion
TLDR
TILDEv2 is proposed, a new model that stems from the original TILDE but addresses its limitations; it relies on contextualized exact term matching with expanded passages, becoming the state-of-the-art passage re-ranking method for CPU-only environments and maintaining query latency below 100 ms on commodity hardware.
Learning Passage Impacts for Inverted Indexes
TLDR
DeepImpact is proposed, a new document term-weighting scheme suitable for efficient retrieval using a standard inverted index that improves impact-score modeling and tackles the vocabulary-mismatch problem.
Dealing with Typos for BERT-based Passage Retrieval and Ranking
TLDR
The Dense Retriever (DR) and BERT re-ranker can become robust to typos in queries, resulting in significantly improved effectiveness compared to models trained without appropriately accounting for typos.
TILDE: Term Independent Likelihood moDEl for Passage Re-ranking
TLDR
The novel, BERT-based, Term Independent Likelihood moDEl (TILDE), which ranks documents by both query and document likelihood, and achieves competitive effectiveness coupled with low query latency.
SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval
TLDR
The pooling mechanism is modified, a model solely based on document expansion is benchmarked, and models trained with distillation are introduced, leading to state-of-the-art results on the BEIR benchmark.
Query Embedding Pruning for Dense Retrieval
TLDR
This work is the first to consider efficiency improvements in the context of a dense retrieval approach (namely ColBERT), by pruning query term embeddings that are estimated not to be useful for retrieving relevant documents.
On Single and Multiple Representations in Dense Passage Retrieval
TLDR
It is observed that, while ANCE is more efficient than ColBERT in terms of response time and memory usage, multiple representations are statistically significantly more effective than single representations for MAP and MRR@10, as well as for definitional queries and those with complex information needs.
SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking
TLDR
This work presents a new first-stage ranker based on explicit sparsity regularization and a log-saturation effect on term weights, leading to highly sparse representations and competitive results with respect to state-of-the-art dense and sparse methods (a rough sketch of this weighting scheme follows the citation list below).
Pretrained Transformers for Text Ranking: BERT and Beyond
TLDR
This tutorial provides an overview of text ranking with neural network architectures known as transformers, of which BERT is the best-known example, and lays out the foundations of pretrained transformers for text ranking.
Pseudo Relevance Feedback with Deep Language Models and Dense Retrievers: Successes and Pitfalls
TLDR
This article investigates methods for integrating PRF signals into rerankers and dense retrievers based on deep language models, considering text-based and vector-based PRF approaches and different ways of combining and scoring relevance signals.
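Several of the citing works above (DeepImpact, SPLADE, SPLADE v2) likewise learn sparse, lexicon-grounded impact scores. As a rough, unofficial illustration of the log-saturation weighting that the SPLADE entries describe (not the authors' code), assuming `mlm_logits` holds a passage's masked-language-model logits with shape (num_tokens, vocab_size):

```python
import torch

def splade_style_weights(mlm_logits: torch.Tensor, pool: str = "sum") -> torch.Tensor:
    """One weight per vocabulary term; ReLU plus log(1 + x) pushes most entries toward zero."""
    saturated = torch.log1p(torch.relu(mlm_logits))
    # SPLADE pools by summing over the passage's tokens; SPLADE v2 modifies this to max pooling.
    return saturated.sum(dim=0) if pool == "sum" else saturated.amax(dim=0)

# Toy example: a 3-token passage over a 5-term vocabulary.
logits = torch.randn(3, 5)
print(splade_style_weights(logits))          # sum pooling (SPLADE)
print(splade_style_weights(logits, "max"))   # max pooling (SPLADE v2)
```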

References

SHOWING 1-10 OF 20 REFERENCES
Context-Aware Sentence/Passage Term Importance Estimation For First Stage Retrieval
TLDR
A Deep Contextualized Term Weighting framework that learns to map BERT's contextualized text representations to context-aware term weights for sentences and passages to improve the accuracy of first-stage retrieval algorithms.
CEDR: Contextualized Embeddings for Document Ranking
TLDR
This work investigates how two pretrained contextualized language models (ELMo and BERT) can be utilized for ad-hoc document ranking and proposes a joint approach that incorporates BERT's classification vector into existing neural models and shows that it outperforms state-of-the-art ad-hoc ranking baselines.
Document Expansion by Query Prediction
TLDR
A simple method is proposed that predicts which queries will be issued for a given document and expands it with those predictions, using a vanilla sequence-to-sequence model trained on datasets consisting of pairs of queries and relevant documents (a minimal expansion sketch follows this reference list).
Interpretable & Time-Budget-Constrained Contextualization for Re-Ranking
TLDR
TK (Transformer-Kernel) is proposed: a neural re-ranking model for ad-hoc search that uses an efficient contextualization mechanism and achieves the highest effectiveness in comparison to BERT and other re-ranking models.
Passage Re-ranking with BERT
TLDR
A simple re-implementation of BERT for query-based passage re-ranking on the TREC-CAR dataset and the top entry in the leaderboard of the MS MARCO passage retrieval task, outperforming the previous state of the art by 27% in MRR@10.
ANTIQUE: A Non-factoid Question Answering Benchmark
TLDR
This paper develops and releases a collection of 2,626 open-domain non-factoid questions from a diverse set of categories, and includes a brief analysis of the data as well as baseline results on both classical and neural IR models.
Anserini
TLDR
Anserini is described, an information retrieval toolkit built on Lucene that allows researchers to easily reproduce results with modern bag-of-words ranking models on diverse test collections and demonstrates that Lucene provides a suitable framework for supporting information retrieval research.
From doc2query to docTTTTTquery
TLDR
The setup in this work follows doc2query, but with T5 as the expansion model, and it is found that the top-k sampling decoder produces more effective queries than beam search.
TREC Complex Answer Retrieval Overview
TLDR
It is seen that, in TREC Complex Answer Retrieval, combining traditional methods with learning-to-rank can outperform neural methods even when many training queries are available.
Overview of the TREC 2019 Deep Learning Track
The Deep Learning Track is a new track for TREC 2019, with the goal of studying ad hoc ranking in a large data regime. It is the first track with large human-labeled training sets, introducing two…
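The document-expansion references above (Document Expansion by Query Prediction, From doc2query to docTTTTTquery) expand a passage by appending queries predicted from its text before indexing. A minimal sketch of that pipeline follows, assuming the Hugging Face transformers generation API; the checkpoint name and sampling settings are taken as assumptions rather than the authors' exact configuration.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Assumed checkpoint name for a released doc2query/docTTTTTquery-style model.
model_name = "castorini/doc2query-t5-base-msmarco"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

def expand_passage(passage: str, num_queries: int = 3) -> str:
    """Append predicted queries to the passage text before indexing."""
    inputs = tokenizer(passage, return_tensors="pt", truncation=True)
    # docTTTTTquery reports top-k sampling producing more effective queries than beam search.
    outputs = model.generate(**inputs, max_length=64, do_sample=True,
                             top_k=10, num_return_sequences=num_queries)
    queries = [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]
    return passage + " " + " ".join(queries)

print(expand_passage("The heart pumps blood through the circulatory system."))
```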