Corpus ID: 119314259

Document Expansion by Query Prediction

@article{Nogueira2019DocumentEB,
  title={Document Expansion by Query Prediction},
  author={Rodrigo Nogueira and Wei Yang and Jimmy J. Lin and Kyunghyun Cho},
  journal={ArXiv},
  year={2019},
  volume={abs/1904.08375}
}
One technique to improve the retrieval effectiveness of a search engine is to expand documents with terms that are related to, or representative of, the documents' content. [...] Key method: our predictions are made with a vanilla sequence-to-sequence model trained with supervised learning on a dataset of pairs of queries and relevant documents. By combining our method with a highly effective re-ranking component, we achieve the state of the art in two retrieval tasks. In a latency-critical regime, retrieval results…
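The expansion step described in the abstract can be sketched as follows. Note that `predict_queries` is a placeholder standing in for the authors' trained sequence-to-sequence model, which would generate likely queries via beam search or sampling; the function body here is illustrative only, not the paper's actual model.

```python
def predict_queries(document: str, n: int = 3) -> list[str]:
    """Placeholder for a trained sequence-to-sequence model that maps a
    document to n queries it is likely to answer. A real implementation
    would decode from a model trained on (query, relevant-document) pairs,
    e.g. from MS MARCO. The output below is fabricated for illustration."""
    return [f"sample query {i} about: {document[:30]}" for i in range(n)]


def expand_document(document: str, n_queries: int = 3) -> str:
    """Append predicted queries to the document text before indexing,
    so that a term-matching retriever (e.g. BM25) can match query
    vocabulary the original document never used."""
    queries = predict_queries(document, n_queries)
    return document + " " + " ".join(queries)


# The expanded text, not the original document, is what gets indexed.
doc = "The Eiffel Tower was completed in 1889 for the World's Fair."
expanded = expand_document(doc)
```

Because the expansion happens entirely at indexing time, the query-time cost of the retriever is unchanged, which is why the abstract emphasizes the latency-critical regime.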
Citations

Strong natural language query generation
TLDR
This paper empirically compares the new approaches with several closely related baselines using the MS MARCO data collection, and shows that the approach achieves a substantially better trade-off between effectiveness and human-readability than has been reported previously.
Learning Passage Impacts for Inverted Indexes
TLDR
This paper proposes DeepImpact, a new document term-weighting scheme suitable for efficient retrieval with a standard inverted index, which improves impact-score modeling and tackles the vocabulary-mismatch problem.
Matches Made in Heaven: Toolkit and Large-Scale Datasets for Supervised Query Reformulation
TLDR
This paper presents three large-scale query reformulation datasets, namely the Diamond, Platinum, and Gold datasets, based on the queries in the MS MARCO dataset; these are believed to be the first datasets for supervised query reformulation that offer perfect query reformulations for a large number of queries.
Expansion via Prediction of Importance with Contextualization
TLDR
A representation-based ranking approach that explicitly models the importance of each term using a contextualized language model, and performs passage expansion by propagating the importance to similar terms, which narrows the gap between inexpensive and cost-prohibitive passage ranking approaches.
Contextualized Offline Relevance Weighting for Efficient and Effective Neural Retrieval
TLDR
Inspired by recent advances in transformer-based document expansion techniques, this work proposes to trade offline relevance weighting for online retrieval efficiency by using the powerful BERT ranker to weight the neighbour documents collected via generated pseudo-queries for each document.
Context-Aware Sentence/Passage Term Importance Estimation For First Stage Retrieval
TLDR
A Deep Contextualized Term Weighting framework is proposed that learns to map BERT's contextualized text representations to context-aware term weights for sentences and passages, improving the accuracy of first-stage retrieval algorithms.
Learning To Retrieve: How to Train a Dense Retrieval Model Effectively and Efficiently
TLDR
LTRe teaches the dense retrieval (DR) model how to retrieve relevant documents from the entire corpus instead of how to rerank a potentially biased sample of documents, and provides a more than 170x speed-up in the training process.
Efficiency Implications of Term Weighting for Passage Retrieval
TLDR
This work conducts an investigation of query processing efficiency over DeepCT indexes, revealing how term re-weighting can impact query processing latency, and exploring how DeepCT can be used as a static index pruning technique to accelerate query processing without harming search effectiveness.
CoRT: Complementary Rankings from Transformers
TLDR
It is shown that CoRT significantly increases the candidate recall by complementing BM25 with missing candidates, and it is demonstrated that passage retrieval using CoRT can be realized with surprisingly low latencies.

References

Showing 1-10 of 28 references
Document expansion versus query expansion for ad-hoc retrieval
TLDR
This work investigates the use of document expansion as an alternative, in which documents are augmented with related terms extracted from the corpus during indexing, so that the overheads at query time are small.
Improving the effectiveness of information retrieval with local context analysis
TLDR
A new technique is proposed, called local context analysis, which selects expansion terms based on cooccurrence with the query terms within the top-ranked documents.
Language Model Information Retrieval with Document Expansion
TLDR
This paper constructs a probabilistic neighborhood for each document and expands the document with its neighborhood information, which provides a more accurate estimation of the document model and thus improves retrieval accuracy.
Contextualized PACRR for Complex Answer Retrieval
TLDR
This work uses a variation of the Position-Aware Convolutional Recurrent Relevance Matching (PACRR) deep neural model to re-rank passages; modifications include an expanded convolutional kernel size and contextual vectors to capture heading type.
Improving retrieval of short texts through document expansion
TLDR
This work proposes a novel approach to improving information retrieval for short texts based on aggressive document expansion that improves the lexical representation of documents and the ability to let time influence retrieval.
Query expansion using lexical-semantic relations
TLDR
Examination of the utility of lexical query expansion in the large, diverse TREC collection shows that this query expansion technique makes little difference in retrieval effectiveness if the original queries are relatively complete descriptions of the information being sought, even when the concepts to be expanded are selected by hand.
TREC Complex Answer Retrieval Overview
TLDR
It is seen that combining traditional methods with learning-to-rank can outperform neural methods, even when many training queries are available, in TREC Complex Answer Retrieval.
An Introduction to Neural Information Retrieval
TLDR
The monograph provides a complete picture of neural information retrieval techniques that culminate in supervised neural learning to rank models including deep neural network architectures that are trained end-to-end for ranking tasks.
Reverted indexing for feedback and expansion
TLDR
This paper turns the process around: instead of indexing documents, the authors index query result sets, called a reverted index, which can be used to identify additional documents, or to aid the user in query formulation, selection, and feedback.
UMass at TREC 2004: Novelty and HARD
TLDR
The primary findings for passage retrieval are that document retrieval methods performed better than passage retrieval methods on the passage evaluation metric of binary preference at 12,000 characters, and that clarification forms improved passage retrieval for every retrieval method explored.