Corpus ID: 237532725

Phrase Retrieval Learns Passage Retrieval, Too

Jinhyuk Lee, Alexander Wettig, Danqi Chen
Dense retrieval methods have shown great promise over sparse retrieval methods across a range of NLP problems. Among them, dense phrase retrieval, which uses phrases as the most fine-grained retrieval unit, is appealing because phrases can be used directly as the output for question answering and slot-filling tasks. In this work, we follow the intuition that retrieving phrases naturally entails retrieving larger text blocks, and study whether phrase retrieval can serve as the basis for coarse-level retrieval including… 
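The intuition above — that phrase scores can be aggregated into scores for the larger blocks containing them — can be illustrated with a minimal sketch. This is not the paper's implementation; the random vectors stand in for learned phrase embeddings, and the max-pooling aggregation is one simple assumed choice.

```python
import numpy as np

# Toy corpus: each passage holds several phrases, each represented by a
# dense vector (random here, standing in for learned phrase embeddings).
rng = np.random.default_rng(0)
passages = {
    "p1": rng.normal(size=(4, 8)),  # 4 phrase vectors of dimension 8
    "p2": rng.normal(size=(3, 8)),  # 3 phrase vectors of dimension 8
}
query = rng.normal(size=8)

def passage_scores(passages, query):
    """Score each passage by the best inner-product score of any phrase it
    contains, so phrase retrieval doubles as passage retrieval."""
    return {
        pid: float(np.max(vecs @ query))  # max-pool over phrase scores
        for pid, vecs in passages.items()
    }

scores = passage_scores(passages, query)
best = max(scores, key=scores.get)
```

Ranking passages by `scores` then yields a passage retriever built entirely from phrase-level similarities; other aggregations (e.g. summing the top-k phrase scores) would slot into the same loop.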


Multi-Vector Models with Textual Guidance for Fine-Grained Scientific Document Similarity
This work presents ASPIRE, a new scientific document similarity model based on matching fine-grained aspects, which improves performance on document similarity tasks across four datasets. It presents both a fast method that matches only single sentence pairs and a method that makes sparse multiple matches with optimal transport.


Learning Dense Representations of Phrases at Scale
This work shows for the first time that dense phrase representations alone can be learned to achieve much stronger performance in open-domain QA, and uses DensePhrases directly as a dense knowledge base for downstream tasks.
Contextualized Sparse Representations for Real-Time Open-Domain Question Answering
This paper aims to improve the quality of each phrase embedding by augmenting it with a contextualized sparse representation (Sparc) and shows 4%+ improvement in CuratedTREC and SQuAD-Open.
Knowledge Guided Text Retrieval and Reading for Open Domain Question Answering
We introduce an approach for open-domain question answering (QA) that retrieves and reads a passage graph, where vertices are passages of text and edges represent relationships that are derived from…
Latent Retrieval for Weakly Supervised Open Domain Question Answering
It is shown for the first time that the retriever and reader can be jointly learned from question-answer string pairs without any IR system, outperforming BM25 by up to 19 points in exact match.
Distilling Knowledge from Reader to Retriever for Question Answering
This paper proposes a technique, inspired by knowledge distillation, to learn retriever models for downstream tasks that does not require annotated pairs of queries and documents.
REALM: Retrieval-Augmented Language Model Pre-Training
The effectiveness of Retrieval-Augmented Language Model pre-training (REALM) is demonstrated by fine-tuning on the challenging task of Open-domain Question Answering (Open-QA) and is found to outperform all previous methods by a significant margin, while also providing qualitative benefits such as interpretability and modularity.
Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index
This paper introduces query-agnostic indexable representations of document phrases that can drastically speed up open-domain QA, and introduces dense-sparse phrase encoding, which effectively captures syntactic, semantic, and lexical information of the phrases and eliminates the pipeline filtering of context documents.
Phrase-Indexed Question Answering: A New Challenge for Scalable Document Comprehension
A new modular variant of current question answering tasks is formalized by enforcing complete independence of the document encoder from the question encoder, which leads to a significant scalability advantage since the encoding of the answer candidate phrases in the document can be pre-computed and indexed offline for efficient retrieval.
Revealing the Importance of Semantic Retrieval for Machine Reading at Scale
This work proposes a simple yet effective pipeline system with special consideration on hierarchical semantic retrieval at both paragraph and sentence level, and their potential effects on the downstream task, and illustrates that intermediate semantic retrieval modules are vital for shaping upstream data distribution and providing better data for downstream modeling.
Open Domain Question Answering Using Early Fusion of Knowledge Bases and Text
A novel model, GRAFT-Net, is proposed for extracting answers from a question-specific subgraph containing text and Knowledge Base entities and relations. It is competitive with the state of the art when tested using either KBs or text alone, and vastly outperforms existing methods in the combined setting.