Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP

Authors: O. Khattab, Keshav Santhanam, Xiang Lisa Li, David Leo Wright Hall, Percy Liang, Christopher Potts, Matei A. Zaharia
Retrieval-augmented in-context learning has emerged as a powerful approach for addressing knowledge-intensive tasks using frozen language models (LMs) and retrieval models (RMs). Existing work has combined these in simple “retrieve-then-read” pipelines in which the RM retrieves passages that are inserted into the LM prompt. To begin to fully realize the potential of frozen LMs and RMs, we propose Demonstrate-Search-Predict (DSP), a framework that relies on passing natural language texts in…
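
The "retrieve-then-read" pipeline mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the DSP framework or any code from the paper. The token-overlap retriever, the prompt layout, and the names retrieve, build_prompt and retrieve_then_read are all assumptions made for illustration, and the LM is assumed to be any callable that maps a prompt string to a completion string.

```python
from collections import Counter
from typing import Callable, List


def tokenize(text: str) -> List[str]:
    """Lowercase, split on whitespace, and strip basic punctuation."""
    tokens = (t.strip(".,?!").lower() for t in text.split())
    return [t for t in tokens if t]


def retrieve(query: str, corpus: List[str], k: int = 2) -> List[str]:
    """Toy stand-in for an RM: rank passages by token overlap with the query."""
    query_counts = Counter(tokenize(query))

    def overlap(passage: str) -> int:
        passage_counts = Counter(tokenize(passage))
        return sum(min(query_counts[t], passage_counts[t]) for t in query_counts)

    # sorted() is stable, so ties keep their original corpus order.
    return sorted(corpus, key=overlap, reverse=True)[:k]


def build_prompt(question: str, passages: List[str]) -> str:
    """Insert the retrieved passages into the LM prompt (hypothetical layout)."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


def retrieve_then_read(
    question: str,
    corpus: List[str],
    lm: Callable[[str], str],
    k: int = 2,
) -> str:
    """Retrieve passages once, then ask the frozen LM to answer from them."""
    passages = retrieve(question, corpus, k)
    return lm(build_prompt(question, passages))
```

Any function that takes a prompt and returns text can be passed as lm, so the sketch can be exercised with a stub before a real model is connected. A single retrieval step followed by a single LM call is what makes this the simple baseline that the abstract contrasts with DSP.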

REPLUG: Retrieval-Augmented Black-Box Language Models

REPLUG is introduced, a retrieval-augmented language modeling framework that treats the language model (LM) as a black box and augments it with a tuneable retrieval model; it can be easily applied to any existing retrieval and language models.

Complex QA and language models hybrid architectures, Survey

This paper identifies key elements for augmenting LLMs to solve complex questions or problems, such as hybrid LLM architectures, active human reinforcement learning supervised with AI, prompting adaptation, neuro-symbolic and structured knowledge grounding, program synthesis, and iterated decomposition, among others.

ThoughtSource: A central hub for large language model reasoning data

The first release of ThoughtSource, a meta-dataset and software library for chain-of-thought (CoT) reasoning, is presented to improve future artificial intelligence systems by facilitating qualitative understanding of CoTs, enabling empirical evaluations, and providing training data.

Guiding Large Language Models via Directional Stimulus Prompting

A new framework, Directional Stimulus Prompting, is introduced. It uses a tuneable language model (LM) to provide guidance for a black-box frozen large language model (LLM) on downstream tasks, exploring directional stimuli that better align LLMs with human preferences.

Iterated Decomposition: Improving Science Q&A by Supervising Reasoning Processes

Language models (LMs) can perform complex reasoning either end-to-end, with hidden latent state, or compositionally, with transparent intermediate state. Composition offers benefits for

UDAPDR: Unsupervised Domain Adaptation via LLM Prompting and Distillation of Rerankers

This work develops and motivates a method for using large language models (LLMs) to generate large numbers of synthetic queries cheaply, and shows that this technique boosts zero-shot accuracy in long-tail domains, even where only 2K synthetic queries are used for fine-tuning.

Context-faithful Prompting for Large Language Models

It is demonstrated that LLMs' faithfulness can be significantly improved using carefully designed prompting strategies, and opinion-based prompts and counterfactual demonstrations are identified as the most effective methods.



Recitation-Augmented Language Models

It is shown that by utilizing recitation as the intermediate step, a recite-and-answer scheme can achieve new state-of-the-art performance in various closed-book question answering (CBQA) tasks.

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

This work explores a general-purpose fine-tuning recipe for retrieval-augmented generation (RAG), that is, models which combine pre-trained parametric and non-parametric memory for language generation, and finds that RAG models generate more specific, diverse and factual language than a state-of-the-art parametric-only seq2seq baseline.

Internet-augmented language models through few-shot prompting for open-domain question answering

It is suggested that it might be crucial to slow down the race towards the biggest model and instead shift attention towards more effective ways to use models, including but not limited to, better prompting or increasing inference-time compute.

Hindsight: Posterior-guided training of retrievers for improved open-ended generation

This work models the guide retriever after the posterior distribution Q of passages given the input and the target output, and trains it jointly with the standard retriever and the generator by maximizing the evidence lower bound (ELBo) in expectation over Q.

Few-shot Learning with Retrieval Augmented Language Models

Atlas is presented, a carefully designed and pre-trained retrieval augmented language model able to learn knowledge intensive tasks with very few training examples, and the impact of the content of the document index is studied, showing that it can easily be updated.

Baleen: Robust Multi-Hop Reasoning at Scale via Condensed Retrieval

Baleen, a system that improves the accuracy of multi-hop retrieval while learning robustly from weak training signals in the many-hop setting, is evaluated on retrieval for two-hop question answering and many-hop claim verification, establishing state-of-the-art performance.

Relevance-guided Supervision for OpenQA with ColBERT

This work proposes a weak supervision strategy that iteratively uses ColBERT to create its own training data, which greatly improves OpenQA retrieval on both Natural Questions and TriviaQA, and the resulting end-to-end OpenQA system attains state-of-the-art performance on both of those datasets.

Language Models are Unsupervised Multitask Learners

It is demonstrated that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.

ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT

ColBERT is presented, a novel ranking model that adapts deep LMs (in particular, BERT) for efficient retrieval that is competitive with existing BERT-based models (and outperforms every non-BERT baseline) and enables leveraging vector-similarity indexes for end-to-end retrieval directly from millions of documents.

Large Language Models Can Self-Improve

This work uses a pre-trained LLM to generate “high-confidence” rationale-augmented answers for unlabeled questions using Chain-of-Thought prompting and self-consistency. It also conducts ablation studies and shows that fine-tuning on reasoning is critical for self-improvement.