Language Model Cascades

David Dohan, Winnie Xu, Aitor Lewkowycz, Jacob Austin, David Bieber, Raphael Gontijo Lopes, Yuhuai Wu, Henryk Michalewski, Rif A. Saurous, Jascha Narain Sohl-Dickstein, Kevin Murphy, Charles Sutton

Prompted models have demonstrated impressive few-shot learning abilities. Repeated interactions at test-time with a single model, or the composition of multiple models together, further expands capabilities. These compositions are probabilistic models, and may be expressed in the language of graphical models with random variables whose values are complex data types such as strings. Cases with control flow and dynamic structure require techniques from probabilistic programming, which allow… 
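
As a minimal illustration of the graphical-model view in the abstract, consider a hypothetical two-step composition: a model first samples an intermediate string $T$ given an input string $Q$, then samples an output string $A$ given both. The symbols $Q$, $T$, and $A$ are assumptions introduced here for illustration and do not appear in the abstract.

```latex
% Joint distribution of a two-step composition with string-valued variables
p(T, A \mid Q) = p(T \mid Q)\, p(A \mid Q, T)

% Marginal over the intermediate string (a sum over all strings T)
p(A \mid Q) = \sum_{T} p(T \mid Q)\, p(A \mid Q, T)
```

The sum ranges over all possible strings, so in practice it can only be approximated, for example by sampling $T$ from the model.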


ThinkSum: Probabilistic reasoning over sets using large language models

It is argued that because the probabilistic inference in ThinkSum is performed outside of calls to the LLM, it is less sensitive to prompt design, yields more interpretable predictions, and can be flexibly combined with latent variable models to extract structured knowledge from LLMs.

Prompting Is Programming: A Query Language For Large Language Models

LMQL is implemented, which leverages the constraints and control flow from an LMP prompt to generate an efficient inference procedure that minimizes the number of expensive calls to the underlying language model.

Psychologically-informed chain-of-thought prompts for metaphor understanding in large language models

Chain-of-thought prompts are used to introduce structures from probabilistic models into large language models, and are shown to improve paraphrase selection.

Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought

To enable systematic exploration of the reasoning ability of LLMs, a new synthetic question-answering dataset is presented, where each example is generated from a synthetic world model represented in first-order logic, which allows us to parse the generated chain-of-thought into symbolic proofs for formal analysis.

Iterated Decomposition: Improving Science Q&A by Supervising Reasoning Processes

Language models (LMs) can perform complex reasoning either end-to-end, with hidden latent state, or compositionally, with transparent intermediate state. Composition offers benefits for

Parsel: A Unified Natural Language Framework for Algorithmic Reasoning

This work introduces Parsel, a framework enabling automatic implementation and validation of complex algorithms with code LLMs, based on hierarchical function descriptions in natural language, which can be used across domains requiring hierarchical reasoning, e.g. code synthesis, theorem proving, and robotic planning.

Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP

This work proposes Demonstrate-Search-Predict (DSP), a framework that relies on passing natural language texts in sophisticated pipelines between an LM and an RM, and can express high-level programs that bootstrap pipeline-aware demonstrations, search for relevant passages, and generate grounded predictions.

Complex Reading Comprehension Through Question Decomposition

A novel learning approach is proposed that helps language models better understand multi-hop questions and perform “complex, compositional” reasoning.

Memory Augmented Large Language Models are Computationally Universal

It is established that an existing large language model, Flan-U-PaLM 540B, can be combined with an associative read-write memory to exactly simulate the execution of a universal Turing machine, U_{15,2}.

Towards Reasoning in Large Language Models: A Survey

A comprehensive overview of the current state of knowledge on reasoning in large language models is provided, covering techniques for improving and eliciting reasoning in these models, methods and benchmarks for evaluating reasoning abilities, and suggestions for future directions.



Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning

A Selection-Inference (SI) framework is proposed that exploits pre-trained LLMs as general processing modules, and alternates between selection and inference to generate a series of interpretable, causal reasoning steps leading to the final answer.

Training Language Models with Language Feedback

This work proposes to learn from natural language feedback, which conveys more information per human evaluation, using a three-step learning algorithm that finetunes a GPT-3 model to roughly human-level summarization ability.

Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language

This work shows that model diversity is symbiotic, and can be leveraged to build AI systems with structured Socratic dialogue, in which new multimodal tasks are formulated as a guided language-based exchange between different pre-existing foundation models, without additional finetuning.

Foundation Posteriors for Approximate Probabilistic Inference

By optimizing a single neural network across a range of programs the authors amortize the cost of training, yielding a “foundation” posterior able to do zero-shot inference for new programs, and the approach is shown on a benchmark of STAN programs.

Language Models are Few-Shot Learners

GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic.

Chain of Thought Prompting Elicits Reasoning in Large Language Models

Experiments on three large language models show that chain-of-thought prompting improves performance on a range of arithmetic, commonsense, and symbolic reasoning tasks.

Generative Language Modeling for Automated Theorem Proving

This work presents an automated prover and proof assistant, GPT-f, for the Metamath formalization language, and analyzes its performance. GPT-f found new short proofs that were accepted into the main Metamath library, which is, to the authors' knowledge, the first time a deep-learning based system has contributed proofs that were adopted by a formal mathematics community.

Show Your Work: Scratchpads for Intermediate Computation with Language Models

Surprisingly, large pre-trained language models are able to perform complex multistep computations—even in the few-shot regime—when asked to perform the operation “step by step”, showing the results of intermediate computations.

Self-Consistency Improves Chain of Thought Reasoning in Language Models

A simple ensemble strategy, self-consistency, that robustly improves accuracy across a variety of language models and model scales without the need for additional training or auxiliary models is explored.
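
The ensemble idea summarized above can be sketched as a majority vote over several independently sampled answers. This is a minimal sketch, not the paper's implementation: `self_consistency` and the `sample_answer` callable are hypothetical names, and the sampler stands in for a real language model call that would return the final answer extracted from one sampled reasoning chain.

```python
from collections import Counter
from typing import Callable


def self_consistency(sample_answer: Callable[[], str], num_samples: int = 10) -> str:
    """Draw several answers and return the most frequent one.

    `sample_answer` is a hypothetical stand-in for one stochastic model call
    that returns only the final answer string of a sampled reasoning chain.
    """
    if num_samples < 1:
        raise ValueError("num_samples must be at least 1")
    answers = [sample_answer() for _ in range(num_samples)]
    # most_common(1) returns [(answer, count)] for the top-voted answer;
    # ties are broken in favor of the answer seen first.
    return Counter(answers).most_common(1)[0][0]


# Usage with a canned sequence in place of a real model (assumed data).
canned = iter(["18", "17", "18", "20", "18"])
print(self_consistency(lambda: next(canned), num_samples=5))
```

Voting on the final answer alone, rather than on the full reasoning text, lets different reasoning chains that reach the same answer reinforce each other.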

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Evaluation of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters finds that model performance and calibration both improve with scale, but are poor in absolute terms.