Maieutic Prompting: Logically Consistent Reasoning with Recursive Explanations

Jaehun Jung, Lianhui Qin, Sean Welleck, Faeze Brahman, Chandra Bhagavatula, Ronan Le Bras, Yejin Choi
Pre-trained language models (LMs) struggle with consistent reasoning; recently, prompting LMs to generate explanations that self-guide the inference has emerged as a promising direction to amend this. However, these approaches are fundamentally bounded by the correctness of the explanations, which are themselves often noisy and inconsistent. In this work, we develop Maieutic Prompting, which aims to infer a correct answer to a question even from the unreliable generations of an LM. Maieutic…

Ask Me Anything: A simple strategy for prompting language models

This simple strategy enables the open-source GPT-J-6B model to match and exceed the performance of few-shot GPT-3 175B on 15 of 20 popular benchmarks.

Prompting as Probing: Using Language Models for Knowledge Base Construction

ProP (Prompting as Probing) uses GPT-3, a large language model introduced by OpenAI in 2020, for the task of Knowledge Base Construction (KBC), implementing a multi-step approach that combines a variety of prompting techniques.

Rainier: Reinforced Knowledge Introspector for Commonsense Question Answering

RAINIER, or Reinforced Knowledge Introspector, is presented; it is the first work to report that knowledge generated by models orders of magnitude smaller than GPT-3, even without direct supervision on the knowledge itself, can exceed the quality of commonsense knowledge elicited from GPT-3.

Rationale-Augmented Ensembles in Language Models

It is demonstrated that rationale-augmented ensembles achieve more accurate results than existing prompting approaches, including standard prompting without rationales and rationale-based chain-of-thought prompting, while improving the interpretability of model predictions through the associated rationales.

ThinkSum: Probabilistic reasoning over sets using large language models

It is argued that because the probabilistic inference in ThinkSum is performed outside of calls to the LLM, it is less sensitive to prompt design, yields more interpretable predictions, and can be flexibly combined with latent variable models to extract structured knowledge from LLMs.

Enhancing Self-Consistency and Performance of Pre-Trained Language Models through Natural Language Inference

This work proposes a framework, Consistency Correction through Relation Detection, or ConCoRD, for boosting the consistency and accuracy of pre-trained NLP models using off-the-shelf natural language inference (NLI) models without re-tuning or re-training.
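The consistency-correction idea can be illustrated with a toy sketch. The `nli` stub and the greedy pairwise resolution below are illustrative assumptions only; ConCoRD itself uses an off-the-shelf NLI model and solves a MaxSAT optimization rather than a greedy pass:

```python
from itertools import combinations

def nli(premise: str, hypothesis: str) -> str:
    # Toy rule standing in for a real NLI model:
    # "X" and "not X" contradict; everything else is neutral.
    if premise == f"not {hypothesis}" or hypothesis == f"not {premise}":
        return "contradiction"
    return "neutral"

def correct_beliefs(beliefs: dict[str, float]) -> set[str]:
    """Greedily drop the lower-confidence member of each contradictory
    pair of model predictions, keeping the rest untouched."""
    kept = set(beliefs)
    for a, b in combinations(sorted(beliefs), 2):
        if a in kept and b in kept and nli(a, b) == "contradiction":
            kept.discard(a if beliefs[a] < beliefs[b] else b)
    return kept
```

For example, given the predictions {"birds fly": 0.9, "not birds fly": 0.4}, only the higher-confidence belief survives; no re-tuning of the underlying model is involved.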

Decomposed Prompting: A Modular Approach for Solving Complex Tasks

Decomposed Prompting is proposed, a new approach that solves complex tasks by decomposing them (via prompting) into simpler sub-tasks, which can be delegated to a shared library of prompting-based LLMs dedicated to these sub-tasks.
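The delegation scheme above can be sketched with a toy stand-in for the LLM; `call_llm`, the handler names, and the plan are all illustrative assumptions, not the paper's actual API:

```python
# Hypothetical stub: a real system would send each sub-task prompt to an LLM.
def call_llm(prompt: str) -> str:
    if prompt.startswith("split:"):
        return " ".join(prompt[len("split:"):].strip())
    if prompt.startswith("reverse:"):
        return prompt[len("reverse:"):].strip()[::-1]
    return prompt

# Shared library of prompting-based sub-task handlers.
HANDLERS = {
    "split_letters": lambda x: call_llm(f"split: {x}"),
    "reverse": lambda x: call_llm(f"reverse: {x}"),
}

def decomposed_prompting(task_input: str, plan: list[str]) -> str:
    """Run a complex task as a chain of simpler, delegated sub-tasks."""
    state = task_input
    for sub_task in plan:
        state = HANDLERS[sub_task](state)
    return state
```

Running `decomposed_prompting("cat", ["split_letters", "reverse"])` chains the two sub-task prompts to produce "t a c"; each handler could equally be a dedicated few-shot prompt.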

Natural Language Deduction with Incomplete Information

This work proposes a new system that can handle the underspecified setting where not all premises are stated at the outset; that is, additional assumptions need to be materialized to prove a claim.

Large Language Models Can Self-Improve

This work uses a pre-trained LLM to generate “high-confidence” rationale-augmented answers for unlabeled questions using chain-of-thought prompting and self-consistency, then fine-tunes the LLM on these self-generated solutions; ablation studies show that training on the reasoning is critical for self-improvement.
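The self-consistency filter that produces such "high-confidence" pseudo-labels can be sketched as follows; `sample_rationale_and_answer` is a deterministic toy stand-in for sampling one chain-of-thought from an LLM, and the vote threshold is an assumed parameter:

```python
from collections import Counter

# Hypothetical stub: a real pipeline would sample a chain-of-thought and
# final answer from the LLM with temperature > 0.
def sample_rationale_and_answer(question: str, i: int) -> tuple[str, str]:
    # Toy deterministic "model": answers "4" on 3 of every 4 samples.
    answer = "4" if i % 4 != 3 else "5"
    return (f"sampled reasoning #{i} for {question!r}", answer)

def self_consistent_label(question: str, n_samples: int = 20,
                          threshold: float = 0.7):
    """Majority-vote the sampled answers; keep the question (with its
    agreeing rationales) as a self-training example only if the vote
    is confident enough, else return None."""
    samples = [sample_rationale_and_answer(question, i)
               for i in range(n_samples)]
    counts = Counter(answer for _, answer in samples)
    answer, votes = counts.most_common(1)[0]
    if votes / n_samples < threshold:
        return None  # too uncertain to use as a pseudo-label
    rationales = [r for r, a in samples if a == answer]
    return rationales, answer
```

The surviving rationale-answer pairs then serve as fine-tuning data, with no human labels involved.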

LM-KBC: Knowledge Base Construction from Pre-trained Language Models

The authors present a system that performs task-specific pre-training of BERT, employs prompt decomposition for progressive generation of candidate objects, and uses adaptive thresholds for final candidate object selection.

The Unreliability of Explanations in Few-Shot In-Context Learning

A framework for calibrating model predictions based on the reliability of explanations is presented and it is shown that explanations judged as good by humans—those that are logically consistent with the input and the prediction—usually indicate more accurate predictions.

Towards Teachable Reasoning Systems

Generated chains of reasoning show how answers are implied by the system’s own internal beliefs, and are both faithful and truthful, which suggests new opportunities for using language models in an interactive setting where users can inspect, debug, correct, and improve a system’s performance over time.

Abductive Commonsense Reasoning

This study introduces a challenge dataset, ART, that consists of over 20k commonsense narrative contexts and 200k explanations, and conceptualizes two new tasks -- Abductive NLI: a multiple-choice question answering task for choosing the more likely explanation, and Abductive NLG: a conditional generation task for explaining given observations in natural language.

BeliefBank: Adding Memory to a Pre-Trained Language Model for a Systematic Notion of Belief

This work describes two mechanisms to improve belief consistency in the overall system, enabling PTLM-based architectures with a systematic notion of belief to construct a more coherent picture of the world, and improve over time without model retraining.

Learning to Rationalize for Nonmonotonic Reasoning with Distant Supervision

This paper investigates multiple ways to automatically generate rationales using pre-trained language models, neural knowledge models, and distant supervision from related tasks, and trains generative models capable of composing explanatory rationales for unseen instances.

Flexible Generation of Natural Language Deductions

ParaPattern is described, a method for building models to generate deductive inferences from diverse natural language inputs without direct human supervision that achieves 85% validity on examples of the ‘substitution’ operation from EntailmentBank without the use of any in-domain training data.

Explain Yourself! Leveraging Language Models for Commonsense Reasoning

This work collects human explanations for commonsense reasoning, in the form of natural language sequences and highlighted annotations, in a new dataset called Common Sense Explanations, and uses it to train language models that automatically generate explanations usable during both training and inference in a novel Commonsense Auto-Generated Explanation framework.

Generated Knowledge Prompting for Commonsense Reasoning

This work develops generated knowledge prompting, which consists of generating knowledge from a language model and then providing that knowledge as additional input when answering a question; it improves the performance of large-scale, state-of-the-art models on four commonsense reasoning tasks.
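A minimal sketch of this two-stage pipeline, with hypothetical stubs (`generate_knowledge`, `answer_with_knowledge`) in place of the two language-model calls:

```python
# Hypothetical stub: a real system would sample n knowledge statements
# from a language model prompted with a few demonstrations.
def generate_knowledge(question: str, n: int = 3) -> list[str]:
    return [f"knowledge statement {i} about {question!r}" for i in range(n)]

# Hypothetical stub: a real system would score candidate answers given
# "{knowledge}\n{question}" and return the best one with its probability.
def answer_with_knowledge(question: str, knowledge: str) -> tuple[str, float]:
    idx = knowledge.split()[2]          # toy: recover the statement index
    return (f"answer given statement {idx}", 1.0 / (1 + int(idx)))

def generated_knowledge_prompting(question: str) -> str:
    """Answer once per generated knowledge statement, then keep the answer
    whose supporting knowledge yields the highest model confidence."""
    scored = [answer_with_knowledge(question, k)
              for k in generate_knowledge(question)]
    best_answer, _ = max(scored, key=lambda pair: pair[1])
    return best_answer
```

The key design point is that knowledge generation and answer prediction are decoupled, so any question-answering model can consume the generated statements unchanged.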

Improving Coherence and Consistency in Neural Sequence Models with Dual-System, Neuro-Symbolic Reasoning

This work seeks a lightweight, training-free means of improving existing System 1-like sequence models by adding System 2-inspired logical reasoning and shows that this approach can increase the coherence and accuracy of neurally-based generations.

e-SNLI: Natural Language Inference with Natural Language Explanations

The Stanford Natural Language Inference dataset is extended with an additional layer of human-annotated natural language explanations of the entailment relations, which can be used for various goals, such as obtaining full sentence justifications of a model’s decisions, improving universal sentence representations and transferring to out-of-domain NLI datasets.