Solving math word problems with process- and outcome-based feedback

Jonathan Uesato, Nate Kushman, Ramana Kumar, Francis Song, Noah Siegel, L. Wang, Antonia Creswell, Geoffrey Irving, Irina Higgins

Recent work has shown that asking language models to generate reasoning steps improves performance on many reasoning tasks. When moving beyond prompting, this raises the question of how we should supervise such models: outcome-based approaches which supervise the final result, or process-based approaches which supervise the reasoning process itself? Differences between these approaches might naturally be expected not just in final-answer errors but also in reasoning errors, which can be… 
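
The distinction between the two kinds of supervision can be illustrated with a toy sketch. It is not the paper's method: the function names, the example steps, and the arithmetic checker (a stand-in for whatever judges each step, for example a human annotator) are all hypothetical. The point is only that an outcome label looks at the final answer, while process labels look at each reasoning step.

```python
import operator

# Hypothetical toy checker for steps written as "a op b = c".
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def arithmetic_step_ok(step):
    """Return True if a step of the form 'a op b = c' is arithmetically valid."""
    lhs, rhs = step.split("=")
    a, op, b = lhs.split()
    return OPS[op](int(a), int(b)) == int(rhs)

def outcome_label(predicted_answer, gold_answer):
    """Outcome-based signal: 1 if the final answer matches, else 0."""
    return int(predicted_answer == gold_answer)

def process_labels(steps, step_is_correct):
    """Process-based signal: one 0/1 label per reasoning step."""
    return [int(step_is_correct(s)) for s in steps]

# A solution that reaches the right final answer through a wrong step.
steps = ["5 * 3 = 15", "15 - 4 = 12", "12 - 1 = 11"]
final_answer, gold = "11", "11"

outcome = outcome_label(final_answer, gold)
per_step = process_labels(steps, arithmetic_step_ok)
```

Here the outcome label is positive even though the second step is wrong, which is the kind of reasoning error that only step-level feedback would flag.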


Iterated Decomposition: Improving Science Q&A by Supervising Reasoning Processes

Language models (LMs) can perform complex reasoning either end-to-end, with hidden latent state, or compositionally, with transparent intermediate state. Composition offers benefits for

The Wisdom of Hindsight Makes Language Models Better Instruction Followers

This work proposes HIR, a novel algorithm that aligns language models with instructions by converting feedback into instructions: it relabels the original instruction and then trains the model in a supervised manner. HIR outperforms the baseline algorithms and is comparable to, or even surpasses, supervised finetuning.

Parsel: A Unified Natural Language Framework for Algorithmic Reasoning

This work introduces Parsel, a framework enabling automatic implementation and validation of complex algorithms with code LLMs, based on hierarchical function descriptions in natural language, which can be used across domains requiring hierarchical reasoning, e.g. code synthesis, theorem proving, and robotic planning.

GPT-4 Technical Report

  • OpenAI
  • Computer Science
  • 2023
This report presents the development of GPT-4, a large-scale, multimodal model that accepts image and text inputs and produces text outputs. It is a Transformer-based model pre-trained to predict the next token in a document, and it exhibits human-level performance on various professional and academic benchmarks.



Self-Consistency Improves Chain of Thought Reasoning in Language Models

This paper proposes a new decoding strategy, self-consistency, to replace the naive greedy decoding used in chain-of-thought prompting. It first samples a diverse set of reasoning paths instead of taking only the greedy one, and then selects the most consistent answer by marginalizing out the sampled reasoning paths.
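
The selection step of self-consistency can be sketched as an unweighted majority vote over the final answers of the sampled paths, which is one simple way to marginalize out the reasoning. The sketch assumes the paths have already been sampled and parsed into (reasoning, answer) pairs; the function name and the example data are hypothetical.

```python
from collections import Counter

def self_consistent_answer(sampled_paths):
    """Pick the most frequent final answer among sampled reasoning paths.

    sampled_paths: list of (reasoning_text, final_answer) pairs.
    Returns the winning answer and the share of paths that agree with it.
    """
    votes = Counter(answer for _, answer in sampled_paths)
    answer, count = votes.most_common(1)[0]
    return answer, count / len(sampled_paths)

# Hypothetical samples for one question; the reasoning text is ignored by the vote.
paths = [
    ("3 + 4 = 7, then 7 * 2 = 14", "14"),
    ("4 * 2 = 8, then 8 + 3 = 11", "11"),
    ("(3 + 4) * 2 = 14", "14"),
]
best, share = self_consistent_answer(paths)
```

Different reasoning paths that arrive at the same answer reinforce each other, so a single faulty path is outvoted.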

Teaching language models to support answers with verified quotes

This work uses reinforcement learning from human preferences to train “open-book” QA models that generate answers whilst also citing specific evidence for their claims, which aids in the appraisal of correctness.

On the Advance of Making Language Models Better Reasoners

This paper conducts extensive experiments using the latest language model code-davinci-002 and demonstrates that DiVeRSe can achieve new state-of-the-art performance on six out of eight reasoning benchmarks, outperforming the PaLM model with 540B parameters.

Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning

A Selection-Inference (SI) framework is proposed that exploits pre-trained LLMs as general processing modules, and alternates between selection and inference to generate a series of interpretable, causal reasoning steps leading to the final answer.

MathQA: Towards Interpretable Math Word Problem Solving with Operation-Based Formalisms

This work introduces a large-scale dataset of math word problems, an interpretable neural math problem solver that learns to map problems to their operation programs, and a new representation language for modeling the operation programs corresponding to each math problem, with the aim of improving both the performance and the interpretability of the learned models.

STaR: Bootstrapping Reasoning With Reasoning

STaR is a technique that iteratively leverages a small number of rationale examples and a large dataset without rationales to bootstrap the ability to perform successively more complex reasoning, letting a model improve itself by learning from its own generated reasoning.
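
One round of this kind of bootstrapping can be sketched as follows: generate a rationale and answer for each problem, keep only the rationales whose answer matches the known one, and fine-tune on the kept set. This is a simplified sketch, not the authors' implementation; `generate` and `finetune` are hypothetical callables, and the toy stand-ins below replace a real language model with a lookup table.

```python
def star_iteration(problems, generate, finetune, model):
    """Run one bootstrapping round.

    problems: list of (question, gold_answer) pairs without rationales.
    generate: hypothetical callable (model, question) -> (rationale, answer).
    finetune: hypothetical callable (model, examples) -> new model.
    """
    kept = []
    for question, gold in problems:
        rationale, answer = generate(model, question)
        if answer == gold:  # keep only rationales that reach the known answer
            kept.append((question, rationale, answer))
    return finetune(model, kept), kept

def toy_generate(model, question):
    # Stand-in for sampling from a language model: look up a canned output.
    return model.get(question, ("no rationale", None))

def toy_finetune(model, examples):
    # Stand-in for fine-tuning: the "new model" just memorizes the kept examples.
    return {q: (r, a) for q, r, a in examples}

model = {"2+3?": ("2 plus 3 is 5", "5"), "4*2?": ("4 plus 2 is 6", "6")}
problems = [("2+3?", "5"), ("4*2?", "8")]
new_model, kept = star_iteration(problems, toy_generate, toy_finetune, model)
```

Repeating the round with the returned model is what makes the procedure iterative; the filter is what ties the self-generated rationales back to the answers in the dataset.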

Training Verifiers to Solve Math Word Problems

It is demonstrated that verification significantly improves performance on GSM8K, and there is strong empirical evidence that verification scales more effectively with increased data than a finetuning baseline.

Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems

Experimental results show that indirect supervision of program learning via answer rationales is a promising strategy for inducing arithmetic programs.

Solving Quantitative Reasoning Problems with Language Models

Language models have achieved remarkable performance on a wide range of tasks that require natural language understanding. Nevertheless, state-of-the-art models have generally struggled with tasks

Training language models to follow instructions with human feedback

The results show that fine-tuning with human feedback is a promising direction for aligning language models with human intent, yielding improvements in truthfulness and reductions in toxic output generation with minimal performance regressions on public NLP datasets.