Corpus ID: 221665105

Learning to summarize from human feedback

@article{Stiennon2020LearningTS,
  title={Learning to summarize from human feedback},
  author={Nisan Stiennon and Long Ouyang and Jeff Wu and Daniel M. Ziegler and Ryan J. Lowe and Chelsea Voss and Alec Radford and Dario Amodei and Paul Christiano},
  journal={ArXiv},
  year={2020},
  volume={abs/2009.01325}
}
As language models become more powerful, training and evaluation are increasingly bottlenecked by the data and metrics used for a particular task. For example, summarization models are often trained to predict human reference summaries and evaluated using ROUGE, but both of these metrics are rough proxies for what we really care about---summary quality. In this work, we show that it is possible to significantly improve summary quality by training a model to optimize for human preferences. We… 
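As a concrete illustration of the approach, below is a minimal PyTorch sketch of the pairwise preference loss commonly used to train such a reward model on human comparisons: the summary the labeler preferred should receive a higher score than the one they rejected, and the trained reward model then serves as the objective for reinforcement learning. The function and variable names are illustrative, not the paper's actual code.

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    """Pairwise loss for a reward model trained on human comparisons.

    reward_chosen / reward_rejected are the scalar scores the reward model
    assigns to the preferred and rejected summary of the same post.
    Minimizing this pushes the preferred summary's score above the other's.
    """
    # -log sigmoid(r_chosen - r_rejected): binary logistic loss on the
    # score difference (a Bradley-Terry style comparison model).
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Illustrative usage with stand-in scores; in practice the scores come from
# a learned reward model evaluated on (post, summary) pairs.
chosen = torch.tensor([1.3, 0.2, 0.9])
rejected = torch.tensor([0.4, 0.5, -0.1])
loss = preference_loss(chosen, rejected)
```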
Training Language Models with Natural Language Feedback
TLDR
This work proposes to learn from natural language feedback, which conveys more information per human evaluation, using a three-step learning algorithm that fine-tunes a GPT-3 model to roughly human-level summarization ability.
Learning from Natural Language Feedback
TLDR
This work proposes to learn from natural language feedback, which conveys more information per human evaluation, using a three-step learning algorithm that fine-tunes a GPT-3 model to roughly human-level summarization ability.
Training Language Models with Language Feedback
TLDR
This work proposes to learn from natural language feedback, which conveys more information per human evaluation, using a three-step learning algorithm that fine-tunes a GPT-3 model to roughly human-level summarization ability.
Make The Most of Prior Data: A Solution for Interactive Text Summarization with Preference Feedback
TLDR
By properly leveraging offline data and a novel reward model, this paper improves ROUGE scores and sample efficiency, and introduces a new framework to train summarization models interactively with preference feedback.
Active Learning with Label Comparisons
TLDR
A key element in this analysis is the “label neighborhood graph” of the true distribution, which has an edge between two classes if they share a decision boundary; this structure can yield improved sample complexity in the worst case.
Recursively Summarizing Books with Human Feedback
TLDR
This method combines learning from human feedback with recursive task decomposition: it uses models trained on smaller parts of the task to assist humans in giving feedback on the broader task, and generates sensible summaries of entire books.
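For the recursive task decomposition described in the entry above, here is a minimal sketch, assuming a hypothetical `summarize` callable that returns something shorter than its input; the human-feedback training of that summarizer is exactly the part this sketch omits.

```python
from typing import Callable, List

def chunk(text: str, max_chars: int = 2000) -> List[str]:
    """Split text into roughly fixed-size pieces (a stand-in for real chunking)."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def recursive_summarize(text: str, summarize: Callable[[str], str], max_chars: int = 2000) -> str:
    """Summarize a long document by summarizing chunks, then summarizing the
    concatenation of the chunk summaries, until the result fits in one chunk.
    Assumes `summarize` always returns text shorter than its input."""
    if len(text) <= max_chars:
        return summarize(text)
    partial = [summarize(piece) for piece in chunk(text, max_chars)]
    return recursive_summarize(" ".join(partial), summarize, max_chars)
```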
An Exploration of Post-Editing Effectiveness in Text Summarization
TLDR
This study offers valuable insights into when post-editing is useful for text summarization: it helped in some cases but not in others, and participants’ different editing strategies and needs for assistance have implications for future human-AI summarization systems.
Active Programming by Example with a Natural Language Prior
TLDR
APEL, a new framework that enables non-programmers to indirectly annotate natural language utterances with executable meaning representations such as SQL programs, is introduced; to reduce the effort required from annotators, it synthesizes simple input databases that nonetheless have high information gain.
Capturing Failures of Large Language Models via Human Cognitive Biases
TLDR
This work uses cognitive biases to identify inputs that models are likely to err on, and develops tests to qualitatively characterize their errors on these inputs, uncovering high-impact errors such as incorrectly deleting files.
The Dangers of Underclaiming: Reasons for Caution When Reporting How NLP Systems Fail
Researchers in NLP often frame and discuss research results in ways that serve to deemphasize the field’s successes, often in response to the field’s widespread hype. Though well-meaning, this has ...
...

References

Showing 1–10 of 85 references
Automatic Evaluation of Machine Translation Quality Using Longest Common Subsequence and Skip-Bigram Statistics
TLDR
Two new objective automatic evaluation methods for machine translation are described: one based on the longest common subsequence between a candidate translation and a set of reference translations, and one that relaxes strict n-gram matching to skip-bigram matching.
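To make the first statistic concrete, here is a minimal sketch of an LCS-based F-measure between a candidate and a reference (the ROUGE-L-style quantity the entry refers to); the tokenization and function names are illustrative, not the evaluation scripts' actual code.

```python
from typing import List

def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def lcs_fscore(candidate: str, reference: str, beta: float = 1.0) -> float:
    """LCS-based F-measure between a candidate and a single reference."""
    cand, ref = candidate.split(), reference.split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)
```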
Language Models are Few-Shot Learners
TLDR
GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic.
Fine-Tuning Language Models from Human Preferences
TLDR
This paper builds on advances in generative pretraining of language models to apply reward learning to four natural language tasks: continuing text with positive sentiment or physically descriptive language, and summarization tasks on the TL;DR and CNN/Daily Mail datasets.
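A key detail of this line of work (and of the main paper above) is that the learned reward is combined with a penalty that keeps the fine-tuned policy close to the original language model. Below is a minimal NumPy sketch; the names and the coefficient are illustrative, not the papers' actual code.

```python
import numpy as np

def kl_shaped_reward(learned_reward, logp_policy, logp_reference, beta=0.1):
    """Reward used during RL fine-tuning: the learned reward minus a penalty
    proportional to how far the tuned policy drifts from the reference model.

    logp_policy / logp_reference are the log-probabilities the two models
    assign to the sampled text; their difference is a per-sample KL estimate."""
    kl_estimate = np.asarray(logp_policy) - np.asarray(logp_reference)
    return np.asarray(learned_reward) - beta * kl_estimate
```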
Improving Language Understanding by Generative Pre-Training
TLDR
The general task-agnostic model outperforms discriminatively trained models that use architectures specifically crafted for each task, improving upon the state of the art in 9 out of the 12 tasks studied.
Better Rewards Yield Better Summaries: Learning to Summarise Without References
TLDR
This work learns, from human ratings on 2,500 summaries, a reward function that can be used to train RL-based summarisation systems without using any reference summaries, and shows that the learned rewards have significantly higher correlation with human ratings than previous approaches.
Scalable agent alignment via reward modeling: a research direction
TLDR
This work outlines a high-level research direction to solve the agent alignment problem centered around reward modeling: learning a reward function from interaction with the user and optimizing the learned reward function with reinforcement learning.
TL;DR: Mining Reddit to Learn Automatic Summarization
TLDR
This work proposes a new method for mining social media for author-provided summaries, taking advantage of the common practice of appending a “TL;DR” to long posts, and yields the Webis-TLDR-17 dataset.
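As an illustration of the mining practice described above, here is a minimal sketch that splits a post at an author-provided "TL;DR" marker; the regex and behavior are illustrative assumptions, not the dataset's actual extraction rules.

```python
import re
from typing import Optional, Tuple

# Match an author-provided "TL;DR" marker in common spellings (TL;DR, tldr, tl dr, ...).
TLDR_PATTERN = re.compile(r"\btl\s*;?\s*dr\b\s*[:\-]?\s*", re.IGNORECASE)

def split_post(post: str) -> Optional[Tuple[str, str]]:
    """Return (content, summary) if the post contains a TL;DR marker, else None."""
    match = TLDR_PATTERN.search(post)
    if match is None:
        return None
    content, summary = post[:match.start()].strip(), post[match.end():].strip()
    if not content or not summary:
        return None
    return content, summary
```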
A Deep Reinforced Model for Abstractive Summarization
TLDR
A neural network model with a novel intra-attention that attends over the input and the continuously generated output separately, combined with a new training method that mixes standard supervised word prediction and reinforcement learning (RL), produces higher-quality summaries.
Language Models are Unsupervised Multitask Learners
TLDR
It is demonstrated that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.
Proximal Policy Optimization Algorithms
We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent.
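For reference, here is a minimal NumPy sketch of the clipped variant of that surrogate objective (the form most commonly used); the variable names and clipping coefficient are illustrative.

```python
import numpy as np

def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped surrogate objective: mean of min(r * A, clip(r, 1-eps, 1+eps) * A),
    where r is the probability ratio between the new and old policies."""
    ratio = np.exp(np.asarray(logp_new) - np.asarray(logp_old))
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return float(np.mean(np.minimum(ratio * advantages, clipped * advantages)))
```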
...