• Corpus ID: 249954130

Goal Misgeneralization in Deep Reinforcement Learning

@inproceedings{langosco2022goal,
  title={Goal Misgeneralization in Deep Reinforcement Learning},
  author={Lauro Langosco and Jack Koch and Lee D. Sharkey and Jacob Pfau and David Krueger},
  booktitle={International Conference on Machine Learning},
  year={2022}
}
We study goal misgeneralization, a type of out-of-distribution generalization failure in reinforcement learning (RL). Goal misgeneralization occurs when an RL agent retains its capabilities out-of-distribution yet pursues the wrong goal. For instance, an agent might continue to competently avoid obstacles, but navigate to the wrong place. In contrast, previous works have typically focused on capability generalization failures, where an agent fails to do anything sensible at test time. We…

The alignment problem from a deep learning perspective

Within the coming decades

Leveraging Procedural Generation to Benchmark Reinforcement Learning

This work empirically demonstrates that diverse environment distributions are essential to adequately train and evaluate RL agents, motivating the extensive use of procedural content generation, and uses this benchmark to investigate the effects of scaling model size.

Agents and Devices: A Relative Definition of Agency

A formal counterpart of the physical and intentional stances is defined within computational theory: a description of a system as either a device or an agent, with the key difference being that `devices' are described directly in terms of an input-output mapping, while `agents' are described in terms of the function they optimise.

Shortcut Learning in Deep Neural Networks

A set of recommendations for model interpretation and benchmarking is developed, highlighting recent advances in machine learning to improve robustness and transferability from the lab to real-world applications.

What does the universal prior actually look like?, Nov 2016

  • URL https://tinyurl.com/uniprior
  • 2016

Optimization daemons, Mar 2016

  • URL https://arbital.com/p/daemons/
  • 2016

Goal Misgeneralization: Why Correct Specifications Aren't Enough For Correct Goals

It is demonstrated that goal misgeneralization can occur in practical systems by providing several examples in deep learning systems across a variety of domains, and several research directions are suggested that could reduce the risk of goal misgeneralization in future systems.

The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models

An anomaly detection task for aberrant policies is proposed, with several baseline detectors offered, to identify phase transitions: capability thresholds at which the agent's behavior qualitatively shifts, leading to a sharp decrease in the true reward.

A Survey of Generalisation in Deep Reinforcement Learning

It is argued that taking a purely procedural content generation approach to benchmark design is not conducive to progress in generalisation, and fast online adaptation and tackling RL-specific problems are suggested as areas for future work on methods for generalisation.

Unsolved Problems in ML Safety

This work provides a new roadmap for ML Safety and presents four problems ready for research: withstanding hazards, identifying hazards, steering ML systems, and reducing deployment hazards.

Out of Distribution Generalization in Machine Learning

A central topic of the thesis is the strong link between discovering the causal structure of the data, finding features that are reliable predictors regardless of their context, and out-of-distribution generalization.