Corpus ID: 72940743

Penalizing Side Effects using Stepwise Relative Reachability

@article{Krakovna2019PenalizingSE,
  title={Penalizing Side Effects using Stepwise Relative Reachability},
  author={Victoria Krakovna and Laurent Orseau and Miljan Martic and Shane Legg},
  journal={arXiv: Learning},
  year={2019}
}
How can we design safe reinforcement learning agents that avoid unnecessary disruptions to their environment? Key Method: We introduce a new variant of the stepwise inaction baseline and a new deviation measure based on relative reachability of states. The combination of these design choices avoids the given undesirable incentives, while simpler baselines and the unreachability measure fail. We demonstrate this empirically by comparing different combinations of baseline and deviation measure choices on a…
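
As a rough illustration of the key method described above (a sketch under simplifying assumptions, not the authors' implementation), the snippet below computes a stepwise relative-reachability penalty for a small deterministic environment with a known transition table. The names NOOP and beta, and the undiscounted 0/1 notion of reachability, are our assumptions.

from collections import deque

NOOP = "noop"

def reachable_set(transitions, start):
    """All states reachable from `start` under some action sequence (BFS).
    transitions: dict mapping state -> {action: next_state} for a deterministic MDP."""
    seen, frontier = {start}, deque([start])
    while frontier:
        s = frontier.popleft()
        for nxt in transitions[s].values():
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen

def relative_reachability(transitions, current, baseline, all_states):
    """Fraction of states reachable from the baseline state but no longer reachable from the current state."""
    reach_cur = reachable_set(transitions, current)
    reach_base = reachable_set(transitions, baseline)
    lost = sum(1 for s in all_states if s in reach_base and s not in reach_cur)
    return lost / len(all_states)

def penalized_reward(reward, transitions, prev_state, action, beta=1.0):
    """Task reward minus beta times the stepwise relative-reachability deviation.
    The stepwise inaction baseline is the state the noop action would have led to from the previous state."""
    current = transitions[prev_state][action]
    baseline = transitions[prev_state][NOOP]
    all_states = list(transitions)
    penalty = relative_reachability(transitions, current, baseline, all_states)
    return reward - beta * penalty

With reachability measured in this all-or-nothing way, the penalty counts states that the chosen action cuts off but that inaction would have preserved; the paper's discounted reachability refines this notion.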

Citations

Avoiding Negative Side-Effects and Promoting Safe Exploration with Imaginative Planning

TLDR
This paper proposes a model-based approach to safety that allows the agent to look into the future and be aware of the future consequences of its actions, and generates a directed graph called the imaginative module that encapsulates all possible trajectories that can be followed by the agent.

Avoiding Side Effects By Considering Future Tasks

TLDR
This work formally defines interference incentives and shows that the future task approach with a baseline policy avoids these incentives in the deterministic case and is more effective for avoiding side effects than the common approach of penalizing irreversible actions.

A Multi-Objective Approach to Mitigate Negative Side Effects

TLDR
Empirical evaluation shows that the proposed framework can successfully mitigate NSE and that different feedback mechanisms introduce different biases, which influence the identification of NSE.

Formalizing the Problem of Side Effect Regularization

TLDR
This work proposes a formal criterion for side effect regularization via the assistance game framework and shows that the resulting POMDP is solved by trading off the proxy reward with the agent's ability to achieve a range of future tasks.
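
As a toy illustration of the trade-off described in this TLDR (our own sketch, not the paper's formulation), an agent can score actions by combining the proxy Q-value with its average ability to complete a set of sampled future tasks; proxy_q, future_task_qs, and lam are assumed names.

def choose_action(actions, proxy_q, future_task_qs, lam=0.5):
    """Pick the action maximizing a convex combination of proxy value and mean future-task value.

    proxy_q: dict mapping action -> Q-value under the specified (proxy) reward.
    future_task_qs: list of dicts, one per sampled future task, mapping action -> Q-value.
    lam: weight on preserving the ability to achieve future tasks (assumed hyperparameter).
    """
    def score(a):
        future_ability = sum(q[a] for q in future_task_qs) / len(future_task_qs)
        return (1 - lam) * proxy_q[a] + lam * future_ability
    return max(actions, key=score)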

Be Considerate: Objectives, Side Effects, and Deciding How to Act

TLDR
This work contends that to learn to act safely, a reinforcement learning (RL) agent should contemplate the impact of its actions on the wellbeing and agency of others in the environment, including other acting agents and reactive processes, and it provides different criteria for characterizing that impact.

Be Considerate: Avoiding Negative Side Effects in Reinforcement Learning

In sequential decision making – whether it's realized with or without the benefit of a model – objectives are often underspecified or incomplete. This gives discretion to the acting agent to realize…

Mitigating the Negative Side Effects of Reasoning with Imperfect Models: A Multi-Objective Approach

TLDR
The problem of mitigating the impact of NSE is formulated as a multi-objective Markov decision process with lexicographic reward preferences and slack; empirical evaluation shows that the proposed framework can successfully mitigate NSE.
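
A minimal sketch of lexicographic action selection with slack, assuming Q-estimates for the primary objective and for expected negative side effects are available (the interface and the additive form of the slack are our assumptions, not the paper's):

def lexicographic_action(actions, primary_q, nse_q, slack=0.05):
    """Among actions whose primary value is within `slack` of the best achievable,
    choose the one with the smallest expected negative side effects."""
    best_primary = max(primary_q[a] for a in actions)
    acceptable = [a for a in actions if primary_q[a] >= best_primary - slack]
    return min(acceptable, key=lambda a: nse_q[a])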

SafeLife 1.0: Exploring Side Effects in Complex Environments

We present SafeLife, a publicly available reinforcement learning environment that tests the safety of reinforcement learning agents. It contains complex, dynamic, tunable, procedurally generated…

Reward Tampering Problems and Solutions in Reinforcement Learning: A Causal Influence Diagram Perspective

TLDR
This paper uses an intuitive yet precise graphical model called causal influence diagrams to formalize reward tampering problems, and describes a number of modifications to the reinforcement learning objective that prevent incentives for reward tampering.

References

Showing 1-10 of 39 references

AI Safety Gridworlds

We present a suite of reinforcement learning environments illustrating various safety properties of intelligent agents. These problems include safe interruptibility, avoiding side effects, absent supervisor…

Minimax-Regret Querying on Side Effects for Safe Optimality in Factored Markov Decision Processes

TLDR
A planning algorithm is developed that avoids potentially negative side effects given what the agent knows about (un)changeable features and a provably minimax-regret querying strategy is formulated for the agent to selectively ask the user about features that it hasn't explicitly been told about.

Conservative Agency via Attainable Utility Preservation

TLDR
This work introduces an approach that balances optimization of the primary reward function with preservation of the ability to optimize auxiliary reward functions, and surprisingly, even when the auxiliary rewards are randomly generated and therefore uninformative about the correctly specified reward function, this approach induces conservative, effective behavior.
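
The flavor of this penalty can be sketched as follows (a simplified rendering with our own names and without the paper's scaling term): the agent is penalized for changing how well it could optimize each auxiliary reward, measured by Q-values, relative to doing nothing.

def aup_style_reward(task_reward, aux_q_functions, state, action, noop_action, lam=0.1):
    """Task reward minus a penalty for shifting attainable auxiliary values relative to no-op.

    aux_q_functions: list of callables (state, action) -> Q-value, one per auxiliary reward.
    lam: penalty weight (assumed hyperparameter).
    """
    penalty = sum(abs(q(state, action) - q(state, noop_action)) for q in aux_q_functions)
    penalty /= len(aux_q_functions)
    return task_reward - lam * penalty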

Quantilizers: A Safer Alternative to Maximizers for Limited Optimization

TLDR
This paper describes an alternative to expected utility maximization for powerful AI systems, which is called expected utility quantilization, which could allow the construction of AI systems that do not necessarily fall into strange and unanticipated shortcuts and edge cases in pursuit of their goals.
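
A small sketch of the idea under our own interface: instead of taking the utility-maximizing action, a q-quantilizer samples from the top q-fraction (ranked by utility) of a trusted base distribution, in proportion to the base probabilities.

import random

def quantilize(actions, base_probs, utility, q=0.1):
    """Sample from the top-q probability mass of the base distribution, ranked by utility.

    base_probs: dict mapping action -> probability under a trusted base policy.
    utility: callable action -> estimated utility.
    """
    ranked = sorted(actions, key=utility, reverse=True)
    top, mass = [], 0.0
    for a in ranked:
        top.append(a)
        mass += base_probs[a]
        if mass >= q:
            break
    weights = [base_probs[a] for a in top]   # renormalized implicitly by random.choices
    return random.choices(top, weights=weights)[0]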

Safe Exploration in Continuous Action Spaces

TLDR
This work addresses the problem of deploying a reinforcement learning agent on a physical system such as a datacenter cooling unit or robot, where critical constraints must never be violated, and directly adds to the policy a safety layer that analytically solves an action correction formulation for each state.
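
For a single linearized constraint, the action correction has a closed form; the sketch below is our paraphrase of that projection step, with assumed names (g is the learned constraint gradient with respect to the action, c_value the current constraint value, c_max its limit).

import numpy as np

def correct_action(a_proposed, g, c_value, c_max):
    """Project a proposed action onto the linearized safe set for one constraint.

    Assumes the post-action constraint value is approximately c_value + g @ a.
    Solves min ||a - a_proposed||^2 subject to c_value + g @ a <= c_max in closed form.
    """
    violation = c_value + g @ a_proposed - c_max
    lam = max(0.0, violation / (g @ g))      # active only when the constraint would be violated
    return a_proposed - lam * g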

Inverse Reward Design

TLDR
This work introduces inverse reward design (IRD) as the problem of inferring the true objective based on the designed reward and the training MDP, and introduces approximate methods for solving IRD problems, and uses their solution to plan risk-averse behavior in test MDPs.
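
A toy, discretized rendering of the IRD observation model (our simplification with a uniform prior, not the paper's approximate inference): the designer is modeled as choosing a proxy roughly in proportion to how much true reward the proxy-optimal policy earns in the training MDP.

import numpy as np

def ird_posterior(proxy_idx, candidate_ws, feature_counts, beta=1.0):
    """Toy inverse-reward-design posterior over candidate true reward weight vectors.

    candidate_ws: array (K, d) of candidate weight vectors; the observed proxy is one of them.
    feature_counts: array (K, d); row k holds the expected feature counts of the policy that is
    optimal for candidate k in the training MDP.
    Returns P(true w = candidate k | observed proxy), assuming a uniform prior.
    """
    # true_returns[i, j] = return of the policy optimal for proxy j, evaluated under true reward i
    true_returns = candidate_ws @ feature_counts.T
    logits = beta * (true_returns - true_returns.max(axis=1, keepdims=True))
    likelihood = np.exp(logits)
    likelihood /= likelihood.sum(axis=1, keepdims=True)   # P(designer picks proxy j | true w = i)
    posterior = likelihood[:, proxy_idx]
    return posterior / posterior.sum()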

Safely Interruptible Agents

TLDR
This paper explores a way to make sure a learning agent will not learn to prevent being interrupted by the environment or a human operator, and provides a formal definition of safe interruptibility and exploit the off-policy learning property to prove that either some agents are already safely interruptible, like Q-learning, or can be made so, like Sarsa.

Risk-Sensitive Reinforcement Learning

TLDR
A risk-sensitive Q-learning algorithm is derived for settings where transition probabilities are unknown and applied to quantify human behavior in a sequential investment task; it provides a significantly better fit to the behavioral data and leads to an interpretation of the subjects' responses that is consistent with prospect theory.

Safe Exploration in Finite Markov Decision Processes with Gaussian Processes

TLDR
A novel algorithm is developed and proved that it is able to completely explore the safely reachable part of the MDP without violating the safety constraint, and is demonstrated on digital terrain models for the task of exploring an unknown map with a rover.

Risk-Sensitive Reinforcement Learning

TLDR
This risk-sensitive reinforcement learning algorithm is based on a very different philosophy and reflects important properties of the classical exponential utility framework, but avoids its serious drawbacks for learning.
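
A generic sketch of the risk-sensitive Q-learning idea appearing in both entries with this title (our illustration, not either paper's exact update rule): the temporal-difference error is passed through an asymmetric weighting, so that negative surprises are amplified for a risk-averse agent.

def risk_sensitive_q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.95, kappa=0.5):
    """One risk-sensitive Q-learning step with an asymmetrically weighted TD error.

    Q: dict keyed by (state, action).
    kappa in (0, 1): positive values down-weight good surprises and up-weight bad ones (risk-averse).
    """
    td = r + gamma * max(Q[(s_next, b)] for b in actions) - Q[(s, a)]
    weight = (1 - kappa) if td > 0 else (1 + kappa)
    Q[(s, a)] += alpha * weight * td
    return Q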