Corpus ID: 235352514

Be Considerate: Objectives, Side Effects, and Deciding How to Act

Parand Alizadeh Alamdari, Toryn Q. Klassen, Rodrigo Toro Icarte, Sheila A. McIlraith
Recent work in AI safety has highlighted that in sequential decision making, objectives are often underspecified or incomplete. This gives discretion to the acting agent to realize the stated objective in ways that may result in undesirable outcomes. We contend that to learn to act safely, a reinforcement learning (RL) agent should include contemplation of the impact of its actions on the wellbeing and agency of others in the environment, including other acting agents and reactive processes. We… 
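As a rough illustration of the general idea (not the paper's actual formulation), an agent's reward could be augmented with a term reflecting the estimated wellbeing of other agents in the environment. The weighting `alpha` and the value estimates below are illustrative assumptions:

```python
# Hedged sketch: augmenting an RL reward with consideration for others.
# `alpha` and the per-agent value estimates are illustrative assumptions,
# not the formulation from the paper.

def considerate_reward(task_reward: float,
                       others_values: list[float],
                       alpha: float = 0.5) -> float:
    """Combine the agent's own task reward with the average estimated
    future value (wellbeing) of other agents in the environment."""
    if not others_values:
        return task_reward
    social_term = sum(others_values) / len(others_values)
    return task_reward + alpha * social_term

# Example: own reward 1.0, two other agents with estimated values 0.2 and -0.4.
print(considerate_reward(1.0, [0.2, -0.4], alpha=0.5))  # 0.95
```

A negative average value for the other agents reduces the acting agent's effective reward, discouraging actions whose side effects harm them.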


Sympathy-based Reinforcement Learning Agents
This work explores the ability of an agent trained through reinforcement learning to exhibit sympathetic behaviour towards another (independent) agent in the environment: it first infers the independent agent's reward function through inverse reinforcement learning, then learns a policy based on a sympathetic reward function.
Empathetic Reinforcement Learning Agents
With increased interaction between artificial agents and humans, having agents that can respond appropriately to their human counterparts will be crucial for their deployment.

Avoiding Side Effects By Considering Future Tasks
This work formally defines interference incentives and shows that the future task approach with a baseline policy avoids these incentives in the deterministic case and is more effective for avoiding side effects than the common approach of penalizing irreversible actions.
SafeLife 1.0: Exploring Side Effects in Complex Environments
We present SafeLife, a publicly available reinforcement learning environment that tests the safety of reinforcement learning agents. It contains complex, dynamic, tunable, procedurally generated environments.
Penalizing Side Effects using Stepwise Relative Reachability
A new variant of the stepwise inaction baseline and a new deviation measure based on relative reachability of states are introduced that avoid the given undesirable incentives, whereas simpler baselines and the unreachability measure fail.
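A relative-reachability style penalty can be sketched as follows (illustrative only; the paper's deviation measure is defined over discounted state reachability under precise baselines). Here `reach_baseline` and `reach_current` are assumed to map each state to a reachability value in [0, 1] under the baseline and current policies:

```python
# Hedged sketch of a relative-reachability style penalty (illustrative;
# not the paper's exact definition). Penalizes only states that became
# *less* reachable than under the baseline, so restoring reachability
# elsewhere cannot offset destroying it.

def relative_reachability_penalty(reach_baseline: dict,
                                  reach_current: dict) -> float:
    """Average truncated reachability deficit relative to the baseline."""
    deficit = sum(max(0.0, reach_baseline[s] - reach_current.get(s, 0.0))
                  for s in reach_baseline)
    return deficit / len(reach_baseline)

# A "vase intact" state became unreachable (1.0 -> 0.0); the goal state's
# reachability improved, but the improvement does not cancel the deficit.
print(relative_reachability_penalty({"vase_intact": 1.0, "goal": 0.5},
                                    {"vase_intact": 0.0, "goal": 0.9}))  # 0.5
```

The truncation via `max(0, ...)` is what distinguishes this from a symmetric distance: only losses of reachability are penalized.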
Minimax-Regret Querying on Side Effects for Safe Optimality in Factored Markov Decision Processes
A planning algorithm is developed that avoids potentially negative side effects given what the agent knows about (un)changeable features and a provably minimax-regret querying strategy is formulated for the agent to selectively ask the user about features that it hasn't explicitly been told about.
Avoiding Side Effects in Complex Environments
Attainable Utility Preservation (AUP) avoids side effects in toy environments by penalizing shifts in the ability to achieve randomly generated goals; the approach scales to complex environments by preserving optimal value for a single randomly generated reward function.
Towards Empathic Deep Q-Learning
This paper introduces an extension to Deep Q-Networks, called Empathic DQN, that is loosely inspired both by empathy and the golden rule, to help mitigate negative side effects to other agents resulting from myopic goal-directed behavior.
Conservative Agency via Attainable Utility Preservation
This work introduces an approach that balances optimization of the primary reward function with preservation of the ability to optimize auxiliary reward functions. Surprisingly, even when the auxiliary rewards are randomly generated and therefore uninformative about the correctly specified reward function, this approach induces conservative, effective behavior.
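An AUP-style penalty can be sketched as follows (illustrative; the auxiliary Q-values, the no-op comparison, and the scaling are assumptions, not the paper's exact implementation):

```python
# Hedged sketch of an Attainable Utility Preservation (AUP) style reward
# (illustrative only). The agent is penalized for changing, relative to
# inaction, its ability to optimize each auxiliary reward function.

def aup_reward(task_reward: float,
               q_aux_action: list[float],
               q_aux_noop: list[float],
               lam: float = 0.1) -> float:
    """Subtract a penalty proportional to the mean absolute shift in
    auxiliary Q-values between the chosen action and a no-op."""
    n = len(q_aux_action)
    penalty = sum(abs(qa - qn) for qa, qn in zip(q_aux_action, q_aux_noop)) / n
    return task_reward - lam * penalty

# Acting shifts attainable utility for two auxiliary goals by 0.3 and 0.1.
print(aup_reward(1.0, [0.8, 0.5], [0.5, 0.4], lam=0.1))
```

Because the penalty is on the absolute change in attainable utility, both gaining and losing the ability to pursue auxiliary goals is discouraged, which is what yields the conservative behavior described above.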
AvE: Assistance via Empowerment
This work proposes a new paradigm for assistance by increasing the human's ability to control their environment, and formalizes this approach by augmenting reinforcement learning with human empowerment, and proposes an efficient empowerment-inspired proxy metric.
Symbolic Plans as High-Level Instructions for Reinforcement Learning
An empirical evaluation shows that using techniques from knowledge representation and reasoning to define final-state goal tasks and automatically produce their corresponding reward functions converges to near-optimal solutions faster than standard RL and HRL methods.