A Multi-Objective Approach to Mitigate Negative Side Effects

@inproceedings{Saisubramanian2020AMA,
  title={A Multi-Objective Approach to Mitigate Negative Side Effects},
  author={Sandhya Saisubramanian and Ece Kamar and Shlomo Zilberstein},
  booktitle={IJCAI},
  year={2020}
}
Agents operating in unstructured environments often create negative side effects (NSE) that may not be easy to identify at design time. We examine how various forms of human feedback or autonomous exploration can be used to learn a penalty function associated with NSE during system deployment. We formulate the problem of mitigating the impact of NSE as a multi-objective Markov decision process with lexicographic reward preferences and slack. The slack denotes the maximum deviation from an… 
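
To make the lexicographic-with-slack formulation concrete, below is a minimal sketch of how such a policy could be computed on a small tabular MDP. All names here (P, R_task, R_nse, slack) are illustrative assumptions, and this is not the authors' implementation; it is only a plain two-step value-iteration rendering of the idea: optimize the task objective, keep actions within the slack of optimal, then minimize the learned NSE penalty among those.

import numpy as np

def value_iteration(P, R, gamma=0.95, iters=500):
    # P[s][a] is a list of (prob, next_state) pairs; R[s][a] is a scalar reward.
    V = np.zeros(len(P))
    for _ in range(iters):
        for s in range(len(P)):
            V[s] = max(R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                       for a in range(len(P[s])))
    return V

def q_values(P, R, V, gamma=0.95):
    return [[R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
             for a in range(len(P[s]))] for s in range(len(P))]

def lexicographic_policy(P, R_task, R_nse, slack, gamma=0.95):
    # Step 1: solve the primary (task) objective optimally.
    Q1 = q_values(P, R_task, value_iteration(P, R_task, gamma), gamma)
    # Step 2: in each state, keep only actions within `slack` of the best task Q-value.
    allowed = [[a for a in range(len(P[s])) if Q1[s][a] >= max(Q1[s]) - slack]
               for s in range(len(P))]
    # Step 3: among the allowed actions, maximize R_nse (the negated NSE penalty,
    # which the paper proposes to learn from human feedback or exploration).
    V2 = np.zeros(len(P))
    for _ in range(500):
        for s in range(len(P)):
            V2[s] = max(R_nse[s][a] + gamma * sum(p * V2[s2] for p, s2 in P[s][a])
                        for a in allowed[s])
    return [max(allowed[s], key=lambda a: R_nse[s][a] +
                gamma * sum(p * V2[s2] for p, s2 in P[s][a]))
            for s in range(len(P))]

The slack parameter trades task optimality for NSE avoidance: slack = 0 recovers the task-optimal policy, while a larger slack gives the agent more room to avoid side effects.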


Mitigating Negative Side Effects via Environment Shaping
TLDR
An algorithm is presented to solve the problem of how humans can assist an agent beyond providing feedback, exploiting their broader scope of knowledge to mitigate the impacts of NSE.
Be Considerate: Objectives, Side Effects, and Deciding How to Act
TLDR
This work contends that to learn to act safely, a reinforcement learning (RL) agent should contemplate the impact of its actions on the wellbeing and agency of others in the environment, including other acting agents and reactive processes, and it provides different criteria for characterizing that impact.
Avoiding Negative Side Effects of Autonomous Systems in the Open World
TLDR
Two complementary approaches to mitigating NSE, learning from feedback and environment shaping, are presented, and the work examines how a human can assist an agent beyond providing feedback, using their broader scope of knowledge to mitigate the impacts of NSE.
Challenges for Using Impact Regularizers to Avoid Negative Side Effects
TLDR
The main current challenges of impact regularizers are examined and related to fundamental design decisions, and promising directions for overcoming the unsolved challenges in preventing negative side effects with impact regularizers are explored.
A practical guide to multi-objective reinforcement learning and planning
TLDR
This paper identifies the factors that may influence the nature of the desired solution, and illustrates by example how these influence the design of multi-objective decision-making systems for complex problems.
Understanding User Attitudes Towards Negative Side Effects of AI Systems
TLDR
The results indicate that users are willing to tolerate side effects that are not safety-critical but prefer to minimize them as much as possible, findings that support key fundamental assumptions in existing techniques and facilitate the development of new methods to overcome negative side effects of AI systems.
Learning to Generate Fair Clusters from Demonstrations
TLDR
An algorithm is presented to identify the fairness metric from demonstrations and generate clusters using existing off-the-shelf clustering techniques, along with a greedy clustering method for novel fairness metrics for which clustering algorithms do not currently exist.
Models of Intervention: Helping Agents and Human Users Avoid Undesirable Outcomes
TLDR
Using a revised feature set more appropriate to human behavior, this work produces a learned model to recognize when a human user is about to trigger an undesirable outcome and finds that the revised model also dominates existing Plan Recognition algorithms in predicting Human-Aware Intervention.
Using Metareasoning to Maintain and Restore Safety for Reliable Autonomy
TLDR
A safety metareasoning system that mitigates the severity of the system's safety concerns while reducing interference with the system's task is offered, along with an application of the approach to planetary rover exploration and a demonstration that the approach is effective in simulation.
Building Efficient, Reliable, and Ethical Autonomous Systems
TLDR
The goal of the research is to build autonomous systems that operate in natural, partially observable, stochastic domains for long durations in a way that is not only efficient and reliable but also ethical.

References

Showing 1-10 of 23 references
Mitigating the Negative Side Effects of Reasoning with Imperfect Models: A Multi-Objective Approach
TLDR
The problem of mitigating the impact of NSE is formulated as a multi-objective Markov decision process with lexicographic reward preferences and slack, and empirical evaluation shows that the proposed framework can successfully mitigate NSE.
Penalizing Side Effects using Stepwise Relative Reachability
TLDR
A new variant of the stepwise inaction baseline and a new deviation measure based on relative reachability of states are introduced that together avoid these undesirable incentives, while simpler baselines and the unreachability measure fail.
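
For reference, the deviation measure in that paper has roughly this form (our paraphrase of the cited work's notation, where R(x, y) measures the discounted reachability of state y from state x and s'_t is the stepwise-inaction baseline state):

    d(s_t; s'_t) = \frac{1}{|S|} \sum_{s \in S} \max\big(0,\; R(s'_t, s) - R(s_t, s)\big)

so the agent is penalized only for making states less reachable than they would have been under the baseline, not for any deviation from it.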
Multi-Objective MDPs with Conditional Lexicographic Reward Preferences
TLDR
A rich model called Lexicographic MDP (LMDP) and a corresponding planning algorithm called LVI are introduced that generalize previous work by allowing for conditional lexicographic preferences with slack, and the convergence characteristics of LVI are analyzed.
Minimax-Regret Querying on Side Effects for Safe Optimality in Factored Markov Decision Processes
TLDR
A planning algorithm is developed that avoids potentially negative side effects given what the agent knows about (un)changeable features, and a provably minimax-regret querying strategy is formulated for the agent to selectively ask the user about features it has not explicitly been told about.
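
As a rough sketch of the criterion involved (paraphrased, not the paper's exact notation), the worst-case regret of committing to a policy \pi over the set \mathcal{M} of MDPs consistent with what the user has said about changeable features is

    MR(\pi) = \max_{M \in \mathcal{M}} \big( V^{*}_{M} - V^{\pi}_{M} \big),

and queries are chosen to minimize the regret that remains after the user's answer prunes \mathcal{M}.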
A Survey of Multi-Objective Sequential Decision-Making
TLDR
This article surveys algorithms designed for sequential decision-making problems with multiple objectives and proposes a taxonomy that classifies multi-objective methods according to the applicable scenario, the nature of the scalarization function, and the type of policies considered.
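
To illustrate the scalarization axis of that taxonomy: the simplest choice is linear scalarization, which collapses a value vector \mathbf{V} = (V_1, \ldots, V_n) into a scalar

    f_{\mathbf{w}}(\mathbf{V}) = \sum_{i} w_i V_i,

but a linear f can only recover policies on the convex hull of the Pareto front, which is one reason such surveys distinguish it from nonlinear preference models such as the lexicographic one used in this paper.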
Inverse Reward Design
TLDR
Inverse reward design (IRD) is introduced as the problem of inferring the true objective based on the designed reward and the training MDP, approximate methods for solving IRD problems are presented, and their solutions are used to plan risk-averse behavior in test MDPs.
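
For context, IRD treats the designed (proxy) reward weights \tilde{\mathbf{w}} as an observation about the true weights \mathbf{w} and inverts it with Bayes' rule; roughly (our paraphrase of the cited paper):

    P(\mathbf{w} \mid \tilde{\mathbf{w}}, \tilde{M}) \propto P(\tilde{\mathbf{w}} \mid \mathbf{w}, \tilde{M})\, P(\mathbf{w}), \qquad P(\tilde{\mathbf{w}} \mid \mathbf{w}, \tilde{M}) \propto \exp\!\big(\beta\, \mathbb{E}[\mathbf{w}^{\top}\phi(\xi)]\big),

where the expectation is over trajectories \xi of an agent acting optimally for \tilde{\mathbf{w}} in the training MDP \tilde{M}, \phi gives feature counts, and \beta models how nearly optimal the designer's choice of proxy reward is.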
Overcoming Blind Spots in the Real World: Leveraging Complementary Abilities for Joint Execution
TLDR
This work studies how learning about the blind spots of both humans and agents can be used to manage hand-off decisions when humans and agents act jointly in the real world, in which neither of them is fully trained or evaluated.
Concrete Problems in AI Safety
TLDR
A list of five practical research problems related to accident risk is presented, categorized according to whether the problem originates from having the wrong objective function, an objective function that is too expensive to evaluate frequently, or undesirable behavior during the learning process.
Improved Memory-Bounded Dynamic Programming for Decentralized POMDPs
TLDR
Memory-Bounded Dynamic Programming is generalized and its scalability is improved by reducing the complexity with respect to the number of observations from exponential to polynomial, and error bounds on solution quality are derived.
Planning in Stochastic Environments with Goal Uncertainty
TLDR
An admissible heuristic is proposed that reduces planning time using FLARES, a state-of-the-art probabilistic planner, for solving the Goal Uncertain Stochastic Shortest Path problem.