Counterfactual State Explanations for Reinforcement Learning Agents via Generative Deep Learning

Matthew Lyle Olson, Roli Khanna, Lawrence Neal, Fuxin Li, Weng-Keen Wong
Counterfactual explanations, which deal with “why not?” scenarios, can provide insightful explanations of an AI agent’s behavior [Miller, 2019]. In this work, we focus on generating counterfactual explanations for deep reinforcement learning (RL) agents that operate in visual input environments such as Atari. We introduce counterfactual state explanations, a novel example-based approach to counterfactual explanations based on generative deep learning. Specifically, a counterfactual state…
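The core idea of a counterfactual state can be illustrated without the paper's deep generative model: find a minimal change to the agent's input state that flips the agent's chosen action. The sketch below is a toy stand-in, not the paper's method — it uses a hypothetical linear policy and a greedy perturbation search purely to make the concept concrete.

```python
import numpy as np

# Illustrative sketch only: the paper trains a deep generative model over
# Atari frames; here a toy linear "policy" and a greedy perturbation search
# stand in for it. All names below are hypothetical.

def policy_action(weights, state):
    """Return the agent's greedy action for a state (argmax of linear scores)."""
    return int(np.argmax(weights @ state))

def counterfactual_state(weights, state, target_action, step=0.05, max_iters=500):
    """Nudge `state` until the agent prefers `target_action` instead."""
    cf = state.copy()
    for _ in range(max_iters):
        if policy_action(weights, cf) == target_action:
            return cf
        current = policy_action(weights, cf)
        # Step along the direction that raises the target action's score
        # relative to the currently preferred action.
        cf += step * (weights[target_action] - weights[current])
    return cf

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))      # 3 actions, 4 state features
s = rng.normal(size=4)
a = policy_action(W, s)
target = (a + 1) % 3
cf = counterfactual_state(W, s, target)
print(policy_action(W, cf) == target)
```

Comparing `s` with `cf` then shows a user which features had to change for the agent to act differently — the "why not?" question the abstract describes, answered here under toy assumptions.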
The Intriguing Relation Between Counterfactual Explanations and Adversarial Examples
It is argued that the relation to the true label and the tolerance with respect to proximity are two properties that formally distinguish counterfactual explanations (CEs) from adversarial examples (AEs); both properties are introduced mathematically in a common framework.
"That's (not) the output I expected!" On the role of end user expectations in creating explanations of AI systems
It is found that factual explanations are indeed appropriate when expectations and output match; when they do not, neither factual nor counterfactual explanations appear appropriate, which suggests that explanation-generating systems may need to identify such end user expectations.
If Only We Had Better Counterfactual Explanations: Five Key Deficits to Rectify in the Evaluation of Counterfactual XAI Techniques
Five key deficits in the evaluation of these methods are detailed, and a roadmap with standardised benchmark evaluations is proposed to resolve the issues arising; these issues currently block scientific progress in this field.
Understanding Finite-State Representations of Recurrent Policy Networks
This work introduces an approach for understanding finite-state machine (FSM) representations of recurrent policy networks and contributes a saliency tool for attaining a deeper understanding of the role of observations in the FSM's decisions.
Counterfactual States for Atari Agents via Generative Deep Learning
The user study results suggest that the generated counterfactual states are useful in helping non-expert participants gain a better understanding of an agent's decision making process.
Explaining machine learning classifiers through diverse counterfactual explanations
This work proposes a framework for generating and evaluating a diverse set of counterfactual explanations based on determinantal point processes, and provides metrics that enable comparison of counterfactual-based methods to other local explanation methods.
Explainable Reinforcement Learning Through a Causal Lens
An approach is presented that learns a structural causal model during reinforcement learning and encodes causal relationships between variables of interest; this model is then used to generate explanations of behaviour based on counterfactual analysis of the causal model.
Exploratory Not Explanatory: Counterfactual Analysis of Saliency Maps for Deep Reinforcement Learning
This work uses Atari games, a common benchmark for deep RL, to evaluate three types of saliency maps, and introduces an empirical approach grounded in counterfactual reasoning to test the hypotheses generated from saliency maps and assess the degree to which they correspond to the semantics of RL environments.
Mental Models of Mere Mortals with Explanations of Reinforcement Learning
The results show that a combined explanation that included saliency and reward bars was needed to achieve a statistically significant difference in participants’ mental model scores over the no-explanation treatment. However, this combined explanation was far from a panacea: it exacted disproportionately high cognitive loads from the participants who received it.
Contrastive Explanations for Reinforcement Learning in terms of Expected Consequences
This study proposes a method that enables an RL agent to explain its behavior in terms of the expected consequences of state transitions and outcomes; a procedure was developed that enables the agent to obtain the consequences of a single action as well as of its entire policy.
Visualizing and Understanding Atari Agents
A method for generating useful saliency maps is introduced and used to show 1) what strong agents attend to, 2) whether agents are making decisions for the right or wrong reasons, and 3) how agents evolve during learning.
Towards Interpretable Reinforcement Learning Using Attention Augmented Agents
This model uses a soft, top-down attention mechanism to create a bottleneck in the agent, forcing it to focus on task-relevant information by sequentially querying its view of the environment.
Programmatically Interpretable Reinforcement Learning
This work proposes a new method, called Neurally Directed Program Search (NDPS), for solving the challenging nonsmooth optimization problem of finding a programmatic policy with maximal reward, and demonstrates that NDPS is able to discover human-readable policies that pass some significant performance bars.
Exploring Computational User Models for Agent Policy Summarization
This paper introduces an imitation learning-based approach to policy summarization and demonstrates, through a human-subject study, that people use different models to reconstruct policies in different contexts, and that matching the summary extraction model to these user models can improve summarization performance.