Corpus ID: 235358196

Counterfactual Explanations Can Be Manipulated

Dylan Slack, Sophie Hilgard, Himabindu Lakkaraju, and Sameer Singh
Counterfactual explanations are emerging as an attractive option for providing recourse to individuals adversely impacted by algorithmic decisions. As they are deployed in critical applications (e.g. law enforcement, financial lending), it becomes important to clearly understand the vulnerabilities of these methods and find ways to address them. However, these vulnerabilities and shortcomings are so far poorly understood. In this work, we introduce… 


A Survey on the Robustness of Feature Importance and Counterfactual Explanations
A survey of works that analyse the robustness of two classes of local explanations popularly used for AI/ML models in finance; it unifies existing definitions of robustness and introduces a taxonomy to classify different robustness approaches.
On the Adversarial Robustness of Causal Algorithmic Recourse
This work formalizes the adversarially robust recourse problem, shows that recourse methods offering minimally costly recourse fail to be robust, and proposes a model regularizer that encourages the additional cost of seeking robust recourse to be low.
Low-Cost Algorithmic Recourse for Users With Uncertain Cost Functions
This work formalizes the notion of user-specific cost functions and introduces a new method for identifying actionable recourses for users, called Expected Minimum Cost (EMC), which can approximately optimize for user satisfaction by first sampling plausible cost functions, then finding a set that achieves a good cost for the user in expectation.
On the Robustness of Counterfactual Explanations to Adverse Perturbations
Counterfactual explanations (CEs) are a powerful means for understanding how decisions made by algorithms can be changed. Researchers have proposed a number of desiderata that CEs should meet.
Post-Hoc Explanations Fail to Achieve their Purpose in Adversarial Contexts
Existing and planned legislation stipulates various obligations to provide information about machine learning algorithms and their functioning, often interpreted as obligations to “explain”.


FACE: Feasible and Actionable Counterfactual Explanations
A new line of counterfactual explanation research is proposed, aimed at providing actionable and feasible paths that transform a selected instance into one meeting a certain goal, based on shortest-path distances defined via density-weighted metrics.
On Counterfactual Explanations under Predictive Multiplicity
This work derives a general upper bound on the cost of counterfactual explanations under predictive multiplicity; the bound depends on a discrepancy notion between two classifiers that describes how differently they treat negatively predicted individuals.
The hidden assumptions behind counterfactual explanations and principal reasons
It is demonstrated that the utility of feature-highlighting explanations relies on a number of easily overlooked assumptions, including that the recommended change in feature values clearly maps to real-world actions, that features can be made commensurate by looking only at the distribution of the training data, and that features are only relevant to the decision at hand.
Explaining machine learning classifiers through diverse counterfactual explanations
This work proposes a framework for generating and evaluating a diverse set of counterfactual explanations based on determinantal point processes, and provides metrics that enable comparison of counterfactual-based methods to other local explanation methods.
Model-Agnostic Counterfactual Explanations for Consequential Decisions
This work builds on standard theory and tools from formal verification and proposes a novel algorithm that solves a sequence of satisfiability problems, where both the distance function (objective) and predictive model (constraints) are represented as logic formulae.
Learning Model-Agnostic Counterfactual Explanations for Tabular Data
A framework, called C-CHVAE, is developed, drawing on ideas from the manifold learning literature, that generates faithful counterfactuals; it also suggests complementing the catalog of counterfactual quality measures with a criterion that quantifies the degree of difficulty of a given counterfactual suggestion.
The Use and Misuse of Counterfactuals in Ethical Machine Learning
It is argued that even though counterfactuals play an essential part in some causal inferences, their use for questions of algorithmic fairness and social explanations can create more problems than they resolve.
A Survey of Contrastive and Counterfactual Explanation Generation Methods for Explainable Artificial Intelligence
This work conducts a systematic literature review which provides readers with a thorough and reproducible analysis of the interdisciplinary research field under study and defines a taxonomy regarding both theoretical and practical approaches to contrastive and counterfactual explanation.
Counterfactual Explanations Without Opening the Black Box: Automated Decisions and the GDPR
It is suggested data controllers should offer a particular type of explanation, unconditional counterfactual explanations, to support these three aims, which describe the smallest change to the world that can be made to obtain a desirable outcome, or to arrive at the closest possible world, without needing to explain the internal logic of the system.
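The definition above — the smallest change to an instance that yields the desired outcome — can be illustrated with a toy greedy search. This is a minimal sketch under invented assumptions (the logistic `score` model, the feature names, and the `step` size are all hypothetical), not the algorithm of any of the papers listed here:

```python
import numpy as np

def find_counterfactual(score, x, threshold=0.5, step=0.05, max_steps=500):
    """Greedily nudge one feature at a time until score(x) crosses
    `threshold`, yielding a small (not provably minimal) change."""
    x_cf = np.asarray(x, dtype=float).copy()
    for _ in range(max_steps):
        if score(x_cf) >= threshold:
            return x_cf
        # Try a one-feature nudge in each direction; keep the best move.
        candidates = [x_cf.copy() for _ in range(2 * len(x_cf))]
        for i in range(len(x_cf)):
            candidates[2 * i][i] -= step
            candidates[2 * i + 1][i] += step
        x_cf = max(candidates, key=score)
    return None  # no counterfactual found within the step budget

# Toy logistic "loan" model over (income, savings); both names invented.
def score(v):
    return 1.0 / (1.0 + np.exp(-(2.0 * v[0] + 1.0 * v[1] - 1.0)))

x = np.array([0.2, 0.1])            # rejected applicant: score(x) < 0.5
cf = find_counterfactual(score, x)  # a nearby instance that is approved
```

The returned `cf` differs from `x` only in the features the search had to move, which is exactly the "closest possible world" framing: the difference between `x` and `cf` is the explanation.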
The age of secrecy and unfairness in recidivism prediction
It is argued that transparency satisfies a different notion of procedural fairness by providing both the defendants and the public with the opportunity to scrutinize the methodology and calculations behind risk scores for recidivism.