Corpus ID: 224818450

Counterfactual Explanations for Machine Learning: A Review

@article{Verma2020CounterfactualEF,
  title={Counterfactual Explanations for Machine Learning: A Review},
  author={Sahil Verma and John P. Dickerson and Keegan E. Hines},
  journal={ArXiv},
  year={2020},
  volume={abs/2010.10596}
}
Machine learning plays a role in many deployed decision systems, often in ways that are difficult or impossible to understand by human stakeholders. Explaining, in a human-understandable way, the relationship between the input and output of machine learning models is essential to the development of trustworthy machine-learning-based systems. A burgeoning body of research seeks to define the goals and methods of explainability in machine learning. In this paper, we seek to review and categorize… 
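Most of the counterfactual explanation methods surveyed in the review share one basic recipe, popularized by Wachter et al.: find a point x' close to the original input x for which the model's prediction flips to the desired outcome. The sketch below is a minimal illustration of that gradient-based formulation only; the logistic-regression weights, the factual instance, and all hyperparameters are invented for the example and do not come from the paper.

import numpy as np

# Toy differentiable classifier: a logistic regression with made-up weights,
# standing in for the deployed model whose decision is being explained.
w = np.array([1.5, -2.0, 0.5])
b = -0.25

def predict_proba(x):
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def counterfactual(x, target=1.0, lam=10.0, lr=0.05, steps=3000):
    # Minimize  lam * (f(x') - target)^2 + ||x' - x||_1  by (sub)gradient descent.
    x_cf = x.astype(float).copy()
    for _ in range(steps):
        p = predict_proba(x_cf)
        grad_pred = 2.0 * (p - target) * p * (1.0 - p) * w  # chain rule through the sigmoid
        grad_dist = np.sign(x_cf - x)                        # subgradient of the L1 term
        x_cf -= lr * (lam * grad_pred + grad_dist)
    return x_cf

x = np.array([0.0, 0.0, 0.0])   # factual instance, predicted negative (p ~ 0.44)
x_cf = counterfactual(x)
print("counterfactual:", np.round(x_cf, 3), "new prediction:", round(float(predict_proba(x_cf)), 3))

In practice the weight lam on the prediction loss is typically increased until the counterfactual actually crosses the decision boundary, while the distance term keeps the suggested change small.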

Citations

Benchmark Evaluation of Counterfactual Algorithms for XAI: From a White Box to a Black Box
TLDR
Explainable counterfactual algorithms that do not take plausibility into consideration in their internal mechanisms cannot be evaluated with the current state-of-the-art evaluation metrics.
Evaluating the Quality of Machine Learning Explanations: A Survey on Methods and Metrics
TLDR
A comprehensive overview of methods proposed in the current literature for the evaluation of ML explanations is presented, finding that quantitative metrics for both model-based and example-based explanations are primarily used to evaluate the parsimony/simplicity of interpretability, and that subjective measures have been embraced as the focal point for the human-centered evaluation of explainable systems.
Counterfactual Evaluation for Explainable AI
TLDR
This work proposes a new methodology to evaluate the faithfulness of explanations from the counterfactual reasoning perspective: the model should produce substantially different outputs for the original input and its corresponding counterfactual edited on a faithful feature.
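Read as a test, the criterion above says: if a feature is claimed to matter, then editing that feature in the counterfactual direction should move the model output substantially more than editing an irrelevant one. A rough self-contained check is sketched below; the toy model, feature choices, and margin are all invented for illustration and are not the paper's procedure.

import numpy as np

model = lambda x: 1.0 / (1.0 + np.exp(-(2.0 * x[0] + 0.05 * x[1])))  # toy model: x[0] dominates

def output_shift(x, feature_idx, delta):
    # How much the model output moves when a single feature is edited.
    x_edit = np.array(x, dtype=float)
    x_edit[feature_idx] += delta
    return abs(model(x_edit) - model(x))

x = np.array([0.2, 0.2])
shift_claimed = output_shift(x, 0, 1.0)   # edit the feature the explanation marks as important
shift_control = output_shift(x, 1, 1.0)   # edit a feature it marks as unimportant
print("faithful?", shift_claimed > 5 * shift_control)   # illustrative margin, not the paper's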
Counterfactual Instances Explain Little
TLDR
This paper will draw on literature from the philosophy of science to argue that a satisfactory explanation must consist of both counterfactual instances and a causal equation (or system of equations) that support the counterfactual instances.
Attribution-based Explanations that Provide Recourse Cannot be Robust
TLDR
It is proved formally that it is in general impossible for any single attribution method to be both recourse sensitive and robust at the same time, and it follows that there must always exist counterexamples to at least one of these properties.
CARLA: A Python Library to Benchmark Algorithmic Recourse and Counterfactual Explanation Algorithms
TLDR
CARLA (Counterfactual And Recourse LibrAry), a Python library for benchmarking counterfactual explanation methods across different data sets and different machine learning models, is presented, together with a standardized set of integrated evaluation measures and data sets for transparent and extensive comparisons.
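The standardized evaluation measures referred to above typically include validity (does the counterfactual obtain the desired class), proximity (distance from the original instance), and sparsity (how many features were changed). Below is a library-free sketch of those three measures following their usual definitions; it is not CARLA's actual API.

import numpy as np

def validity(predict_fn, x_cf, desired_class):
    # 1 if the counterfactual actually receives the desired label, else 0.
    return int(predict_fn(x_cf) == desired_class)

def proximity(x, x_cf):
    # L1 distance between the factual and the counterfactual.
    return float(np.abs(np.asarray(x) - np.asarray(x_cf)).sum())

def sparsity(x, x_cf, tol=1e-6):
    # Number of features the counterfactual changes.
    return int((np.abs(np.asarray(x) - np.asarray(x_cf)) > tol).sum())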
ReLACE: Reinforcement Learning Agent for Counterfactual Explanations of Arbitrary Predictive Models
TLDR
This work formulates the problem of crafting CFs as a sequential decision-making task, finds the optimal CFs via deep reinforcement learning (DRL) with a discrete-continuous hybrid action space, and develops an algorithm to extract explainable decision rules from the DRL agent's policy, so as to make the process of generating CFs itself transparent.
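The sequential framing described above treats counterfactual search as an episode: the state is the current candidate instance, an action edits one feature, and the reward balances progress toward the desired class against the cost of edits. The environment below is only a schematic of that framing, with names and reward shape of my own; it is not the paper's DRL agent or its hybrid action space.

import numpy as np

class CounterfactualEnv:
    # State: current candidate instance.  Action: (feature index, new value).
    def __init__(self, predict_proba_fn, x_original, target_proba=0.5, edit_cost=0.05):
        self.predict = predict_proba_fn
        self.x_original = np.asarray(x_original, dtype=float)
        self.x = self.x_original.copy()
        self.target_proba = target_proba
        self.edit_cost = edit_cost

    def step(self, feature_idx, new_value):
        before = self.predict(self.x)
        self.x[feature_idx] = new_value
        after = self.predict(self.x)
        reward = (after - before) - self.edit_cost   # progress toward the target class, minus edit cost
        done = after >= self.target_proba            # episode ends once the prediction flips
        return self.x.copy(), reward, done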
Explaining Data-Driven Decisions made by AI Systems: The Counterfactual Approach
TLDR
It is shown that features that have a large importance weight for a model prediction may not actually affect the corresponding decision, and importance weights are insufficient to communicate whether and how features influence system decisions.
Robust Counterfactual Explanations for Tree-Based Ensembles
TLDR
The results demonstrate that the proposed strategy RobX generates counterfactuals that are significantly more robust (nearly 100% validity after actual model changes) and also realistic, compared with existing state-of-the-art methods.
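Validity after actual model changes, as reported above, can be estimated directly: retrain the model several times on resampled data and count how often each counterfactual keeps its desired label. The helper below is a generic sketch of that measurement, not the RobX method itself; it assumes scikit-learn is available and that the inputs are NumPy arrays.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

def validity_under_retraining(X, y, counterfactuals, desired_class, n_retrains=10, seed=0):
    # Fraction of counterfactuals still classified as desired after bootstrap retrains.
    rng = np.random.default_rng(seed)
    counterfactuals = np.asarray(counterfactuals)
    still_valid = np.zeros(len(counterfactuals))
    for _ in range(n_retrains):
        idx = rng.integers(0, len(X), size=len(X))   # bootstrap resample of the training data
        model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X[idx], y[idx])
        still_valid += (model.predict(counterfactuals) == desired_class)
    return still_valid / n_retrains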
...

References

SHOWING 1-10 OF 128 REFERENCES
Explaining machine learning classifiers through diverse counterfactual explanations
TLDR
This work proposes a framework for generating and evaluating a diverse set of counterfactual explanations based on determinantal point processes, and provides metrics that enable comparison of counterfactual-based methods to other local explanation methods.
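Determinantal point processes, mentioned above, score a set of counterfactuals by the determinant of a similarity kernel, which grows as the candidates spread apart. The greedy selection below illustrates that diversity criterion on precomputed candidates; it is a sketch of the DPP idea only, not the DiCE implementation, which optimizes diversity jointly with validity and proximity.

import numpy as np

def dpp_kernel(points):
    # K[i, j] = 1 / (1 + ||p_i - p_j||); near-duplicate points drive the determinant toward 0.
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    return 1.0 / (1.0 + d)

def select_diverse(candidates, k):
    # Greedily add the candidate that most increases det(K) over the chosen subset.
    candidates = np.asarray(candidates, dtype=float)
    chosen, remaining = [], list(range(len(candidates)))
    for _ in range(k):
        best, best_det = None, -np.inf
        for i in remaining:
            det = np.linalg.det(dpp_kernel(candidates[chosen + [i]]))
            if det > best_det:
                best, best_det = i, det
        chosen.append(best)
        remaining.remove(best)
    return candidates[chosen]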
Algorithmic Recourse: from Counterfactual Explanations to Interventions
TLDR
This work relies on causal reasoning to caution against the use of counterfactual explanations as a recommendable set of actions for recourse, and proposes a shift of paradigm from recourse via nearest counterfactual explanations to recourse through minimal interventions, shifting the focus from explanations to interventions.
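The caution above is easiest to see in a toy structural causal model: when one feature causally determines another, "set income to 9.5" is not an action anyone can take directly, whereas intervening on the upstream cause propagates through the structural equation. All equations, variable names, and thresholds below are invented for illustration.

import numpy as np

def scm_income(education):
    return 0.8 * education    # invented structural equation: income is caused by education

def approved(income):
    return income > 9.0       # stand-in loan model; the threshold is made up

edu = 10.0
print("factual approved?", approved(scm_income(edu)))   # income 8.0 -> denied

# Nearest counterfactual, features treated independently: "raise income to 9.5".
print("counterfactual approved?", approved(9.5))        # flips the model, but is not directly actionable

# Recourse via minimal intervention: act on education and let the SCM update income.
for new_edu in np.arange(edu, 15.0, 0.25):
    if approved(scm_income(new_edu)):
        print("minimal intervention: education ->", new_edu)   # prints 11.5 under these equations
        break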
Preserving Causal Constraints in Counterfactual Explanations for Machine Learning Classifiers
TLDR
The problem of feasibility is formulated as preserving causal relationships among input features and a method is presented that uses (partial) structural causal models to generate actionable counterfactuals that better satisfy feasibility constraints than existing methods.
Machine Learning Interpretability: A Survey on Methods and Metrics
TLDR
A review of the current state of the research field on machine learning interpretability is provided, focusing on its societal impact and on the methods and metrics developed.
A Survey on Explainable Artificial Intelligence (XAI): Toward Medical XAI
TLDR
A review of the interpretability approaches suggested by different research works is provided and categorized, in the hope that insight into interpretability will emerge from greater consideration of medical practice, and initiatives to push forward data-based, mathematically grounded, and technically grounded medical education are encouraged.
CERTIFAI: Counterfactual Explanations for Robustness, Transparency, Interpretability, and Fairness of Artificial Intelligence models
TLDR
CERTIFAI is a general tool that can be applied to any black-box model and any type of input data, and introduces CERScore, the first black-box model robustness score that performs comparably to methods that have access to model internals.
Explaining Data-Driven Decisions made by AI Systems: The Counterfactual Approach
TLDR
It is shown that features that have a large importance weight for a model prediction may not actually affect the corresponding decision, and importance weights are insufficient to communicate whether and how features influence system decisions.
Local Rule-Based Explanations of Black Box Decision Systems
TLDR
This paper proposes LORE, a model-agnostic method able to provide interpretable and faithful explanations of black box outcomes, and shows that LORE outperforms existing methods and baselines both in the quality of explanations and in the accuracy of mimicking the black box.
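The local, rule-based idea can be approximated without LORE's genetic neighbourhood generation: sample perturbations around the instance, label them with the black box, fit a shallow decision tree, and read the root-to-leaf path for the instance as an if-then rule. The sketch below does that with scikit-learn and is a simplified stand-in, not the LORE algorithm, which also derives counterfactual rules from the other branches.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def local_rule(black_box_fn, x, n_samples=2000, scale=0.5, max_depth=3, seed=0):
    rng = np.random.default_rng(seed)
    # Label a random neighbourhood of x with the black box and fit a surrogate tree on it.
    X_local = x + rng.normal(0.0, scale, size=(n_samples, len(x)))
    y_local = np.array([black_box_fn(z) for z in X_local])
    tree = DecisionTreeClassifier(max_depth=max_depth).fit(X_local, y_local)
    # Walk the path x takes through the tree and collect the split conditions it satisfies.
    path_nodes = tree.decision_path(x.reshape(1, -1)).indices
    conditions = []
    for node in path_nodes[:-1]:                     # the last node on the path is the leaf
        feat = tree.tree_.feature[node]
        thr = tree.tree_.threshold[node]
        op = "<=" if x[feat] <= thr else ">"
        conditions.append(f"x[{feat}] {op} {thr:.2f}")
    return " AND ".join(conditions), tree.predict(x.reshape(1, -1))[0]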
Model-Agnostic Counterfactual Explanations for Consequential Decisions
TLDR
This work builds on standard theory and tools from formal verification and proposes a novel algorithm that solves a sequence of satisfiability problems, where both the distance function (objective) and predictive model (constraints) are represented as logic formulae.
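For a linear model, the formulation above can be reproduced with an off-the-shelf SMT solver: the decision function becomes a constraint and the L1 distance becomes the objective. The sketch below uses z3's optimizer, which collapses the paper's sequence of satisfiability queries into a single optimization call; the weights and factual instance are made up, and the z3-solver package is assumed to be installed.

from z3 import Optimize, Real, Sum, sat

w, b = [1.5, -2.0, 0.5], -0.25        # made-up linear model: approve iff w.x + b >= 0
x_factual = [0.0, 0.5, 0.0]           # currently denied (score = -1.25)

opt = Optimize()
x = [Real(f"x{i}") for i in range(3)]
d = [Real(f"d{i}") for i in range(3)]  # d_i equals |x_i - x_factual_i| at the optimum

for i in range(3):
    opt.add(d[i] >= x[i] - x_factual[i], d[i] >= x_factual[i] - x[i])

opt.add(Sum([w[i] * x[i] for i in range(3)]) + b >= 0)  # constraint: the model must approve x'
opt.minimize(Sum(d))                                     # objective: smallest L1 change

if opt.check() == sat:
    model = opt.model()
    print([model.eval(xi) for xi in x])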
...