Corpus ID: 227228620

Explaining by Removing: A Unified Framework for Model Explanation

@article{covert2020explaining,
  title={Explaining by Removing: A Unified Framework for Model Explanation},
  author={Ian Covert and Scott M. Lundberg and Su-In Lee},
  journal={J. Mach. Learn. Res.},
}
Researchers have proposed a wide variety of model explanation approaches, but it remains unclear how most methods are related or when one method is preferable to another. We establish a new class of methods, removal-based explanations, that are based on the principle of simulating feature removal to quantify each feature's influence. These methods vary in several respects, so we develop a framework that characterizes each method along three dimensions: 1) how the method removes features, 2… 
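As a minimal illustration of the removal principle described above (a sketch, not the paper's full framework), the snippet below replaces one feature at a time with a fixed baseline value and scores it by the resulting change in a toy model's prediction. The function name, the toy model, and the zero baseline are all assumptions chosen for illustration.

```python
import numpy as np

def removal_attributions(model, x, baseline):
    """Score each feature by how much the prediction changes when that
    feature is 'removed', i.e. replaced by its baseline value. This is
    one simple removal strategy; the framework above also covers
    marginalizing features out over a distribution, among others."""
    full = model(x)
    scores = np.empty(len(x))
    for i in range(len(x)):
        x_removed = x.copy()
        x_removed[i] = baseline[i]  # "remove" feature i
        scores[i] = full - model(x_removed)
    return scores

# Toy linear model: prediction = 2*x0 + 0*x1 + 1*x2.
model = lambda x: 2.0 * x[0] + 1.0 * x[2]
x = np.array([1.0, 5.0, 3.0])
baseline = np.zeros(3)
print(removal_attributions(model, x, baseline))  # [2. 0. 3.]
```

For this linear model the scores equal coefficient × (x − baseline), so the irrelevant feature x1 correctly receives zero attribution.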
Which Explanation Should I Choose? A Function Approximation Perspective to Characterizing Post hoc Explanations
This work adopts a function approximation perspective and formalizes the local function approximation (LFA) framework, showing that popular explanation methods are instances of this framework, performing function approximations of the underlying model in different neighborhoods using different loss functions.
Do not explain without context: addressing the blind spot of model explanations
It is postulated that obtaining robust and useful explanations always requires supporting them with a broader context, and that many model explanations depend directly or indirectly on the choice of the referenced data distribution.
Accurate and robust Shapley Values for explaining predictions and focusing on local important variables
The concept of "Same Decision Probability" (SDP), which evaluates the robustness of a prediction when some variables are missing, is used to produce sparse additive explanations that are easier to visualize and analyse.
PredDiff: Explanations and Interactions from Conditional Expectations
This work clarifies properties of PredDiff, puts forward several extensions of the original formalism, and introduces a new measure for interaction effects.
Accurate Shapley Values for explaining tree-based models
This work recalls an invariance principle for Shapley values (SV), derives the correct approach for computing the SV of categorical variables, which are particularly sensitive to the encoding used, and introduces two estimators of Shapley values that exploit the tree structure efficiently and are more accurate than state-of-the-art methods.
Fairness via Explanation Quality: Evaluating Disparities in the Quality of Post hoc Explanations
This work highlights the problem of group-based disparities in explanation quality and proposes a novel evaluation framework which can quantitatively measure disparities in the quality of explanations output by state-of-the-art explanation methods.
Shapley Values for Feature Selection: The Good, the Bad, and the Axioms
This paper calls into question the use of the Shapley value as a feature selection tool, using simple, abstract “toy” counterexamples to illustrate that the axioms may work against the goals of feature selection.
From Clustering to Cluster Explanations via Neural Networks
A new framework is proposed that can, for the first time, explain cluster assignments in terms of input features in a comprehensive manner, based on the novel theoretical insight that clustering models can be rewritten as neural networks, or 'neuralized'.
Ensembles of Random SHAPs
Ensemble-based modifications of the well-known SHapley Additive exPlanations (SHAP) method for the local explanation of a black-box model are proposed. The modifications aim to simplify SHAP, which is…
Post-Hoc Explanations Fail to Achieve their Purpose in Adversarial Contexts
It is shown that post-hoc explanation algorithms are unsuited to achieving the transparency objectives inherent in the legal norms, and there is a need to discuss more explicitly the objectives underlying "explainability" obligations, as these can often be better achieved through other mechanisms.


Asymmetric Shapley values: incorporating causal knowledge into model-agnostic explainability
Asymmetric Shapley values can improve model explanations by incorporating causal information, provide an unambiguous test for unfair discrimination in model predictions, enable sequentially incremental explanations in time-series models, and support feature-selection studies without the need for model retraining.
Explanatory coherence in social explanations: a parallel distributed processing account
This article studies the impact of explanatory coherence on the evaluation of explanations. Tested were 4 principles of P. Thagard's (1989) model for evaluating the coherence of explanations. Study I…
Causal Shapley Values: Exploiting Causal Knowledge to Explain Individual Predictions of Complex Models
A novel framework for computing Shapley values is proposed that generalizes recent work aiming to circumvent the independence assumption, and it is shown how these 'causal' Shapley values can be derived for general causal graphs without sacrificing any of their desirable properties.
The many Shapley values for model explanation
The axiomatic approach is used to study the differences between some of the many operationalizations of the Shapley value for attribution, and a technique called Baseline Shapley (BShap) is proposed that is backed by a proper uniqueness result.
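The baseline flavor of the Shapley value mentioned above can be sketched by brute-force enumeration. The snippet below is a hypothetical illustration, assuming "removing" a feature means replacing it with a fixed baseline value; it enumerates every feature subset, so it is exponential in the number of features and only suitable for tiny inputs.

```python
from itertools import combinations
from math import factorial

def baseline_shapley(model, x, baseline):
    """Exact Shapley values where absent features are set to `baseline`
    (one operationalization among the many discussed above)."""
    d = len(x)
    phi = [0.0] * d
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for k in range(d):
            for S in combinations(others, k):
                # classic Shapley weight for a coalition of size k
                weight = factorial(k) * factorial(d - k - 1) / factorial(d)
                z = list(baseline)
                for j in S:
                    z[j] = x[j]      # present features keep their values
                without_i = model(z)
                z[i] = x[i]          # add feature i to the coalition
                with_i = model(z)
                phi[i] += weight * (with_i - without_i)
    return phi

# Toy linear model: prediction = 2*x0 + 0*x1 + 1*x2.
f = lambda z: 2.0 * z[0] + 1.0 * z[2]
print(baseline_shapley(f, [1.0, 5.0, 3.0], [0.0, 0.0, 0.0]))  # [2.0, 0.0, 3.0]
```

For a linear model the result reduces to coefficient × (x − baseline), which makes a convenient sanity check.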
Simplicity and probability in causal explanation
"Why Should I Trust You?": Explaining the Predictions of Any Classifier
LIME is proposed, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction.
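The local-surrogate idea behind LIME can be sketched in a few lines. This is a hypothetical simplification (Gaussian perturbations, an exponential proximity kernel, weighted least squares), not the actual LIME library's API, which additionally handles interpretable binary features, feature selection, and more.

```python
import numpy as np

def local_surrogate(model, x, scale=0.1, n=2000, seed=0):
    """Fit a weighted linear model around x; its per-feature slopes
    serve as a local explanation of `model` near x."""
    rng = np.random.default_rng(seed)
    X = x + scale * rng.standard_normal((n, x.size))   # perturb around x
    y = np.array([model(z) for z in X])                # query the model
    w = np.exp(-np.sum((X - x) ** 2, axis=1) / (2 * scale ** 2))  # proximity
    A = np.hstack([np.ones((n, 1)), X])                # add intercept column
    W = np.sqrt(w)[:, None]
    coef, *_ = np.linalg.lstsq(A * W, y * W[:, 0], rcond=None)
    return coef[1:]                                    # local slopes

# For a globally linear model, the surrogate recovers the coefficients.
g = lambda z: 3.0 * z[0] - 1.0 * z[1]
print(local_surrogate(g, np.array([0.5, 1.0])))  # close to [3, -1]
```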
The Explanation Game: Explaining Machine Learning Models with Cooperative Game Theory
This work illustrates how subtle differences in the underlying game formulations of existing methods can cause large differences in attribution for a prediction, and proposes a general framework for generating explanations for ML models, called formulate, approximate, and explain (FAE).
True to the Model or True to the Data?
It is argued that the choice comes down to whether it is desirable to be true to the model or true to the data, and how possible attributions are impacted by modeling choices.
Explanation in Artificial Intelligence: Insights from the Social Sciences