Corpus ID: 232068980

PredDiff: Explanations and Interactions from Conditional Expectations

Stefan Blücher and Nils Strodthoff
PredDiff is a model-agnostic, local attribution method that is firmly rooted in probability theory. Its simple intuition is to measure prediction changes when marginalizing out feature variables. In this work, we clarify properties of PredDiff and put forward several extensions of the original formalism. Most notably, we introduce a new measure for interaction effects. Interactions are an inevitable step towards a comprehensive understanding of black-box models. Importantly, our framework… 
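As a rough illustration of this intuition (a sketch, not the authors' reference implementation), a feature's relevance can be computed as the change in prediction when that feature is marginalized out. Here the conditional expectation is approximated by imputing values from a background dataset, i.e. a marginal approximation that is exact only for independent features:

```python
import numpy as np

def preddiff_relevance(model, x, feature, background):
    """Sketch of a PredDiff-style relevance for a single feature.

    Measures the prediction change when `feature` is marginalized out.
    The conditional expectation is approximated by imputing values
    drawn from a background dataset (a marginal approximation,
    exact only when features are independent).
    """
    X_imputed = np.tile(x, (len(background), 1))    # copies of the instance
    X_imputed[:, feature] = background[:, feature]  # marginalize one feature
    return model(x[None, :])[0] - model(X_imputed).mean()

# Toy usage: linear model with a zero background, so the relevance of
# feature 0 recovers exactly its additive contribution.
model = lambda X: 2.0 * X[:, 0] + X[:, 1]
x = np.array([1.0, 1.0])
background = np.zeros((16, 2))
print(preddiff_relevance(model, x, 0, background))  # 2.0
```

For a linear model and an independent background this recovers the feature's additive term; for real models and correlated features the quality of the conditional-expectation estimate is the crux.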


Feature relevance quantification in explainable AI: A causality problem
It is concluded that unconditional rather than conditional expectations provide the right notion of dropping features, in contradiction to the theoretical justification of the software package SHAP.
How does this interaction affect me? Interpretable attribution for feature interactions
This work proposes an interaction attribution and detection framework called Archipelago which addresses problems of uninterpretable, model-specific, or non-axiomatic attributions to interactions, and is also scalable in real-world settings.
Axiomatic Attribution for Deep Networks
We study the problem of attributing the prediction of a deep network to its input features, a problem previously studied by several other works. We identify two fundamental axioms: Sensitivity and Implementation Invariance.
Explaining prediction models and individual predictions with feature contributions
A sensitivity analysis-based method for explaining prediction models that can be applied to any type of classification or regression model, and which is equivalent to commonly used additive model-specific methods when explaining an additive model.
Do Not Trust Additive Explanations
This paper examines the behavior of the most popular instance-level explanations under the presence of interactions, introduces a new method that detects interactions for instance-level explanations, and performs a large-scale benchmark to see how frequently additive explanations may be misleading.
Quantifying and Visualizing Attribute Interactions
This work applies McGill's interaction information, which has been independently rediscovered a number of times under various names in various disciplines, to visually present the most important interactions of the data.
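For concreteness, McGill's interaction information can be estimated from samples via an inclusion-exclusion combination of empirical entropies. Sign conventions differ across the literature; the sketch below uses the convention I(X;Y;Z) = I(X;Y|Z) − I(X;Y), which makes the XOR interaction positive:

```python
import math
from collections import Counter

def entropy(*columns):
    """Empirical joint Shannon entropy (in bits) of one or more sample columns."""
    joint = list(zip(*columns))
    n = len(joint)
    return -sum(c / n * math.log2(c / n) for c in Counter(joint).values())

def interaction_information(x, y, z):
    """McGill's interaction information I(X;Y;Z) = I(X;Y|Z) - I(X;Y),
    expanded into joint entropies. Positive values indicate synergy
    (as in XOR), negative values indicate redundancy.
    """
    return (entropy(x, y) + entropy(x, z) + entropy(y, z)
            - entropy(x) - entropy(y) - entropy(z)
            - entropy(x, y, z))

# XOR: neither input alone predicts the output, but together they do.
x = [0, 0, 1, 1]
y = [0, 1, 0, 1]
z = [a ^ b for a, b in zip(x, y)]
print(interaction_information(x, y, z))  # 1.0
```

With z independent of the pairwise structure (e.g. z = x), the same estimator returns 0, reflecting the absence of a genuine three-way interaction.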
A Unified Taylor Framework for Revisiting Attribution Methods
A Taylor attribution framework is presented to theoretically characterize the fidelity of explanations of machine learning models, decomposing model behavior into first-order, high-order independent, and high-order interactive terms, which clarifies the attribution of high-order effects and complex feature interactions.
A Unified Approach to Interpreting Model Predictions
A unified framework for interpreting predictions, SHAP (SHapley Additive exPlanations), which unifies six existing methods and presents new methods that show improved computational performance and/or better consistency with human intuition than previous approaches.
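To make the additive-attribution idea concrete, here is a brute-force exact Shapley computation, feasible only for a handful of features; KernelSHAP approximates this sum for larger feature sets. Baseline imputation stands in for "dropping" a feature, an interventional/marginal convention that is itself debated (see the causality entry above):

```python
import math
from itertools import combinations
import numpy as np

def shapley_values(f, x, baseline):
    """Exact Shapley values by enumerating all coalitions (O(2^n),
    so only viable for small n). Features outside a coalition are
    'dropped' by setting them to `baseline`.
    """
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            w = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for S in combinations(others, size):
                without_i = baseline.copy()
                without_i[list(S)] = x[list(S)]   # coalition features take x's values
                with_i = without_i.copy()
                with_i[i] = x[i]                  # add feature i to the coalition
                phi[i] += w * (f(with_i) - f(without_i))
    return phi

# Toy product model: the pure interaction is split equally between features.
f = lambda v: v[0] * v[1]
phi = shapley_values(f, np.array([1.0, 2.0]), np.array([0.0, 0.0]))
print(phi)  # [1. 1.]
```

Note the efficiency property: the attributions sum to f(x) − f(baseline), here 2.0, which is the "additive" guarantee SHAP builds on.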
From local explanations to global understanding with explainable AI for trees
An explanation method for trees is presented that enables the computation of optimal local explanations for individual predictions; the method is demonstrated on three medical datasets.
Explaining individual predictions when features are dependent: More accurate approximations to Shapley values
This work extends the Kernel SHAP method to handle dependent features, and proposes a method for aggregating individual Shapley values, such that the prediction can be explained by groups of dependent variables.