Explanatory causal effects for model agnostic explanations

Jiuyong Li, Ha Xuan Tran, Thuc Duy Le, Lin Liu, Kui Yu, Jixue Liu
This paper studies the problem of estimating the contributions of features to the prediction of a specific instance by a machine learning model and the overall contribution of a feature to the model. The causal effect of a feature (variable) on the predicted outcome reflects the contribution of the feature to a prediction very well. A challenge is that most existing causal effects cannot be estimated from data without a known causal graph. In this paper, we define an explanatory causal effect… 


Explaining Visual Models by Causal Attribution

This work argues that explanations should be based on the causal model of the data and on the derived intervened causal models, which represent the data distribution subject to interventions. These models can be used to compute counterfactuals: new samples that show how the model reacts to feature changes on the input.

Generative causal explanations of black-box classifiers

This work develops a method for generating causal post-hoc explanations of black-box classifiers based on a learned low-dimensional representation of the data. The learning objective encourages both the generative model to faithfully represent the data distribution and the latent factors to have a large causal influence on the classifier output.

CXPlain: Causal Explanations for Model Interpretation under Uncertainty

This work frames the task of providing explanations for the decisions of machine-learning models as a causal learning task, and trains causal explanation (CXPlain) models that learn to estimate to what degree certain inputs cause outputs in another machine-learning model.

Causal Interpretability for Machine Learning - Problems, Methods and Evaluation

This work presents a comprehensive survey on causal interpretable models from the aspects of the problems and methods and provides in-depth insights into the existing evaluation metrics for measuring interpretability, which can help practitioners understand for what scenarios each evaluation metric is suitable.

A Unified View of Causal and Non-causal Feature Selection

This article first shows that causal and non-causal feature selection methods share the same objective, to find the Markov blanket of a class attribute, the theoretically optimal feature set for classification.

Explaining prediction models and individual predictions with feature contributions

A sensitivity analysis-based method for explaining prediction models that can be applied to any type of classification or regression model, and which is equivalent to commonly used additive model-specific methods when explaining an additive model.
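
The idea of a sensitivity-based feature contribution can be illustrated with a minimal sketch. It is a simplified one-at-a-time variant, not the method of the paper summarized above: the contribution of a feature is taken to be the prediction for the instance minus the average prediction when that feature alone is replaced by values from a background sample. The function name, the linear model, and all numbers are hypothetical.

```python
def one_at_a_time_contributions(predict, instance, background):
    """Contribution of feature i = f(x) - mean of f(x with x_i replaced by background values)."""
    base = predict(instance)
    contributions = []
    for i in range(len(instance)):
        perturbed = []
        for row in background:
            x = list(instance)
            x[i] = row[i]  # replace only feature i
            perturbed.append(predict(x))
        contributions.append(base - sum(perturbed) / len(perturbed))
    return contributions


# Hypothetical additive (linear) model used only for illustration.
weights = [2.0, -1.0, 0.5]


def linear_model(x):
    return sum(w * v for w, v in zip(weights, x))


background = [[0.0, 1.0, 2.0], [2.0, 3.0, 4.0], [4.0, 5.0, 0.0]]
instance = [3.0, 1.0, 2.0]
contribs = one_at_a_time_contributions(linear_model, instance, background)
```

For this additive model each contribution reduces to the weight times the feature's deviation from its background mean, which is the usual model-specific reading of a linear model.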

Estimating high-dimensional intervention effects from observational data

This paper proposes to use summary measures of the set of possible causal effects to determine variable importance and uses the minimum absolute value of this set, since that is a lower bound on the size of the causal effect.
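
The summary measure described above can be sketched in a few lines. The sketch assumes a set of candidate causal effects has already been obtained for each variable; how that set is produced is outside its scope, and the variable names and values are hypothetical.

```python
def min_abs_effect(possible_effects):
    """Smallest absolute value among candidate causal effects (a lower bound on effect size)."""
    effects = list(possible_effects)
    if not effects:
        raise ValueError("need at least one candidate effect")
    return min(abs(e) for e in effects)


# Hypothetical candidate effects of each variable on the outcome.
candidates = {
    "x1": [0.8, 0.5, 0.6],
    "x2": [0.3, 0.0, -0.4],
    "x3": [-1.2, -0.9],
}

scores = {name: min_abs_effect(vals) for name, vals in candidates.items()}
# Rank variables from most to least important under this conservative score.
ranking = sorted(scores, key=scores.get, reverse=True)
```

A variable whose candidate set contains zero, such as the hypothetical x2, gets a score of zero, because the data cannot rule out that it has no effect.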

Measurable Counterfactual Local Explanations for Any Classifier

This work introduces a novel method for explaining the predictions of any classifier by using regression to generate local explanations, together with a definition of fidelity to the underlying classifier for local explanation models that is based on distances to a target decision boundary.

Neural Network Attributions: A Causal Perspective

This work proposes a new attribution method for neural networks developed from first principles of causality, along with algorithms that compute the causal effects efficiently and scale the approach to high-dimensional data.

Explaining Classifiers with Causal Concept Effect (CaCE)

This work defines the Causal Concept Effect (CaCE) as the causal effect of a human-interpretable concept on a deep neural net's predictions, and shows that the CaCE measure can avoid errors stemming from confounding.
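
A toy calculation can show why an interventional effect avoids confounding errors. The sketch reads the causal effect of a concept as the difference in expected classifier output under two interventions on the concept, which is an assumption about the exact form of the measure. The structural model, the classifier, and all probabilities are hypothetical.

```python
# Toy structural model (all numbers are hypothetical):
#   Z ~ Bernoulli(0.5) is a confounder,
#   the concept C is more likely to be present when Z = 1,
#   the classifier score depends on both C and Z.
P_Z = {0: 0.5, 1: 0.5}
P_C1_GIVEN_Z = {0: 0.1, 1: 0.9}  # P(C = 1 | Z = z)


def classifier(c, z):
    return 0.2 * c + 0.6 * z


def expected_score_do(c):
    """E[f | do(C = c)]: C is set by intervention, so Z keeps its marginal distribution."""
    return sum(P_Z[z] * classifier(c, z) for z in P_Z)


def expected_score_given(c):
    """E[f | C = c]: C is only observed, so Z follows P(Z | C = c)."""
    joint = {
        z: P_Z[z] * (P_C1_GIVEN_Z[z] if c == 1 else 1 - P_C1_GIVEN_Z[z])
        for z in P_Z
    }
    total = sum(joint.values())
    return sum(joint[z] / total * classifier(c, z) for z in P_Z)


interventional_effect = expected_score_do(1) - expected_score_do(0)
observational_gap = expected_score_given(1) - expected_score_given(0)
```

In this toy model the interventional effect recovers the concept's true coefficient of 0.2, while the observational gap is about 0.68 because it also absorbs the influence of the confounder Z.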