Causal Interpretability for Machine Learning - Problems, Methods and Evaluation

Raha Moraffah, Mansooreh Karami, Ruocheng Guo, Adrienne Jeanisha Raglin, and Huan Liu. ACM SIGKDD Explorations Newsletter, pp. 18-33.
Machine learning models have had discernible achievements in a myriad of applications. However, most of these models are black boxes, and it is unclear how they make their decisions. This makes the models unreliable and untrustworthy. To provide insights into the decision-making processes of these models, a variety of traditional interpretable models have been proposed. Moreover, to generate more human-friendly explanations, recent work on interpretability tries to answer questions related…


Interpretable Deep Learning: Interpretations, Interpretability, Trustworthiness, and Beyond

This work comprehensively surveys current research on evaluating models’ interpretability using “trustworthy” interpretation algorithms. It also elaborates on the designs of a number of interpretation algorithms by proposing a new taxonomy.

Interpretation of Black Box NLP Models: A Survey

This survey presents a comprehensive overview of methods employed for interpretability, followed by a discussion of the different approaches adopted in the NLP space. It highlights the synergy between different approaches as well as the issues with interpretability in NLP.

Generative causal explanations of black-box classifiers

This work develops a method for generating causal post-hoc explanations of black-box classifiers based on a learned low-dimensional representation of the data. Its objective encourages the generative model to represent the data distribution and the latent factors to have a large causal influence on the classifier output.

A survey on the interpretability of deep learning in medical diagnosis

This paper comprehensively reviews the interpretability of deep learning in medical diagnosis based on the current literature. It covers common interpretability methods used in the medical domain, applications of interpretability to disease diagnosis, prevalent evaluation metrics, and several disease datasets.

Explanation Consistency Training: Facilitating Consistency-Based Semi-Supervised Learning with Interpretability

ECT (Explanation Consistency Training) is proposed, which encourages the model to base its decisions on consistent reasons under data perturbation. It uses model explanations as a surrogate for the causality of model output. This bridges state-of-the-art interpretability and semi-supervised learning (SSL) models while avoiding the high complexity of causality.

An introduction to causal reasoning in health analytics

This chapter highlights some of the drawbacks that may arise when traditional machine learning and statistical approaches are used to analyze observational data, particularly in healthcare data analytics. It also demonstrates how causal inference can tackle common machine learning issues such as missing data and model transportability.

Interpretability for Conditional Average Treatment Effect Estimation

It is shown how the proposed framework for interpreting the Conditional Average Treatment Effect (CATE) estimation problem can serve as a tool for model selection, which is naturally challenging in causal inference tasks.

Counterfactual Evaluation for Explainable AI

This work proposes a new methodology to evaluate the faithfulness of explanations from the counterfactual reasoning perspective: the model should produce substantially different outputs for the original input and its corresponding counterfactual edited on a faithful feature.
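A minimal Python sketch of this faithfulness test follows. It rests on three simplifying assumptions:

- the model is a plain callable over a numeric feature vector;
- a counterfactual is produced by overwriting a single feature with a baseline value;
- `counterfactual_gap` and `faithfulness_check` are hypothetical helper names, not the authors' code.

Under this sketch, an explanation counts as faithful if editing the feature it ranks as important moves the output more than editing a feature it ranks as unimportant.

```python
from typing import Callable, List, Sequence

def counterfactual_gap(model: Callable[[List[float]], float],
                       x: Sequence[float], feature: int, new_value: float) -> float:
    """Absolute change in model output when a single feature of x is edited."""
    x_cf = list(x)
    x_cf[feature] = new_value  # counterfactual edit on one feature (assumed edit scheme)
    return abs(model(list(x)) - model(x_cf))

def faithfulness_check(model: Callable[[List[float]], float],
                       x: Sequence[float], important: int, unimportant: int,
                       baseline: float = 0.0) -> bool:
    """True if editing the feature an explanation marks as important changes the
    output more than editing a feature it marks as unimportant."""
    return (counterfactual_gap(model, x, important, baseline)
            > counterfactual_gap(model, x, unimportant, baseline))
```

For a model such as `lambda v: 3 * v[0] + 0.1 * v[1]`, an explanation that ranks feature 0 above feature 1 passes the check. An explanation with the reverse ranking fails it.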

Evaluation Methods and Measures for Causal Learning Algorithms

This survey provides a comprehensive review of the evaluation of fundamental tasks in causal inference and causality-aware machine learning tasks and seeks to expedite the marriage of causality and machine learning via discussions of prominent open problems and challenges.

Model-Agnostic Interpretability of Machine Learning

This paper argues for explaining machine learning predictions using model-agnostic approaches, treating the machine learning models as black-box functions, which provide crucial flexibility in the choice of models, explanations, and representations, improving debugging, comparison, and interfaces for a variety of users and models.

Learning Interpretable Models with Causal Guarantees

This work proposes a framework for learning causal interpretable models from observational data that can be used to predict individual treatment effects. It also proves an error bound on the treatment effects predicted by the model.

Model Agnostic Supervised Local Explanations

It is demonstrated, on several UCI datasets, that MAPLE is at least as accurate as random forests and that it produces more faithful local explanations than LIME, a popular interpretability system.

Explaining Explanations: An Overview of Interpretability of Machine Learning

There has recently been a surge of work in explanatory artificial intelligence (XAI). This research area tackles the important problem that complex machines and algorithms often cannot provide

Evaluating Explanation Without Ground Truth in Interpretable Machine Learning

To benchmark evaluation in interpretable machine learning (IML), this article rigorously defines the problem of evaluating explanations and systematically reviews existing state-of-the-art efforts. It then summarizes three general aspects of explanation with formal definitions.

"Why Should I Trust You?": Explaining the Predictions of Any Classifier

This work proposes LIME, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner by learning an interpretable model locally around the prediction.
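The core LIME loop has three steps: perturb the instance, weight the samples by proximity, and fit a weighted linear surrogate. It can be sketched with NumPy as follows. This is an illustrative simplification, not the `lime` package. Gaussian perturbations, an exponential kernel with an assumed `kernel_width` default, and ordinary weighted least squares stand in for the package's own sampling, discretization, and feature-selection options. `lime_explain` is a hypothetical name.

```python
import numpy as np

def lime_explain(predict, x, num_samples=1000, kernel_width=1.0, scale=1.0, seed=0):
    """Return local linear weights explaining predict() around x (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    # 1. Sample perturbations around x with Gaussian noise (assumed sampling scheme).
    Z = x + rng.normal(0.0, scale, size=(num_samples, x.size))
    y = np.asarray(predict(Z), dtype=float)  # black-box outputs, shape (num_samples,)
    # 2. Weight each sample by its proximity to x with an exponential kernel.
    d = np.linalg.norm(Z - x, axis=1)
    w = np.exp(-(d ** 2) / kernel_width ** 2)
    # 3. Fit the surrogate y ~ b0 + Z @ b by weighted least squares; scaling rows
    #    by sqrt(w) makes plain lstsq minimize the weighted squared error.
    A = np.hstack([np.ones((num_samples, 1)), Z])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef[1:]  # per-feature local weights; the intercept is dropped
```

The surrogate's coefficients describe the model only in the neighborhood set by `kernel_width`. As a sanity check, when the black box is itself linear (for example `lambda Z: 2 * Z[:, 0] - Z[:, 1]`), the sketch recovers its true coefficients.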

Towards A Rigorous Science of Interpretable Machine Learning

This position paper defines interpretability and describes when interpretability is needed (and when it is not), and suggests a taxonomy for rigorous evaluation and exposes open questions towards a more rigorous science of interpretable machine learning.

Causal Interpretations of Black-Box Models

Qingyuan Zhao and T. Hastie. Journal of Business & Economic Statistics (a publication of the American Statistical Association), 2019.
This paper studies the possibility of extracting causal interpretations from black-box machine-trained models. It identifies three requirements for making causal interpretations: a model with good predictive performance, some domain knowledge in the form of a causal diagram, and suitable visualization tools.
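One visualization tool discussed in this line of work is the partial dependence plot (PDP). Zhao and Hastie note a condition under which it acquires a causal reading: when the remaining features satisfy Pearl's back-door criterion for the feature of interest, the PDP has the same form as the back-door adjustment formula. Here is a minimal pure-Python sketch of computing a PDP curve. The `partial_dependence` helper is hypothetical, and the model is assumed to be a plain callable over a feature list.

```python
from typing import Callable, List, Sequence

def partial_dependence(model: Callable[[List[float]], float],
                       data: Sequence[Sequence[float]],
                       feature: int, grid: Sequence[float]) -> List[float]:
    """Average model output when `feature` is fixed to each grid value."""
    curve = []
    for v in grid:
        total = 0.0
        for row in data:
            edited = list(row)
            edited[feature] = v  # set the feature of interest to v for every row
            total += model(edited)
        # Average over the empirical distribution of the other features
        curve.append(total / len(data))
    return curve
```

For example, with `model = lambda r: 2 * r[0] + r[1]` and `data = [[0, 1], [1, 3]]`, `partial_dependence(model, data, 0, [0, 1])` returns `[2.0, 4.0]`.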

A Unified Approach to Interpreting Model Predictions

This paper presents SHAP (SHapley Additive exPlanations), a unified framework for interpreting predictions that unifies six existing methods. It also proposes new methods that show improved computational performance and/or better consistency with human intuition than previous approaches.
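To make the Shapley-value definition behind SHAP concrete, here is a brute-force sketch that enumerates every coalition. It assumes a feature absent from a coalition is replaced by a fixed baseline value, which is only one of several possible value functions. `shapley_values` is a hypothetical helper. The cost grows exponentially with the number of features, which is why practical SHAP estimators approximate or specialize this computation.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence

def shapley_values(model: Callable[[List[float]], float],
                   x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley attributions for model at x relative to a baseline input."""
    n = len(x)

    def value(coalition: set) -> float:
        # Features in the coalition keep their value from x; the rest use the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                # Weighted marginal contribution of feature i to coalition s
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi
```

The resulting attributions satisfy the efficiency property: they sum to `model(x) - model(baseline)`.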

Explaining machine learning classifiers through diverse counterfactual explanations

This work proposes a framework for generating and evaluating a diverse set of counterfactual explanations based on determinantal point processes, and provides metrics that enable comparison of counterfactual-based methods to other local explanation methods.
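As we understand it, the diversity term in this framework is the determinant of a kernel matrix over the set of counterfactuals, with entries K[i][j] = 1 / (1 + dist(c_i, c_j)). Near-duplicate counterfactuals make rows of K similar and drive the determinant toward zero. The NumPy sketch below computes only that term. It uses plain L1 distance as an assumption, whereas the published method also scales features and combines this term with validity and proximity losses. `dpp_diversity` is a hypothetical name.

```python
import numpy as np

def dpp_diversity(cfs):
    """Determinant-based diversity score for a set of counterfactuals (rows)."""
    C = np.asarray(cfs, dtype=float)
    # Pairwise L1 distances between counterfactuals (feature scaling omitted here)
    dist = np.abs(C[:, None, :] - C[None, :, :]).sum(axis=2)
    K = 1.0 / (1.0 + dist)  # similar counterfactuals give off-diagonal entries near 1
    return float(np.linalg.det(K))
```

Two identical counterfactuals score 0. Moving them apart raises the score: `[[0, 0], [1, 0]]` gives 0.75, while `[[0, 0], [3, 0]]` gives 0.9375.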