"Why Should I Trust You?": Explaining the Predictions of Any Classifier

@article{Ribeiro2016WhySI,
  title={"Why Should I Trust You?": Explaining the Predictions of Any Classifier},
  author={Marco Tulio Ribeiro and Sameer Singh and Carlos Guestrin},
  journal={Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},
  year={2016}
}
Despite widespread adoption, machine learning models remain mostly black boxes. In this work, we propose LIME, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction. We also propose a method to explain models by presenting representative individual predictions and their explanations in a non-redundant way, framing the task as a submodular optimization problem. We demonstrate the flexibility of these methods by explaining different models for text (e.g. random forests) and image classification (e.g. neural networks). We show the utility of explanations via novel experiments, both…
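
The core idea – fitting a simple, locally weighted linear model to the black box's behavior in the neighborhood of a single prediction – can be sketched in a few lines. The following Python sketch is illustrative only and not the authors' implementation: it assumes a binary interpretable representation (e.g. word presence for one document), a scikit-learn-style predict_proba function that accepts that representation directly, and arbitrary choices of kernel width and ridge regression.

import numpy as np
from sklearn.linear_model import Ridge

def explain_instance(x, predict_proba, num_samples=5000, num_features=6, kernel_width=0.75):
    """Fit a locally weighted linear surrogate around one instance (LIME-style sketch)."""
    d = x.shape[0]
    # Perturb the instance by randomly switching interpretable features off.
    Z = np.random.randint(0, 2, size=(num_samples, d))
    Z[0] = 1                                  # keep the unperturbed instance in the sample
    perturbed = Z * x                         # perturbed instances in the interpretable representation
    scores = predict_proba(perturbed)[:, 1]   # black-box probability of the class being explained
    # Weight each sample by its proximity to the original instance.
    distances = 1.0 - Z.mean(axis=1)
    weights = np.exp(-(distances ** 2) / kernel_width ** 2)
    # The weighted linear fit is the locally faithful explanation.
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(Z, scores, sample_weight=weights)
    top = np.argsort(np.abs(surrogate.coef_))[::-1][:num_features]
    return [(int(i), float(surrogate.coef_[i])) for i in top]

The released lime package adds the pieces this sketch omits, notably a mapping from the interpretable representation back to the original feature space and the submodular pick (SP-LIME) step for selecting a small, non-redundant set of such explanations.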


How Much Can I Trust You? - Quantifying Uncertainties in Explaining Neural Networks

TLDR
This work proposes a new framework that allows converting any arbitrary explanation method for neural networks into an explanation method for Bayesian neural networks, whose built-in modeling of uncertainties is translated into a quantification of explanation uncertainties.

Minimalistic Explanations: Capturing the Essence of Decisions

TLDR
It is argued that explanations can be very minimalistic while retaining the essence of a decision, although the decision-making contexts that can be conveyed in this manner are limited; first insights into quality criteria for post-hoc explanations are also shared.

Right for the Right Reasons: Training Differentiable Models by Constraining their Explanations

TLDR
This work introduces a method for efficiently explaining and regularizing differentiable models by examining and selectively penalizing their input gradients, which provide a normal to the decision boundary.

How model accuracy and explanation fidelity influence user trust

TLDR
It is found that accuracy is more important for user trust than explainability, and that users cannot be tricked by high-fidelity explanations into trusting a bad classifier.

Analyzing the Effects of Classifier Lipschitzness on Explainers

TLDR
This paper proposes and formally defines explainer astuteness – a property of explainers which captures the probability that a given method provides similar explanations to similar data points – and provides a theoretical way to connect this explainer astuteness to the probabilistic Lipschitzness of the black-box function that is being explained.

A Unified Approach to Interpreting Model Predictions

TLDR
A unified framework for interpreting predictions, SHAP (SHapley Additive exPlanations), which unifies six existing methods and presents new methods that show improved computational performance and/or better consistency with human intuition than previous approaches.
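
As a usage note rather than anything from the paper itself, the open-source shap package exposes this framework behind a single entry point; the minimal sketch below assumes scikit-learn is available and uses the diabetes dataset and a random forest purely as stand-ins.

import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

# Fit any model; shap.Explainer picks a suitable algorithm for it (here a tree explainer).
X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

explainer = shap.Explainer(model)
shap_values = explainer(X)          # additive attributions: one value per feature per prediction
shap.plots.bar(shap_values)         # global summary via mean |SHAP value| per feature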

Explaining the Predictions of Any Image Classifier via Decision Trees

TLDR
A decision tree-based LIME approach, which uses a decision tree model to form an interpretable representation that is locally faithful to the original model, can capture nonlinear interactions among features in the data and create plausible explanations.

Why X rather than Y? Explaining Neural Models' Predictions by Generating Intervention Counterfactual Samples

TLDR
This paper proposes a novel idea: intervene and generate a minimally modified contrastive sample that is classified as Y, which then yields a simple natural-language answer to the question "Why X rather than Y?".

Why Should I Trust This Item? Explaining the Recommendations of any Model

TLDR
A deep analysis of 7 state-of-the-art models learnt on 6 datasets, based on identifying the items or sequences of items actively used by the models, which provides interpretable explanations of the recommendations – useful for comparing different models and for explaining the reasons behind a recommendation to the user.

Can Post-hoc Explanations Effectively Detect Out-of-Distribution Samples?

TLDR
This work investigates whether local explanations can be used to detect out-of-distribution (OoD) test samples in machine learning classifiers, and devises and assesses a clustering-based OoD detection approach that exemplifies how heatmaps produced by well-established local explanation methods can serve purposes beyond explaining individual predictions issued by the model under analysis.
...

References

Showing 1–10 of 40 references

Towards Extracting Faithful and Descriptive Representations of Latent Variable Models

TLDR
Preliminary experiments on knowledge extraction from text indicate that even though Bayesian networks may be more faithful to a matrix factorization model than the logic rules, the latter are possibly more useful for interpretation and debugging.

How to Explain Individual Classification Decisions

TLDR
This paper proposes a procedure which, based on a set of assumptions, allows one to explain the decisions of any classification method.

Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model

TLDR
A generative model called Bayesian Rule Lists is introduced that yields a posterior distribution over possible decision lists; it employs a novel prior structure to encourage sparsity and has predictive accuracy on par with the current top algorithms for prediction in machine learning.

Explaining Data-Driven Document Classifications

TLDR
This paper extends the most relevant prior theoretical model of explanations for intelligent systems to account for some missing elements, and defines a new sort of explanation as a minimal set of words, such that removing all words within this set from the document changes the predicted class from the class of interest.
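
A hedged sketch of that notion of explanation, assuming hypothetical predict_class(words) and class_prob(words, target) helpers for the document classifier (this greedy loop only approximates minimality and is not the authors' exact algorithm):

def minimal_word_set(words, predict_class, class_prob, target):
    """Greedily remove the word that most reduces P(target) until the predicted class flips."""
    remaining = list(words)
    removed = []
    while remaining and predict_class(remaining) == target:
        # Score each candidate removal by the resulting probability of the target class.
        probs = [class_prob(remaining[:i] + remaining[i + 1:], target) for i in range(len(remaining))]
        best = min(range(len(remaining)), key=lambda i: probs[i])
        removed.append(remaining.pop(best))
    return removed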

You Are the Only Possible Oracle: Effective Test Selection for End Users of Interactive Machine Learning Systems

TLDR
This work considers the problem of testing this common kind of machine-generated program when the only oracle is an end user, and presents test selection methods that provide very good failure-detection rates even for small test suites, showing that some methods are able to find the arguably most difficult-to-detect faults of classifiers.

Explaining collaborative filtering recommendations

TLDR
This paper presents experimental evidence that providing explanations can improve the acceptance of automated collaborative filtering (ACF) systems, and proposes a model for explanations based on the user's conceptual model of the recommendation process.

Towards Transparent Systems: Semantic Characterization of Failure Modes

TLDR
This work proposes characterizing the failure modes of a vision system using semantic attributes, and generates a “specification sheet” that can predict oncoming failures for face and animal species recognition better than several strong baselines.

Predicting Failures of Vision Systems

TLDR
This work shows that a surprisingly straightforward and general approach, called ALERT, can predict the likely accuracy (or failure) of a variety of computer vision systems – semantic segmentation, vanishing point and camera parameter estimation, and image memorability prediction – on individual input images.

Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank

TLDR
A Sentiment Treebank with fine-grained sentiment labels for 215,154 phrases in the parse trees of 11,855 sentences is introduced, presenting new challenges for sentiment compositionality, along with the Recursive Neural Tensor Network to address them.

Comprehensible classification models: a position paper

TLDR
This paper discusses the interpretability of five types of classification models, namely decision trees, classification rules, decision tables, nearest neighbors and Bayesian network classifiers, and the drawbacks of using model size as the only criterion to evaluate the comprehensibility of a model.