“Why Should I Trust You?”: Explaining the Predictions of Any Classifier

Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin
Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Despite widespread adoption, machine learning models remain mostly black boxes. [...] We also propose a method to explain models by presenting representative individual predictions and their explanations in a non-redundant way, framing the task as a submodular optimization problem. We demonstrate the flexibility of these methods by explaining different models for text (e.g. random forests) and image classification (e.g. neural networks). We show the utility of explanations via novel experiments, both…
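
As an illustration of the submodular framing mentioned above, here is a minimal sketch of greedy selection under a weighted-coverage objective. It assumes each instance's explanation is already available as a row of feature weights; the function name `greedy_pick`, the square-root importance score, and the toy matrix are illustrative choices, not taken from the abstract.

```python
import math


def greedy_pick(weights, budget):
    """Greedily choose up to `budget` row indices whose explanations
    together cover the most (importance-weighted) features.

    weights: list of rows, one per instance; weights[i][j] is the weight
             that instance i's explanation assigns to feature j (0 = unused).
    """
    n_features = len(weights[0]) if weights else 0
    # Assumed global importance of feature j: sqrt of its summed |weight|.
    importance = [
        math.sqrt(sum(abs(row[j]) for row in weights))
        for j in range(n_features)
    ]
    chosen = []
    covered = set()
    for _ in range(min(budget, len(weights))):
        best_i, best_gain = None, 0.0
        for i, row in enumerate(weights):
            if i in chosen:
                continue
            # Marginal gain: importance of features this row adds to coverage.
            gain = sum(
                importance[j]
                for j in range(n_features)
                if row[j] != 0 and j not in covered
            )
            if gain > best_gain:
                best_i, best_gain = i, gain
        if best_i is None:
            break  # every remaining explanation is redundant
        chosen.append(best_i)
        covered.update(j for j in range(n_features) if weights[best_i][j] != 0)
    return chosen


# Toy explanation matrix: 3 instances, 3 features (made-up numbers).
toy = [
    [1.0, 0.0, 0.0],
    [1.0, 0.5, 0.0],
    [0.0, 0.0, 2.0],
]
picked = greedy_pick(toy, budget=2)
```

Because a coverage objective of this kind is monotone and submodular, the greedy loop is a standard approximation strategy; on the toy matrix the first row is never selected, since its only feature is already covered by the second row.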

Can I Trust the Explainer? Verifying Post-hoc Explanatory Methods

This work introduces a verification framework for explanatory methods under the feature-selection perspective, based on a non-trivial neural network architecture trained on a real-world task, and for which it is able to provide guarantees on its inner workings.

How Much Can I Trust You? - Quantifying Uncertainties in Explaining Neural Networks

This work proposes a new framework that converts any arbitrary explanation method for neural networks into an explanation method for Bayesian neural networks, with built-in modeling of uncertainties, translating the intrinsic network model uncertainties into a quantification of explanation uncertainties.

Minimalistic Explanations: Capturing the Essence of Decisions

It is argued that explanations can be very minimalistic while retaining the essence of a decision, but that the decision-making contexts that can be conveyed in this manner are limited; first insights into quality criteria of post-hoc explanations are also shared.

Right for the Right Reasons: Training Differentiable Models by Constraining their Explanations

This work introduces a method for efficiently explaining and regularizing differentiable models by examining and selectively penalizing their input gradients, which provide a normal to the decision boundary.

Analyzing the Effects of Classifier Lipschitzness on Explainers

This paper proposes and formally defines explainer astuteness, a property of explainers which captures the probability that a given method provides similar explanations to similar data points, and provides a theoretical way to connect this explainer astuteness to the probabilistic Lipschitzness of the black-box function that is being explained.

How model accuracy and explanation fidelity influence user trust

It is found that accuracy is more important for user trust than explainability, and that users cannot be tricked by high-fidelity explanations into having trust for a bad classifier.

A Unified Approach to Interpreting Model Predictions

A unified framework for interpreting predictions, SHAP (SHapley Additive exPlanations), which unifies six existing methods and presents new methods that show improved computational performance and/or better consistency with human intuition than previous approaches.

Why Should I Trust This Item? Explaining the Recommendations of any Model

A deep analysis of 7 state-of-the-art models learnt on 6 datasets based on the identification of the items or the sequences of items actively used by the models, which provides interpretable explanations of the recommendations – useful to compare different models and explain the reasons behind the recommendation to the user.

Explaining the Predictions of Any Image Classifier via Decision Trees

A decision tree-based LIME approach, which uses a decision tree model to form an interpretable representation that is locally faithful to the original model, can capture nonlinear interactions among features in the data and create plausible explanations.

A User Study on the Effect of Aggregating Explanations for Interpreting Machine Learning Models

This work-in-progress paper explores the effectiveness of providing instance-level explanations in aggregate, by demonstrating that such aggregated explanations have a significant impact on users’ ability to detect biases in data.



Towards Extracting Faithful and Descriptive Representations of Latent Variable Models

Preliminary experiments on knowledge extraction from text indicate that even though Bayesian networks may be more faithful to a matrix factorization model than the logic rules, the latter are possibly more useful for interpretation and debugging.

How to Explain Individual Classification Decisions

This paper proposes a procedure which, based on a set of assumptions, makes it possible to explain the decisions of any classification method.

Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model

A generative model called Bayesian Rule Lists is introduced that yields a posterior distribution over possible decision lists; it employs a novel prior structure to encourage sparsity and has predictive accuracy on par with the current top algorithms for prediction in machine learning.

Explaining collaborative filtering recommendations

This paper presents experimental evidence that providing explanations can improve the acceptance of automated collaborative filtering (ACF) systems, and presents a model for explanations based on the user's conceptual model of the recommendation process.

Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank

This work introduces a Sentiment Treebank that includes fine-grained sentiment labels for 215,154 phrases in the parse trees of 11,855 sentences and presents new challenges for sentiment compositionality, and it also introduces the Recursive Neural Tensor Network.

Towards Transparent Systems: Semantic Characterization of Failure Modes

This work proposes characterizing the failure modes of a vision system using semantic attributes, and generates a “specification sheet” that can predict oncoming failures for face and animal species recognition better than several strong baselines.

Predicting Failures of Vision Systems

This work shows that a surprisingly straightforward and general approach, ALERT, can predict the likely accuracy (or failure) of a variety of computer vision systems (semantic segmentation, vanishing point and camera parameter estimation, and image memorability prediction) on individual input images.

Dataset Shift in Machine Learning

This volume offers an overview of current efforts to deal with dataset and covariate shift, and places dataset shift in relationship to transfer learning, transduction, local learning, active learning, and semi-supervised learning.

Comprehensible classification models: a position paper

This paper discusses the interpretability of five types of classification models, namely decision trees, classification rules, decision tables, nearest neighbors and Bayesian network classifiers, and the drawbacks of using model size as the only criterion to evaluate the comprehensibility of a model.

Towards Universal Paraphrastic Sentence Embeddings

This work considers the problem of learning general-purpose, paraphrastic sentence embeddings based on supervision from the Paraphrase Database, and compares six compositional architectures, finding that the most complex architectures, such as long short-term memory (LSTM) recurrent neural networks, perform best on the in-domain data.