"Why Should I Trust You?": Explaining the Predictions of Any Classifier

@inproceedings{Ribeiro2016WhySI,
  title={"Why Should I Trust You?": Explaining the Predictions of Any Classifier},
  author={Marco Tulio Ribeiro and Sameer Singh and Carlos Guestrin},
  booktitle={Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},
  year={2016}
}
Despite widespread adoption, machine learning models remain mostly black boxes. [...] We also propose a method to explain models by presenting representative individual predictions and their explanations in a non-redundant way, framing the task as a submodular optimization problem. We demonstrate the flexibility of these methods by explaining different models for text (e.g. random forests) and image classification (e.g. neural networks). We show the utility of explanations via novel experiments, both…
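Below is a minimal Python sketch (numpy and scikit-learn) of the two ideas visible in the abstract: a locally weighted linear surrogate fit on perturbations of a single instance, and a greedy pick of representative instances whose explanations cover the globally important features. The perturbation scheme, kernel, Ridge surrogate (the paper uses a sparse linear model), and all function names are illustrative assumptions, not the authors' implementation.

import numpy as np
from sklearn.linear_model import Ridge

def explain_instance(predict_fn, x, n_samples=500, kernel_width=0.75, seed=0):
    """Fit a locally weighted linear surrogate around the instance x."""
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    mask = rng.integers(0, 2, size=(n_samples, d))      # randomly switch features "off"
    Z = mask * x                                        # perturbed samples
    y = predict_fn(Z)                                   # black-box score for one class
    dist = (1 - mask).sum(axis=1) / d                   # fraction of features removed
    weights = np.exp(-(dist ** 2) / kernel_width ** 2)  # locality kernel
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(mask, y, sample_weight=weights)
    return surrogate.coef_                              # local importance per feature

def submodular_pick(W, budget):
    """Greedily pick `budget` instances (rows of W, one explanation per row)
    whose explanations cover the globally important features; coverage is
    monotone submodular, so the greedy choice is near-optimal."""
    importance = np.sqrt(np.abs(W).sum(axis=0))         # global importance per feature
    chosen, covered = [], np.zeros(W.shape[1], dtype=bool)
    for _ in range(budget):
        gains = [-1.0 if i in chosen else
                 ((covered | (np.abs(W[i]) > 0)) * importance).sum()
                 for i in range(W.shape[0])]
        best = int(np.argmax(gains))
        chosen.append(best)
        covered |= np.abs(W[best]) > 0
    return chosen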
Can I Trust the Explainer? Verifying Post-hoc Explanatory Methods
TLDR
This work introduces a verification framework for explanatory methods under the feature-selection perspective, based on a non-trivial neural network architecture trained on a real-world task and for which guarantees on its inner workings can be provided.
Minimalistic Explanations: Capturing the Essence of Decisions
TLDR
It is argued that explanations can be very minimalistic while retaining the essence of a decision, but the decision-making contexts that can be conveyed in this manner are limited, and first insights into quality criteria for post-hoc explanations are shared.
How Much Can I Trust You? - Quantifying Uncertainties in Explaining Neural Networks
TLDR
This work proposes a new framework that converts any explanation method for neural networks into an explanation method for Bayesian neural networks, with an in-built modeling of uncertainties, translating the intrinsic network model uncertainties into a quantification of explanation uncertainties.
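As a rough illustration of that idea, the sketch below assumes the posterior over networks is approximated by repeated sampling (e.g. forward passes with dropout kept on) and reports the spread of attributions across samples as the explanation uncertainty; explain_fn and sample_network_fn are hypothetical names, and summarizing with mean and standard deviation is a simplification of the paper's treatment.

import numpy as np

def explanation_with_uncertainty(explain_fn, sample_network_fn, x, n_samples=30):
    """explain_fn(network, x) -> attribution vector for x;
    sample_network_fn() -> one network drawn from the (approximate) posterior."""
    attributions = np.stack([explain_fn(sample_network_fn(), x)
                             for _ in range(n_samples)])
    # Mean attribution plus a per-feature uncertainty estimate.
    return attributions.mean(axis=0), attributions.std(axis=0)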
Right for the Right Reasons: Training Differentiable Models by Constraining their Explanations
TLDR
This work introduces a method for efficiently explaining and regularizing differentiable models by examining and selectively penalizing their input gradients, which provide a normal to the decision boundary.
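A hedged sketch of that training scheme, assuming a PyTorch classifier and a human-supplied irrelevant_mask marking input dimensions the model should not rely on; the exact gradient quantity and penalty weighting in the paper may differ.

import torch
import torch.nn.functional as F

def loss_with_input_gradient_penalty(model, x, y, irrelevant_mask, lam=1.0):
    x = x.clone().requires_grad_(True)
    logits = model(x)
    ce = F.cross_entropy(logits, y)
    # Gradient of the summed log-probabilities with respect to the inputs.
    log_prob_sum = F.log_softmax(logits, dim=1).sum()
    input_grad, = torch.autograd.grad(log_prob_sum, x, create_graph=True)
    # Penalize gradient mass on features annotated as irrelevant, so minimizing
    # the loss also pushes the model's explanation toward the "right" evidence.
    penalty = (irrelevant_mask * input_grad ** 2).sum()
    return ce + lam * penalty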
How model accuracy and explanation fidelity influence user trust
TLDR
It is found that accuracy is more important for user trust than explainability, and that users cannot be tricked by high-fidelity explanations into trusting a bad classifier.
A Unified Approach to Interpreting Model Predictions
TLDR
A unified framework for interpreting predictions, SHAP (SHapley Additive exPlanations), which unifies six existing methods and presents new methods that show improved computational performance and/or better consistency with human intuition than previous approaches.
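For intuition, the sketch below computes Shapley values exactly by enumerating feature subsets, which is only feasible for a handful of features; SHAP's contribution is precisely the set of approximations and unifications that make this practical, and the baseline-replacement convention used here is an assumption.

import itertools
import math
import numpy as np

def exact_shapley(predict_fn, x, baseline):
    """Exact Shapley values for one prediction; feasible only for small d."""
    d = len(x)
    phi = np.zeros(d)

    def value(subset):
        z = baseline.copy()
        idx = list(subset)
        z[idx] = x[idx]              # features in `subset` kept, others at baseline
        return float(predict_fn(z[None, :])[0])

    for i in range(d):
        others = [j for j in range(d) if j != i]
        for r in range(d):
            for S in itertools.combinations(others, r):
                w = math.factorial(r) * math.factorial(d - r - 1) / math.factorial(d)
                phi[i] += w * (value(S + (i,)) - value(S))
    return phi   # phi.sum() + value(()) is approximately predict_fn(x[None, :])[0]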
Explaining the Predictions of Any Image Classifier via Decision Trees
TLDR
A decision-tree-based LIME approach, which uses a decision tree model to form an interpretable representation that is locally faithful to the original model, can capture nonlinear interactions among features in the data and create plausible explanations.
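A rough sketch of that surrogate idea: fit a shallow regression tree on locally perturbed samples, weighted by proximity to the instance, so feature interactions can appear in the extracted rules. The Gaussian perturbations, proximity kernel, and depth limit are assumptions for illustration.

import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

def tree_explanation(predict_fn, x, n_samples=500, scale=0.3, max_depth=3, seed=0):
    rng = np.random.default_rng(seed)
    Z = x + rng.normal(0.0, scale, size=(n_samples, x.shape[0]))   # local perturbations
    y = predict_fn(Z)                                              # black-box scores
    w = np.exp(-np.linalg.norm(Z - x, axis=1) ** 2)                # proximity weights
    tree = DecisionTreeRegressor(max_depth=max_depth)
    tree.fit(Z, y, sample_weight=w)
    return export_text(tree)                                       # readable local rules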
Why X rather than Y? Explaining Neural Model's Predictions by Generating Intervention Counterfactual Samples
TLDR
This paper proposes a novel idea: intervene and generate a minimally modified contrastive sample that is classified as Y, which then yields a simple natural-language answer to the question "Why X rather than Y?".
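The sketch below illustrates the contrastive idea with a naive greedy coordinate search for a minimally modified sample that the black box assigns to the foil class Y; the paper's actual generation procedure (and its natural-language rendering of the answer) is not reproduced here.

import numpy as np

def contrastive_sample(predict_fn, x, target_class, step=0.1, max_iters=200):
    z = x.astype(float).copy()
    for _ in range(max_iters):
        probs = predict_fn(z[None, :])[0]
        if int(np.argmax(probs)) == target_class:
            changed = np.flatnonzero(~np.isclose(z, x))
            return z, changed                   # counterfactual and the edited features
        # Try each single-feature nudge and keep the one that most raises p(target).
        best_gain, best_edit = 0.0, None
        for i in range(len(z)):
            for delta in (step, -step):
                cand = z.copy()
                cand[i] += delta
                gain = predict_fn(cand[None, :])[0][target_class] - probs[target_class]
                if gain > best_gain:
                    best_gain, best_edit = gain, (i, delta)
        if best_edit is None:
            break                               # no single edit helps; give up
        z[best_edit[0]] += best_edit[1]
    return None, None                           # no counterfactual found within budget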
Why Should I Trust This Item? Explaining the Recommendations of any Model
TLDR
A deep analysis of 7 state-of-the-art models learnt on 6 datasets, based on identifying the items or sequences of items actively used by the models, which provides interpretable explanations of the recommendations – useful for comparing different models and for explaining the reasons behind a recommendation to the user.
A User Study on the Effect of Aggregating Explanations for Interpreting Machine Learning Models
Recently, there has been growing consensus on the critical need for better techniques to explain machine learning models. However, many of the popular techniques are instance-level explanations, which…

References

SHOWING 1-10 OF 46 REFERENCES
Towards Extracting Faithful and Descriptive Representations of Latent Variable Models
TLDR
Preliminary experiments on knowledge extraction from text indicate that even though Bayesian networks may be more faithful to a matrix factorization model than the logic rules, the latter are possibly more useful for interpretation and debugging.
How to Explain Individual Classification Decisions
TLDR
This paper proposes a procedure which (based on a set of assumptions) makes it possible to explain the decisions of any classification method.
Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model
TLDR
A generative model called Bayesian Rule Lists is introduced that yields a posterior distribution over possible decision lists; it employs a novel prior structure to encourage sparsity and has predictive accuracy on par with the current top algorithms for prediction in machine learning.
Explaining Data-Driven Document Classifications
TLDR
This paper extends the most relevant prior theoretical model of explanations for intelligent systems to account for some missing elements, and defines a new sort of explanation as a minimal set of words, such that removing all words within this set from the document changes the predicted class from the class of interest.
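That definition suggests a simple greedy approximation, sketched below: repeatedly delete the word whose removal most lowers the probability of the class of interest until the prediction flips. predict_proba is a hypothetical scoring function, and greedy deletion only approximates a truly minimal set.

def minimal_word_explanation(predict_proba, tokens, max_words=10):
    """predict_proba(tokens) -> probability of the class of interest for the document."""
    removed, remaining = [], list(tokens)
    for _ in range(max_words):
        if not remaining:
            break
        if predict_proba(remaining) < 0.5:        # class of interest no longer predicted
            return removed
        # Greedily drop the word whose removal lowers the class probability the most.
        scores = [predict_proba(remaining[:i] + remaining[i + 1:])
                  for i in range(len(remaining))]
        best = min(range(len(remaining)), key=lambda i: scores[i])
        removed.append(remaining.pop(best))
    return removed                                # budget exhausted; may not flip the class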
You Are the Only Possible Oracle: Effective Test Selection for End Users of Interactive Machine Learning Systems
TLDR
This work considers the problem of testing this common kind of machine-generated program when the only oracle is an end user, presents test selection methods that provide very good failure-detection rates even for small test suites, and shows that some methods are able to find the arguably most difficult-to-detect faults of classifiers.
Explaining collaborative filtering recommendations
TLDR
This paper presents experimental evidence showing that providing explanations can improve the acceptance of ACF systems, and presents a model for explanations based on the user's conceptual model of the recommendation process.
Towards Transparent Systems: Semantic Characterization of Failure Modes
TLDR
This work proposes characterizing the failure modes of a vision system using semantic attributes, and generates a "specification sheet" that can predict oncoming failures for face and animal species recognition better than several strong baselines.
Predicting Failures of Vision Systems
TLDR
This work shows that a surprisingly straightforward and general approach, called ALERT, can predict the likely accuracy (or failure) of a variety of computer vision systems - semantic segmentation, vanishing point and camera parameter estimation, and image memorability prediction - on individual input images.
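Loosely in the spirit of ALERT, the sketch below trains a second, simple model to predict whether the base system will be correct on a given input; using raw input features and logistic regression is an assumption for illustration, since the original work relies on image descriptors suited to each vision task.

import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_failure_predictor(base_predict, X_val, y_val):
    correct = (base_predict(X_val) == y_val).astype(int)   # 1 = base model was right
    alert = LogisticRegression(max_iter=1000)
    alert.fit(X_val, correct)
    return alert           # alert.predict_proba(x)[:, 1] estimates P(base model succeeds)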
Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
TLDR
A Sentiment Treebank with fine-grained sentiment labels for 215,154 phrases in the parse trees of 11,855 sentences is introduced, presenting new challenges for sentiment compositionality, and the Recursive Neural Tensor Network is introduced.
Comprehensible classification models: a position paper
TLDR
This paper discusses the interpretability of five types of classification models, namely decision trees, classification rules, decision tables, nearest neighbors and Bayesian network classifiers, and the drawbacks of using model size as the only criterion to evaluate the comprehensibility of a model.