Corpus ID: 21889700

A Unified Approach to Interpreting Model Predictions

@article{Lundberg2017AUA,
  title={A Unified Approach to Interpreting Model Predictions},
  author={Scott M. Lundberg and Su-In Lee},
  journal={ArXiv},
  year={2017},
  volume={abs/1705.07874}
}
Understanding why a model makes a certain prediction can be as crucial as the prediction's accuracy in many applications. To address this, the paper presents a unified framework for interpreting predictions, SHAP (SHapley Additive exPlanations). SHAP assigns each feature an importance value for a particular prediction. Its novel components include: (1) the identification of a new class of additive feature importance measures, and (2) theoretical results showing there is a unique solution in this class with a set of desirable properties. The new class unifies six existing methods, notable because several recent…
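As background for the additive feature importance class mentioned in the abstract, the sketch below (not from the paper; the toy model `f`, the instance `x`, and the `baseline` reference values are hypothetical) computes exact Shapley values by brute-force coalition enumeration and checks the local-accuracy property that the attributions sum to the difference between the prediction and the baseline output.

```python
from itertools import combinations
from math import factorial

# Hypothetical toy setup (not from the paper): an instance to explain and a
# baseline; "missing" features in a coalition are replaced by baseline values.
x = {"age": 35, "income": 60_000, "tenure": 4}
baseline = {"age": 40, "income": 50_000, "tenure": 2}

def f(features):
    """Stand-in black-box model: a simple linear score."""
    return 0.001 * features["income"] + 2 * features["tenure"] - 0.5 * features["age"]

def f_coalition(subset):
    """Evaluate f with features in `subset` taken from x and the rest from the baseline."""
    merged = {k: (x[k] if k in subset else baseline[k]) for k in x}
    return f(merged)

def shapley_values():
    names = list(x)
    M = len(names)
    phi = {}
    for i in names:
        others = [j for j in names if j != i]
        total = 0.0
        # phi_i = sum over S ⊆ F\{i} of |S)|! (M - |S| - 1)! / M! * [f(S ∪ {i}) - f(S)]
        for size in range(M):
            for S in combinations(others, size):
                weight = factorial(size) * factorial(M - size - 1) / factorial(M)
                total += weight * (f_coalition(set(S) | {i}) - f_coalition(set(S)))
        phi[i] = total
    return phi

attributions = shapley_values()
print(attributions)
# Local accuracy: the attributions sum to f(x) minus the baseline prediction.
print(sum(attributions.values()), "==", f_coalition(set(x)) - f_coalition(set()))
```

For a linear toy model like this one, the enumeration reproduces coefficient × (x_i − baseline_i), which is a convenient sanity check; practical SHAP estimators avoid the exponential enumeration with sampling or model-specific approximations.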


Training Deep Models to be Explained with Fewer Examples
TLDR
This work proposes a method for training deep models such that their predictions are faithfully explained by explanation models with a small number of examples; the method can be incorporated into any neural network-based prediction model.
DALEX: explainers for complex predictive models
TLDR
A consistent collection of explainers for predictive models, a.k.a. black boxes, based on a uniform, standardized grammar of model exploration that can be easily extended and that supports the most popular frameworks for classification and regression.
Evaluating Explainers via Perturbation
TLDR
This work introduces the c-Eval metric and the corresponding framework to quantify the quality of feature-based explainers of machine learning image classifiers, and conducts extensive experiments with explainers on three different datasets to support the adoption of c-Eval in evaluating explainers' performance.
Definitions, methods, and applications in interpretable machine learning
TLDR
This work addresses concerns about interpretability in the context of machine learning and introduces the predictive, descriptive, relevant (PDR) framework for discussing interpretations, and provides numerous real-world examples to demonstrate how practitioners can use the PDR framework to evaluate and understand interpretations.
GPEX, A Framework For Interpreting Artificial Neural Networks
TLDR
This paper finds a Gaussian process (GP) whose predictions almost match those of the ANN, uses the trained GP to explain the ANN’s decisions, and proposes a framework that narrows the gap between the two aforementioned groups of methods.
Neural Basis Models for Interpretability
TLDR
On a variety of tabular and image datasets, it is demonstrated that for interpretable machine learning, NBMs are the state of the art in accuracy, model size, and throughput, and can easily model all higher-order feature interactions.
Explaining Single Predictions: A Faster Method
TLDR
The domain of single-prediction explanation, in which the user is provided with a detailed explanation of each attribute’s influence on a single predicted instance for a particular machine learning model, is investigated.
A Framework to Learn with Interpretation
TLDR
A high level of conciseness is imposed by constraining the activation to a small number of attributes for a given input via an entropy-based criterion, while enforcing fidelity to both the inputs and outputs of the predictive model.
MonoNet: Towards Interpretable Models by Learning Monotonic Features
TLDR
It is argued that the difficulty of interpreting a complex model stems from the existing interactions among features, and it is shown how to structurally introduce monotonicity between features and outputs in deep learning models by adding new simple layers.

References

"Why Should I Trust You?": Explaining the Predictions of Any Classifier
TLDR
LIME is proposed, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner by learning an interpretable model locally around the prediction (a sketch of this local-surrogate idea follows the reference list).
Explaining prediction models and individual predictions with feature contributions
TLDR
A sensitivity analysis-based method for explaining prediction models that can be applied to any type of classification or regression model, and which is equivalent to commonly used additive model-specific methods when explaining an additive model.
Learning Important Features Through Propagating Activation Differences
TLDR
DeepLIFT (Deep Learning Important FeaTures), a method for decomposing the output prediction of a neural network on a specific input by backpropagating the contributions of all neurons in the network to every feature of the input, is presented.
On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation
TLDR
This work proposes a general solution to the problem of understanding classification decisions by pixel-wise decomposition of nonlinear classifiers, introducing a methodology that allows the contributions of single pixels to predictions to be visualized for kernel-based classifiers over Bag of Words features and for multilayered neural networks.
Not Just a Black Box: Learning Important Features Through Propagating Activation Differences
TLDR
DeepLIFT (Learning Important FeaTures), an efficient and effective method for computing importance scores in a neural network that compares the activation of each neuron to its 'reference activation' and assigns contribution scores according to the difference.
Algorithmic Transparency via Quantitative Input Influence: Theory and Experiments with Learning Systems
TLDR
The transparency-privacy tradeoff is explored and it is proved that a number of useful transparency reports can be made differentially private with very little addition of noise.
Analysis of regression in game theory approach
Working with multiple regression analysis, a researcher usually wants to know the comparative importance of predictors in the model. However, the analysis can be made difficult because of
Extremal Principle Solutions of Games in Characteristic Function Form: Core, Chebychev and Shapley Value Generalizations
In 1966, W. Lucas [1] exhibited a 10-person game with no von Neumann-Morgenstern solution. D. Schmeidler [2] then originated the nucleolus, proved it exists for every game, is unique and is contained
Monotonic solutions of cooperative games
The principle of monotonicity for cooperative games states that if a game changes so that some player's contribution to all coalitions increases or stays the same then the player's allocation should
17. A Value for n-Person Games
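To complement the LIME reference above, the following is a minimal, self-contained sketch of the local-surrogate idea it describes; it is not the authors' implementation, and the stand-in black-box `predict`, the Gaussian perturbation scale, and the RBF kernel width are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict(X):
    """Stand-in black-box model returning one score per row (illustrative only)."""
    return np.sin(X[:, 0]) + X[:, 1] ** 2

def local_linear_explanation(x, predict_fn, n_samples=1000, kernel_width=0.75):
    """Fit a proximity-weighted linear surrogate around the instance x."""
    d = x.shape[0]
    # 1. Perturb the instance with Gaussian noise around x.
    Z = x + rng.normal(scale=0.5, size=(n_samples, d))
    y = predict_fn(Z)
    # 2. Weight each perturbed sample by its closeness to x (RBF kernel).
    dist = np.linalg.norm(Z - x, axis=1)
    w = np.exp(-(dist ** 2) / kernel_width ** 2)
    # 3. Weighted least squares: scale rows by sqrt(w), then solve for coefficients.
    A = np.hstack([np.ones((n_samples, 1)), Z - x])  # intercept + centred features
    sw = np.sqrt(w)[:, None]
    coef, *_ = np.linalg.lstsq(A * sw, y * sw.ravel(), rcond=None)
    return coef[0], coef[1:]  # local intercept and per-feature slopes

x0 = np.array([0.3, 1.2])
intercept, slopes = local_linear_explanation(x0, predict)
print("local intercept:", intercept)
print("local slopes   :", slopes)  # roughly cos(0.3) and 2 * 1.2 for this toy model
```

The proximity weighting is what makes the surrogate local: samples far from x0 contribute little to the weighted least-squares fit, so the recovered slopes approximate the model's behaviour only in the neighbourhood of the explained instance.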