# A Unified Approach to Interpreting Model Predictions

@article{Lundberg2017AUA, title={A Unified Approach to Interpreting Model Predictions}, author={Scott M. Lundberg and Su-In Lee}, journal={ArXiv}, year={2017}, volume={abs/1705.07874} }

Understanding why a model makes a certain prediction can be as crucial as the prediction's accuracy in many applications. [...] SHAP assigns each feature an importance value for a particular prediction. Its novel components include: (1) the identification of a new class of additive feature importance measures, and (2) theoretical results showing there is a unique solution in this class with a set of desirable properties. The new class unifies six existing methods, notable because several recent…
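As a rough illustration of what an additive feature attribution looks like, the sketch below computes exact Shapley values for a tiny toy model by enumerating every feature subset. The names `shapley_values` and `toy_model`, the all-zero baseline, and the choice to represent a "missing" feature by its baseline value are assumptions made for this example; they are not the paper's estimators or the API of any library.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, relative to a baseline input.

    Assumption: a feature that is "absent" from a coalition is replaced
    by its baseline value. Cost grows as 2^n, so this only suits tiny n.
    """
    n = len(x)

    def value(subset):
        # Evaluate f with features in `subset` taken from x, the rest from baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Classic Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = set(coalition)
                # Marginal contribution of feature i when added to the coalition.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    # Hypothetical model: one additive term plus one interaction term.
    return 2.0 * z[0] + z[1] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

For this toy model the attributions come out to approximately 2, 3, and 3: the additive term is credited entirely to the first feature, and the interaction term is split evenly between the other two. The attributions sum to `toy_model(x) - toy_model(baseline)`, which is the additivity property that this class of methods requires.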

## 5,821 Citations

Training Deep Models to be Explained with Fewer Examples

- Computer Science, ArXiv
- 2021

This work proposes a method for training deep models such that their predictions are faithfully explained by explanation models with a small number of examples, and can be incorporated into any neural network-based prediction models.

DALEX: explainers for complex predictive models

- Computer Science, ArXiv
- 2018

A consistent collection of explainers for predictive models, a.k.a. black boxes, which are based on a uniform standardized grammar of model exploration which may be easily extended and supports the most popular frameworks for classification and regression.

Evaluating Explainers via Perturbation

- Computer Science, ArXiv
- 2019

This work introduces the c-Eval metric and a corresponding framework to quantify the quality of feature-based explainers of machine learning image classifiers, and conducts extensive experiments on three different datasets to support the adoption of c-Eval in evaluating explainers' performance.

Definitions, methods, and applications in interpretable machine learning

- Computer Science
- 2019

This work addresses concerns about interpretability in the context of machine learning and introduces the predictive, descriptive, relevant (PDR) framework for discussing interpretations, and provides numerous real-world examples to demonstrate how practitioners can use the PDR framework to evaluate and understand interpretations.

GPEX, A Framework For Interpreting Artificial Neural Networks

- Computer Science, ArXiv
- 2021

This paper finds a Gaussian process (GP) whose predictions almost match those of the ANN and uses the trained GP to explain the ANN’s decisions, and proposes a framework that shortens the gap between the two aforementioned groups of methods.

Interpretable Machine Learning

- Computer Science
- 2021

This work addresses concerns about interpretability in the context of machine learning and introduces the predictive, descriptive, relevant (PDR) framework for discussing interpretations, and provides numerous real-world examples to demonstrate how practitioners can use the PDR framework to evaluate and understand interpretations.

Neural Basis Models for Interpretability

- Computer Science, ArXiv
- 2022

On a variety of tabular and image datasets, it is demonstrated that for interpretable machine learning, NBMs are the state-of-the-art in accuracy, model size, and throughput, and can easily model all higher-order feature interactions.

Explaining Single Predictions: A Faster Method

- Computer Science, SOFSEM
- 2020

This work investigates single prediction explanation, in which the user is given a detailed explanation of each attribute's influence on an individual predicted instance of a particular machine learning model.

A Framework to Learn with Interpretation

- Computer Science, NeurIPS
- 2021

A high level of conciseness is imposed by constraining the activation of a very few attributes for a given input with a real-entropy-based criterion while enforcing fidelity to both inputs and outputs of the predictive model.

MonoNet: Towards Interpretable Models by Learning Monotonic Features

- Computer Science, ArXiv
- 2019

It is argued that the difficulty of interpreting a complex model stems from the existing interactions among features, and it is shown how to enforce monotonicity between features and outputs structurally in deep learning models by adding new simple layers.

## References

"Why Should I Trust You?": Explaining the Predictions of Any Classifier

- Computer Science, HLT-NAACL Demos
- 2016

LIME is proposed, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction.

Explaining prediction models and individual predictions with feature contributions

- Computer Science, Knowledge and Information Systems
- 2013

A sensitivity analysis-based method for explaining prediction models that can be applied to any type of classification or regression model, and which is equivalent to commonly used additive model-specific methods when explaining an additive model.

Learning Important Features Through Propagating Activation Differences

- Computer Science, ICML
- 2017

DeepLIFT (Deep Learning Important FeaTures), a method for decomposing the output prediction of a neural network on a specific input by backpropagating the contributions of all neurons in the network to every feature of the input, is presented.

On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation

- Computer Science, PLoS ONE
- 2015

This work proposes a general solution to the problem of understanding classification decisions by pixel-wise decomposition of nonlinear classifiers, introducing a methodology for visualizing the contributions of single pixels to predictions for kernel-based classifiers over Bag of Words features and for multilayered neural networks.

Not Just a Black Box: Learning Important Features Through Propagating Activation Differences

- Computer Science, ArXiv
- 2016

DeepLIFT (Learning Important FeaTures) is presented, an efficient and effective method for computing importance scores in a neural network that compares the activation of each neuron to its 'reference activation' and assigns contribution scores according to the difference.

Algorithmic Transparency via Quantitative Input Influence: Theory and Experiments with Learning Systems

- Computer Science, 2016 IEEE Symposium on Security and Privacy (SP)
- 2016

The transparency-privacy tradeoff is explored and it is proved that a number of useful transparency reports can be made differentially private with very little addition of noise.

Analysis of regression in game theory approach

- Economics
- 2001

Working with multiple regression analysis a researcher usually wants to know a comparative importance of predictors in the model. However, the analysis can be made difficult because of…

Extremal Principle Solutions of Games in Characteristic Function Form: Core, Chebychev and Shapley Value Generalizations

- Mathematics
- 1988

In 1966, W. Lucas [1] exhibited a 10 person game with no von Neumann-Morgenstern solution. D. Schmeidler [2] then originated the nucleolus, proved it exists for every game, is unique and is contained…

Monotonic solutions of cooperative games

- Economics
- 1985

The principle of monotonicity for cooperative games states that if a game changes so that some player's contribution to all coalitions increases or stays the same then the player's allocation should…