Corpus ID: 220495856

Explaining the data or explaining a model? Shapley values that uncover non-linear dependencies

Authors: Daniel Vidali Fryer, Inga Strümke, Hien Nguyen
Shapley values have become increasingly popular in the machine learning literature thanks to their attractive axiomatisation, flexibility, and uniqueness in satisfying certain notions of 'fairness'. The flexibility arises from the myriad potential forms of the Shapley value game formulation. Amongst the consequences of this flexibility is that there are now many types of Shapley values being discussed, with such variety being a source of potential misunderstanding. To the best of our…
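The game formulations discussed in the abstract all rest on the same underlying Shapley value. As an illustrative sketch (the toy value function below is an assumption for demonstration, not taken from the paper), the exact Shapley values of a small cooperative game can be computed by enumerating coalitions directly:

```python
from itertools import combinations
from math import factorial

def shapley_values(players, v):
    """Exact Shapley values for a cooperative game.

    players : list of hashable player labels
    v       : value function mapping a frozenset of players to a payoff
    """
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        # Sum the weighted marginal contribution of i over all coalitions S not containing i.
        for r in range(n):
            for S in combinations(others, r):
                S = frozenset(S)
                weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                total += weight * (v(S | {i}) - v(S))
        phi[i] = total
    return phi

# Hypothetical toy game: a coalition's value is the square of its size.
v = lambda S: len(S) ** 2
print(shapley_values(["a", "b", "c"], v))
```

By symmetry each player receives the same value, and by the efficiency axiom the values sum to v(grand coalition) = 9; this brute-force enumeration is exponential in the number of players, which is why practical model-explanation methods approximate it.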


AggMapNet: enhanced and explainable low-sample omics deep learning with feature-aggregated multi-channel networks

A novel unsupervised feature aggregation tool, AggMap, was developed to Aggregate and Map omics features into multi-channel 2D spatially correlated image-like feature maps (Fmaps) based on their intrinsic correlations; it exhibits strong feature reconstruction capabilities on a randomized benchmark dataset, outperforming existing methods.

Beyond Cuts in Small Signal Scenarios -- Enhanced Sneutrino Detectability Using Machine Learning

We investigate enhancing the sensitivity of new physics searches at the LHC by machine learning in the case of background dominance and a high degree of overlap between the observables for signal and…

Shapley values for feature selection: The good, the bad, and the axioms

This paper calls into question the use of the Shapley value as a feature selection tool, using simple, abstract “toy” counterexamples to illustrate that the axioms may work against the goals of feature selection.

Towards interpreting ML-based automated malware detection models: a survey

A new taxonomy of malware detection interpretation methods, built on taxonomies summarized by previous research in the field, is provided, and state-of-the-art approaches are evaluated by interpretation-method attributes to generate a final score.

SARGDV: Efficient identification of groundwater-dependent vegetation using synthetic aperture radar

Groundwater depletion impacts the sustainability of numerous groundwater-dependent vegetation (GDV) globally, placing significant stress on their capacity to provide environmental and ecological…



Measuring and testing dependence by correlation of distances

Distance correlation is a new measure of dependence between random vectors that is based on certain Euclidean distances between sample elements rather than sample moments, yet has a compact representation analogous to the classical covariance and correlation.
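The summary above describes distance correlation as built from pairwise Euclidean distances rather than moments. As a hedged sketch (an illustration of the measure, not code from the cited paper), the sample statistic can be computed from double-centred distance matrices, and it detects a non-linear dependence such as y = x² that Pearson correlation misses:

```python
import numpy as np

def distance_correlation(x, y):
    """Sample distance correlation between two 1-D samples."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)

    def centred(a):
        # Pairwise distance matrix, double-centred so rows and columns have zero mean.
        d = np.abs(a - a.T)
        return (d - d.mean(axis=0, keepdims=True)
                  - d.mean(axis=1, keepdims=True) + d.mean())

    A, B = centred(x), centred(y)
    dcov2 = (A * B).mean()                      # squared distance covariance
    dvar_x, dvar_y = (A * A).mean(), (B * B).mean()
    return np.sqrt(dcov2 / np.sqrt(dvar_x * dvar_y))

# Purely non-linear dependence: Pearson correlation is near zero here,
# but distance correlation is clearly positive.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
print(distance_correlation(x, x ** 2))
```

The statistic lies in [0, 1] and equals zero (in the population) exactly when the variables are independent, which is what makes it relevant to Shapley values that uncover non-linear dependencies.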

A Unified Approach to Interpreting Model Predictions

A unified framework for interpreting predictions, SHAP (SHapley Additive exPlanations), which unifies six existing methods and presents new methods that show improved computational performance and/or better consistency with human intuition than previous approaches.

Algorithmic Transparency via Quantitative Input Influence: Theory and Experiments with Learning Systems

The transparency-privacy tradeoff is explored and it is proved that a number of useful transparency reports can be made differentially private with very little addition of noise.

Handbook of the Shapley Value

This is an Accepted Manuscript of a book chapter published by Routledge/CRC Press in Handbook of the Shapley value on December 6, 2019, available online:

Systolic blood pressure and mortality

There is a non-linear relationship between mortality and blood pressure.

Shapley value confidence intervals for variable selection in regression models

Multiple linear regression is a commonly used inferential and predictive process, whereby a single response variable is modeled via an affine combination of multiple explanatory covariates. The…

Shapley Value Confidence Intervals for Attributing Variance Explained

The coefficient of determination, R², is often used to measure the variance explained by an affine combination of multiple explanatory covariates. An attribution of this explanatory…

From local explanations to global understanding with explainable AI for trees

An explanation method for trees is presented that enables the computation of optimal local explanations for individual predictions, and the authors demonstrate their method on three medical datasets.

The Explanation Game: Explaining Machine Learning Models with Cooperative Game Theory

This work illustrates how subtle differences in the underlying game formulations of existing methods can cause large differences in attribution for a prediction, and proposes a general framework for generating explanations for ML models, called formulate, approximate, and explain (FAE).