Corpus ID: 220302533

In-Distribution Interpretability for Challenging Modalities

Cosmas Heiß, Ron Levie, Cinjon Resnick, Gitta Kutyniok, Joan Bruna
It is widely recognized that the predictions of deep neural networks are difficult to parse relative to simpler approaches. However, the development of methods to investigate the mode of operation of such models has advanced rapidly in the past few years. Recent work introduced an intuitive framework which utilizes generative models to improve on the meaningfulness of such explanations. In this work, we display the flexibility of this method to interpret diverse and challenging modalities… 

Figures and Tables from this paper

Citations

Fast Hierarchical Games for Image Explanations
This work presents a model-agnostic explanation method for image classification based on a hierarchical extension of Shapley coefficients – h-Shap – that resolves some limitations of current approaches, is scalable, and can be computed without approximation.
A Rate-Distortion Framework for Explaining Black-box Model Decisions
The RDE framework is a mathematically well-founded method for explaining black-box model decisions based on perturbations of the target input signal and applies to any differentiable pre-trained model such as neural networks.
Sparsest Univariate Learning Models Under Lipschitz Constraint
This work proposes continuous-domain formulations for one-dimensional regression problems whose global minimizers are continuous piecewise-linear (CPWL) functions, together with efficient algorithms that find the sparsest solution of each problem: the CPWL mapping with the fewest linear regions.
Cartoon Explanations of Image Classifiers
This work presents CartoonX (Cartoon Explanation), a novel model-agnostic explanation method tailored towards image classifiers and based on the rate-distortion explanation (RDE) framework, and demonstrates that CartoonX can reveal novel valuable explanatory information, particularly for misclassifications.


References

Learning Important Features Through Propagating Activation Differences
DeepLIFT (Deep Learning Important FeaTures), a method for decomposing the output prediction of a neural network on a specific input by backpropagating the contributions of all neurons in the network to every feature of the input, is presented.
Interpretable Explanations of Black Boxes by Meaningful Perturbation
A general framework for learning different kinds of explanations for any black-box algorithm is proposed, and the framework is specialised to find the part of an image most responsible for a classifier's decision.
A Unified Approach to Interpreting Model Predictions
A unified framework for interpreting predictions, SHAP (SHapley Additive exPlanations), which unifies six existing methods and presents new methods that show improved computational performance and/or better consistency with human intuition than previous approaches.
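For small feature counts, the Shapley values that SHAP approximates can be computed exactly by enumerating all coalitions. A minimal pure-Python sketch under one common simplification (features absent from a coalition are replaced by a fixed baseline; the function names here are illustrative, not the SHAP library's API):

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values by coalition enumeration (O(2^n) in the
    number of features). Features outside the coalition are set to
    the baseline value, a common choice of value function."""
    n = len(x)
    features = list(range(n))
    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for k in range(len(others) + 1):
            for S in combinations(others, k):
                # Standard Shapley weight |S|! (n - |S| - 1)! / n!
                weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                with_i = [x[j] if (j in S or j == i) else baseline[j] for j in features]
                without_i = [x[j] if j in S else baseline[j] for j in features]
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi

# Toy black box: a linear model, for which the exact Shapley values
# under a baseline value function are phi_i = w_i * (x_i - baseline_i).
w = [2.0, -1.0, 0.5]
model = lambda v: sum(wi * vi for wi, vi in zip(w, v))
phi = shapley_values(model, x=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0])
# → [2.0, -2.0, 1.5]
```

The efficiency property holds by construction: the attributions sum to `model(x) - model(baseline)`.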
Generative Image Inpainting with Contextual Attention
This work proposes a new deep generative model-based approach which can not only synthesize novel image structures but also explicitly utilize surrounding image features as references during network training to make better predictions.
Unsupervised Adversarial Image Inpainting
This work considers inpainting in an unsupervised setting with access to neither paired nor unpaired training data, and models the reconstruction process using a conditional GAN with constraints on the stochastic component that introduce an explicit dependency between this component and the generated output.
Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders
A powerful new WaveNet-style autoencoder model is detailed that conditions an autoregressive decoder on temporal codes learned from the raw audio waveform, and NSynth, a large-scale, high-quality dataset of musical notes that is an order of magnitude larger than comparable public datasets, is introduced.
A Rate-Distortion Framework for Explaining Neural Network Decisions
The widespread idea of interpreting neural network decisions as an explicit optimisation problem in a rate-distortion framework is formalised and a heuristic solution strategy based on assumed density filtering for deep ReLU neural networks is developed.
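The rate-distortion view can be illustrated on a toy problem: find a small set of input components which, when fixed to their observed values while the rest are resampled from a reference distribution, keeps the model's output nearly unchanged. The sketch below uses a greedy heuristic as a stand-in for the paper's optimisation; the toy model and all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy black box: the output depends only on components 0 and 3.
def model(x):
    return 3.0 * x[..., 0] - 2.0 * x[..., 3]

def distortion(keep, x, model, n_samples=500):
    """Expected squared deviation of the prediction when the kept
    components are fixed to x and the rest are resampled from a
    standard-normal reference distribution."""
    samples = rng.standard_normal((n_samples, x.size))
    samples[:, keep] = x[keep]
    return np.mean((model(samples) - model(x)) ** 2)

def greedy_rde(x, model, budget):
    """Greedily grow the kept set, each step adding the component
    whose inclusion lowers the distortion most (a simple heuristic,
    not the optimisation strategy of the RDE paper)."""
    keep = []
    for _ in range(budget):
        candidates = [i for i in range(x.size) if i not in keep]
        best = min(candidates, key=lambda i: distortion(keep + [i], x, model))
        keep.append(best)
    return sorted(keep)

x = rng.standard_normal(6)
relevant = greedy_rde(x, model, budget=2)
# recovers the components the model actually uses: [0, 3]
```

Keeping both relevant components drives the distortion to exactly zero here, since the toy model ignores everything else; the "rate" is the size of the kept set.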
Explaining Image Classifiers by Counterfactual Generation
This work samples plausible image in-fills by conditioning a generative model on the rest of the image, and optimizes to find the image regions that most change the classifier's decision after in-fill.
U-Net: Convolutional Networks for Biomedical Image Segmentation
It is shown that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks.
On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation
This work proposes a general solution to the problem of understanding classification decisions by pixel-wise decomposition of nonlinear classifiers, introducing a methodology that allows one to visualize the contributions of single pixels to predictions for kernel-based classifiers over Bag of Words features and for multilayered neural networks.