Corpus ID: 235421605

DANCE: Enhancing saliency maps using decoys

Yang Young Lu, Wenbo Guo, Xinyu Xing, William Stafford Noble
Saliency methods can make deep neural network predictions more interpretable by identifying a set of critical features in an input sample, such as the pixels that contribute most strongly to a prediction made by an image classifier. Unfortunately, recent evidence suggests that many saliency methods perform poorly, especially in situations where gradients are saturated, inputs contain adversarial perturbations, or predictions rely upon inter-feature dependence. To address these issues, we propose a… 
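To make the idea concrete, here is a minimal sketch of one of the simplest saliency methods the abstract alludes to, gradient-times-input, applied to a toy linear scorer. This is an illustrative assumption for exposition, not the paper's DANCE method:

```python
def saliency_grad_times_input(weights, x):
    # For a linear score = sum_i w_i * x_i, the gradient of the score
    # with respect to x_i is simply w_i, so the gradient-times-input
    # attribution for feature i is |w_i * x_i|.
    return [abs(w, ) if False else abs(w * xi) for w, xi in zip(weights, x)]

# Feature 1 has the largest attribution despite feature 2's larger raw value,
# because the model weights feature 1 more strongly.
print(saliency_grad_times_input([0.5, -2.0, 0.0], [1.0, 1.0, 3.0]))
# -> [0.5, 2.0, 0.0]
```

Gradient saturation, one of the failure modes named above, arises because real networks are nonlinear: where the activation is flat, the gradient (and hence the attribution) vanishes even for genuinely important features.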


Improving Attribution Methods by Learning Submodular Functions
This work explores the novel idea of learning a submodular scoring function to improve the specificity/selectivity of existing feature attribution methods and achieves higher specificity along with good discriminative power.
EDGE: Explaining Deep Reinforcement Learning Policies
A novel self-explainable model is proposed that augments a Gaussian process with a customized kernel function and an interpretable predictor; it can predict an agent's final rewards from its game episodes and extract time-step importance within episodes as strategy-level explanations for that agent.
Sanity Checks for Saliency Maps
It is shown that some existing saliency methods are independent both of the model and of the data generating process, and methods that fail the proposed tests are inadequate for tasks that are sensitive to either data or model.
Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization
This work proposes a technique for producing 'visual explanations' for decisions from a large class of Convolutional Neural Network (CNN)-based models, making them more transparent and explainable, and shows that even non-attention-based models learn to localize discriminative regions of the input image.
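The Grad-CAM recipe summarized above can be sketched in a few lines: pool each channel's gradient over the spatial dimensions to get a weight, form the weighted sum of the feature maps, and apply a ReLU. This is a hedged, dependency-free illustration on tiny nested lists, not the authors' implementation:

```python
def grad_cam(feature_maps, grads):
    # feature_maps: list of 2D activation maps A_k (one per channel)
    # grads: matching list of 2D gradient maps dY/dA_k
    weighted = []
    for A, G in zip(feature_maps, grads):
        # Channel weight alpha_k: spatial average of the gradient.
        alpha = sum(sum(row) for row in G) / (len(G) * len(G[0]))
        weighted.append([[alpha * v for v in row] for row in A])
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    # ReLU of the weighted sum keeps only regions with positive influence.
    return [[max(0.0, sum(m[i][j] for m in weighted)) for j in range(w)]
            for i in range(h)]

# One channel, uniform positive gradients: the map is the activation itself.
print(grad_cam([[[1.0, 2.0], [3.0, 4.0]]], [[[1.0, 1.0], [1.0, 1.0]]]))
# -> [[1.0, 2.0], [3.0, 4.0]]
```

In a real network the feature maps come from the last convolutional layer and the gradients from backpropagating the target class score; the final map is then upsampled to the input resolution.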
Intriguing properties of neural networks
It is found that there is no distinction between individual high-level units and random linear combinations of high-level units, according to various methods of unit analysis, and it is suggested that it is the space, rather than the individual units, that contains the semantic information in the high layers of neural networks.
This looks like that: deep learning for interpretable image recognition
A deep network architecture, the prototypical part network (ProtoPNet), is presented that reasons in a way similar to how ornithologists, physicians, and others would explain how to solve challenging image classification tasks, and that provides a level of interpretability absent in other interpretable deep models.
Interpretation of Neural Networks is Fragile
This paper systematically characterizes the fragility of several widely used feature-importance interpretation methods (saliency maps, relevance propagation, and DeepLIFT) on ImageNet and CIFAR-10, and extends these results to show that interpretations based on exemplars (e.g. influence functions) are similarly fragile.
Real Time Image Saliency for Black Box Classifiers
A masking model is trained to manipulate the classifier's scores by masking salient parts of the input image; it generalises well to unseen images and requires only a single forward pass to perform saliency detection, making it suitable for use in real-time systems.
Explaining Image Classifiers by Counterfactual Generation
This work samples plausible image in-fills by conditioning a generative model on the rest of the image, then optimizes to find the image regions that most change the classifier's decision after in-fill.
Interpretable Explanations of Black Boxes by Meaningful Perturbation
A general framework for learning different kinds of explanations for any black-box algorithm is proposed, and the framework is specialised to find the part of an image most responsible for a classifier decision.
A Benchmark for Interpretability Methods in Deep Neural Networks
An empirical measure of the approximate accuracy of feature importance estimates in deep neural networks is proposed, and it is shown that some approaches do no better than the underlying method but carry a far higher computational burden.
Adversarial Localization Network
The Adversarial Localization Network is proposed, a novel weakly supervised approach to generating object masks in an image that achieves competitive results on the ILSVRC2012 dataset with only image-level labels and no bounding boxes for training.