DANCE: Enhancing saliency maps using decoys
@inproceedings{Lu2021DANCEES,
  title     = {DANCE: Enhancing saliency maps using decoys},
  author    = {Yang Young Lu and Wenbo Guo and Xinyu Xing and William Stafford Noble},
  booktitle = {ICML},
  year      = {2021}
}
Saliency methods can make deep neural network predictions more interpretable by identifying a set of critical features in an input sample, such as the pixels that contribute most strongly to a prediction made by an image classifier. Unfortunately, recent evidence suggests that many saliency methods perform poorly, especially in situations where gradients are saturated, inputs contain adversarial perturbations, or predictions rely upon inter-feature dependence. To address these issues, we propose a…
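The decoy-enhanced construction described in the paper is not reproduced here, but for context, below is a minimal sketch of the kind of gradient-based saliency map such methods build on. It assumes a PyTorch image classifier; the `model` and the random input tensor are placeholders for illustration only.

```python
# Minimal sketch of a vanilla gradient saliency map, the kind of base
# attribution that decoy-based enhancement operates on. The classifier and
# input below are placeholders, not the paper's actual setup.
import torch
import torchvision.models as models

model = models.resnet18(weights=None)  # placeholder classifier
model.eval()

x = torch.rand(1, 3, 224, 224, requires_grad=True)  # dummy input image

logits = model(x)
target_class = logits.argmax(dim=1).item()

# Gradient of the predicted class score w.r.t. the input pixels:
# large-magnitude gradients mark pixels that most influence the prediction.
logits[0, target_class].backward()

saliency = x.grad.abs().max(dim=1)[0]  # collapse channels -> (1, H, W) map
```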
2 Citations
Improving Attribution Methods by Learning Submodular Functions
- Computer Science, AISTATS
- 2022
This work explores the novel idea of learning a submodular scoring function to improve the specificity/selectivity of existing feature attribution methods, achieving higher specificity along with good discriminative power.
EDGE: Explaining Deep Reinforcement Learning Policies
- Computer Science, NeurIPS
- 2021
A novel self-explainable model is proposed that augments a Gaussian process with a customized kernel function and an interpretable predictor; it can predict an agent's final rewards from its game episodes and extract time-step importance within episodes as strategy-level explanations for that agent.
References
SHOWING 1-10 OF 61 REFERENCES
Sanity Checks for Saliency Maps
- Computer Science, NeurIPS
- 2018
It is shown that some existing saliency methods are independent both of the model and of the data generating process, and methods that fail the proposed tests are inadequate for tasks that are sensitive to either data or model.
Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization
- Computer Science, International Journal of Computer Vision
- 2019
This work proposes a technique for producing ‘visual explanations’ for decisions from a large class of Convolutional Neural Network (CNN)-based models, making them more transparent and explainable, and shows that even non-attention-based models learn to localize discriminative regions of the input image.
Intriguing properties of neural networks
- Computer Science, ICLR
- 2014
It is found that there is no distinction between individual high-level units and random linear combinations of high-level units, according to various methods of unit analysis, and it is suggested that it is the space, rather than the individual units, that contains the semantic information in the high layers of neural networks.
This looks like that: deep learning for interpretable image recognition
- Computer Science, NeurIPS
- 2019
A deep network architecture, the prototypical part network (ProtoPNet), is proposed that reasons in a way similar to how ornithologists, physicians, and others would explain to people how to solve challenging image classification tasks, and that provides a level of interpretability absent in other interpretable deep models.
Interpretation of Neural Networks is Fragile
- Computer Science, AAAI
- 2019
This paper systematically characterizes the fragility of several widely used feature-importance interpretation methods (saliency maps, relevance propagation, and DeepLIFT) on ImageNet and CIFAR-10, and extends these results to show that interpretations based on exemplars (e.g., influence functions) are similarly fragile.
Real Time Image Saliency for Black Box Classifiers
- Computer Science, NIPS
- 2017
A masking model is trained to manipulate the scores of the classifier by masking salient parts of the input image; it generalises well to unseen images and requires only a single forward pass to perform saliency detection, making it suitable for use in real-time systems.
Explaining Image Classifiers by Counterfactual Generation
- Mathematics, ICLR
- 2019
This work samples plausible image in-fills by conditioning a generative model on the rest of the image, and optimizes to find the image regions that most change the classifier's decision after in-filling.
Interpretable Explanations of Black Boxes by Meaningful Perturbation
- Computer Science, 2017 IEEE International Conference on Computer Vision (ICCV)
- 2017
A general framework for learning different kinds of explanations for any black-box algorithm is proposed, and the framework is specialised to find the part of an image most responsible for a classifier decision (a minimal sketch of this mask-optimisation idea follows below).
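As referenced above, the sketch below illustrates the general idea of perturbation-based saliency: learn a mask whose deletions most reduce the classifier's confidence. It is not the cited paper's exact formulation; the classifier, blur-based baseline, mask resolution, learning rate, and regularisation weight are all illustrative assumptions.

```python
# Illustrative sketch of mask-optimisation saliency: high mask values mark
# regions whose removal most reduces the predicted class score. All
# hyperparameters below are placeholders, not the cited paper's settings.
import torch
import torch.nn.functional as F
import torchvision.models as models

model = models.resnet18(weights=None).eval()   # placeholder classifier
image = torch.rand(1, 3, 224, 224)             # dummy input image
baseline = F.avg_pool2d(image, 11, stride=1, padding=5)  # blurred "deleted" image
target = model(image).argmax(dim=1).item()

mask_logits = torch.zeros(1, 1, 28, 28, requires_grad=True)  # coarse mask
opt = torch.optim.Adam([mask_logits], lr=0.1)

for _ in range(100):
    m = torch.sigmoid(mask_logits)
    m_up = F.interpolate(m, size=image.shape[-2:], mode="bilinear",
                         align_corners=False)
    perturbed = image * (1 - m_up) + baseline * m_up  # m=1 means "delete"
    score = F.softmax(model(perturbed), dim=1)[0, target]
    loss = score + 0.05 * m.mean()  # drop confidence while keeping the mask small
    opt.zero_grad()
    loss.backward()
    opt.step()

saliency = torch.sigmoid(mask_logits).detach()  # influential regions
```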
A Benchmark for Interpretability Methods in Deep Neural Networks
- Computer Science, NeurIPS
- 2019
An empirical measure of the approximate accuracy of feature importance estimates in deep neural networks is proposed, and it is shown that some approaches do no better than the underlying method but carry a far higher computational burden.
Adversarial Localization Network
- Computer Science
- 2017
The Adversarial Localization Network is proposed, a novel weakly supervised approach to generating object masks in an image, which achieves competitive results on the ILSVRC2012 dataset with only image-level labels and no bounding boxes for training.