Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization

@article{Selvaraju2016GradCAMVE,
  title={Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization},
  author={Ramprasaath R. Selvaraju and Abhishek Das and Ramakrishna Vedantam and Michael Cogswell and Devi Parikh and Dhruv Batra},
  journal={International Journal of Computer Vision},
  year={2020},
  volume={128},
  pages={336--359}
}
We propose a technique for producing ‘visual explanations’ for decisions from a large class of Convolutional Neural Network (CNN)-based models, making them more transparent and explainable. Unlike previous approaches, Grad-CAM is applicable to a wide variety of CNN model-families: (1) CNNs with fully-connected layers (e.g.
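Grad-CAM's weighting step can be sketched in a few lines of NumPy. Each feature map A^k of the target convolutional layer gets a weight alpha_k^c, which is the spatial average of the gradient of the class score y^c with respect to that map. The map is then the ReLU of the weighted sum of the feature maps. This is a minimal sketch that assumes the activations and gradients have already been extracted with shape `(K, H, W)`. The function name `grad_cam` and the toy GAP-plus-linear head are hypothetical illustrations, not the authors' code.

```python
import numpy as np


def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM map from feature maps A^k and gradients dy^c/dA^k, both (K, H, W)."""
    # alpha_k^c: global-average-pool the gradient of each feature map.
    alphas = gradients.mean(axis=(1, 2))              # shape (K,)
    # Weighted sum of the feature maps over k, then ReLU so that only
    # regions with a positive influence on the class score remain.
    cam = np.tensordot(alphas, activations, axes=1)   # shape (H, W)
    return np.maximum(cam, 0.0)


# Hypothetical toy head: y = sum_k w_k * mean(A^k), so dy/dA^k[i, j] = w_k / (H * W).
rng = np.random.default_rng(0)
A = rng.random((4, 7, 7))
w = np.array([1.0, -2.0, 0.5, 3.0])
grads = np.broadcast_to(w[:, None, None] / (7 * 7), A.shape)
cam = grad_cam(A, grads)
print(cam.shape)  # (7, 7)
```

In practice the coarse map is upsampled to the input resolution before it is overlaid on the image.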

Grad-CAM++: Improved Visual Explanations for Deep Convolutional Networks

This paper proposes a generalized method, Grad-CAM++, that provides better visual explanations of CNN model predictions than the state of the art, both in object localization and in explaining multiple instances of objects in a single image.
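Grad-CAM++ replaces Grad-CAM's plain gradient average with pixel-wise coefficients alpha^k_ij. The sketch below uses the closed form commonly derived under the assumption that the class score is passed through an exponential, so the second- and third-order derivatives reduce to powers of the first-order gradient g. With that assumption, alpha^k_ij = g^2 / (2 g^2 + sum_ab A^k_ab g^3) and w_k = sum_ij alpha^k_ij ReLU(g_ij). The function name and the toy inputs are hypothetical.

```python
import numpy as np


def grad_cam_pp(activations: np.ndarray, gradients: np.ndarray):
    """Return (channel weights, map) for feature maps and gradients of shape (K, H, W)."""
    g2 = gradients ** 2
    g3 = g2 * gradients
    # sum_{a,b} A^k_{ab}, kept broadcastable over the spatial grid.
    sum_a = activations.sum(axis=(1, 2), keepdims=True)
    denom = 2.0 * g2 + sum_a * g3
    # Pixel-wise coefficients alpha^k_{ij}; set to zero wherever the denominator vanishes.
    alpha = np.divide(g2, denom, out=np.zeros_like(g2), where=denom != 0)
    # w_k = sum_{ij} alpha^k_{ij} * ReLU(g^k_{ij}): only positive gradients contribute.
    weights = (alpha * np.maximum(gradients, 0.0)).sum(axis=(1, 2))
    cam = np.maximum(np.tensordot(weights, activations, axes=1), 0.0)
    return weights, cam


# Hypothetical toy inputs: constant per-channel gradients w_k / (H * W).
rng = np.random.default_rng(0)
A = rng.random((4, 7, 7))
w = np.array([1.0, -2.0, 0.5, 3.0])
grads = np.broadcast_to(w[:, None, None] / 49, A.shape)
weights, cam = grad_cam_pp(A, grads)
```

Channels whose gradients are all negative receive a weight of exactly zero, which is the main behavioral difference from averaging signed gradients.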

Ablation-CAM: Visual Explanations for Deep Convolutional Network via Gradient-free Localization

This approach, Ablation-based Class Activation Mapping (Ablation-CAM), uses ablation analysis to determine the importance of individual feature map units with respect to a class. It then produces a coarse localization map highlighting the image regions that are important for predicting the concept.
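The ablation step can be sketched without any gradients. Each feature map is zeroed in turn, and its weight is the relative drop in the class score, w_k = (y - y_k) / y. The sketch assumes a hypothetical `toy_score` function that stands in for the layers after the target layer. All names are illustrative.

```python
import numpy as np


def ablation_cam(activations, score_fn):
    """Ablation-CAM-style weights w_k = (y - y_k) / y, with y_k the score after zeroing map k."""
    y = score_fn(activations)
    weights = np.empty(activations.shape[0])
    for k in range(activations.shape[0]):
        ablated = activations.copy()
        ablated[k] = 0.0                      # ablate (zero out) one feature map
        weights[k] = (y - score_fn(ablated)) / y
    # One extra forward pass per feature map replaces backpropagation.
    cam = np.maximum(np.tensordot(weights, activations, axes=1), 0.0)
    return weights, cam


# Hypothetical head standing in for the layers after the target layer.
w = np.array([1.0, -2.0, 0.5, 3.0])


def toy_score(A):
    return float(w @ A.mean(axis=(1, 2)))


rng = np.random.default_rng(0)
A = rng.random((4, 7, 7))
weights, cam = ablation_cam(A, toy_score)
```

The cost is K forward passes per explanation. This is the price paid for avoiding gradients.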

Visual Explanations from Deep Networks via Riemann-Stieltjes Integrated Gradient-based Localization

This work introduces a technique for producing visual explanations of a CNN's predictions. The technique can be applied to any layer of the network and is not affected by the problem of vanishing gradients.
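The summary does not spell out the paper's formulation. As a hedged illustration of only the Riemann-sum idea named in the title, the sketch approximates plain integrated gradients with a midpoint Riemann sum. Integrated gradients is (x - x') times the path integral of the gradient from a baseline x' to x. The sketch uses a toy function whose gradient is written out by hand. It is not the paper's Riemann-Stieltjes, layer-wise construction, and `integrated_gradients`, `F`, and `grad_F` are hypothetical names.

```python
import numpy as np


def integrated_gradients(grad_fn, x, baseline, steps=64):
    """Midpoint Riemann-sum approximation of integrated gradients on a straight path."""
    # Midpoints of `steps` equal sub-intervals of [0, 1].
    alphas = (np.arange(steps) + 0.5) / steps
    total = np.zeros_like(x)
    for a in alphas:
        total += grad_fn(baseline + a * (x - baseline))
    # (x - x') times the average gradient along the path.
    return (x - baseline) * total / steps


# Hypothetical toy "model" F(x) = sum(tanh(x)) with its analytic gradient.
def F(x):
    return float(np.tanh(x).sum())


def grad_F(x):
    return 1.0 - np.tanh(x) ** 2


x = np.array([0.5, 1.0, 1.5, 2.0, -1.0])
baseline = np.zeros_like(x)
attr = integrated_gradients(grad_F, x, baseline)
```

Summing the gradients along the path, instead of reading the gradient at a single point, keeps attributions informative where the local gradient has saturated. The attributions also approximately sum to F(x) - F(baseline).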

Smooth Grad-CAM++: An Enhanced Inference Level Visualization Technique for Deep Convolutional Neural Network Models

The Smooth Grad-CAM++ technique can visualize a layer, a subset of feature maps, or a subset of neurons within a feature map at each instance at the inference level (that is, during the model's prediction process).

Adapting Grad-CAM for Embedding Networks

This work proposes an adaptation of the Grad-CAM method for embedding networks and develops an efficient weight-transfer method to explain decisions for any image without back-propagation.

Eigen-CAM: Visual Explanations for Deep Convolutional Neural Networks

Eigen-CAM is presented to enhance explanations of CNN predictions by visualizing the principal components of the representations learned by convolutional layers. The resulting explanations are more consistent, class-discriminative, and robust against classification errors made by dense layers.
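Visualizing the first principal component of a layer's activations can be sketched with a single SVD. Only forward activations are used, with no gradients or class scores. This is a hedged sketch: mean-centering each channel and fixing the arbitrary SVD sign against the mean activation are implementation choices assumed here, not details taken from the summary. The toy activations and the name `eigen_cam` are hypothetical.

```python
import numpy as np


def eigen_cam(activations: np.ndarray) -> np.ndarray:
    """Project (K, H, W) activations onto their first principal component."""
    K, H, W = activations.shape
    X = activations.reshape(K, H * W).T      # one row per spatial position, one column per channel
    Xc = X - X.mean(axis=0)                  # mean-center each channel (assumed choice)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    proj = Xc @ vt[0]                        # scores on the first principal direction
    # SVD fixes the direction only up to sign: orient it to agree with the mean activation.
    if proj @ X.mean(axis=1) < 0:
        proj = -proj
    return proj.reshape(H, W)


# Hypothetical activations: every channel is a scaled copy of one spatial pattern plus noise.
rng = np.random.default_rng(0)
pattern = rng.random((7, 7))
scales = rng.uniform(0.5, 1.5, size=8)
A = scales[:, None, None] * pattern + 0.01 * rng.standard_normal((8, 7, 7))
cam = eigen_cam(A)
```

Because the projection does not pass through the dense layers, a misclassification by those layers does not change the map.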

MACE: Model Agnostic Concept Extractor for Explaining Image Classification Networks

The MACE framework dissects the feature maps that a convolutional network generates for an image to extract concept-based prototypical explanations. It also estimates the relevance of the extracted concepts to the pretrained model’s predictions, which is critical for explaining individual class predictions and is missing from existing approaches.

Abs-CAM: A Gradient Optimization Interpretable Approach for Explanation of Convolutional Neural Networks

This work proposes Absolute value Class Activation Mapping (Abs-CAM), which optimizes the gradients derived from backpropagation by turning all of them into positive gradients. This enhances the visual features of output neurons’ activations and improves the localization ability of the saliency map.
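One literal reading of the summary is that Abs-CAM pools the absolute values of the gradients rather than their signed values. Under that reading, negatively signed gradients also raise a channel's weight. The sketch contrasts the two weightings on the same toy gradients. It illustrates only that single assumed step, not the paper's full method (which the summary does not detail), and all function names are hypothetical.

```python
import numpy as np


def gradcam_weights(gradients):
    # Grad-CAM: signed spatial average of the gradients.
    return gradients.mean(axis=(1, 2))


def abs_gradient_weights(gradients):
    # Assumed Abs-CAM-style step: make every gradient positive before pooling.
    return np.abs(gradients).mean(axis=(1, 2))


def weighted_map(weights, activations):
    return np.maximum(np.tensordot(weights, activations, axes=1), 0.0)


rng = np.random.default_rng(0)
A = rng.random((3, 5, 5))              # non-negative toy activations (as after a ReLU)
G = rng.standard_normal((3, 5, 5))     # toy gradients with mixed signs
signed = gradcam_weights(G)
absolute = abs_gradient_weights(G)
cam_signed = weighted_map(signed, A)
cam_abs = weighted_map(absolute, A)
```

With non-negative activations and non-negative weights, the final ReLU never clips anything in the absolute-value variant.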

Group-CAM: Group Score-Weighted Visual Explanations for Deep Convolutional Networks

This paper proposes an efficient saliency map generation method, Group score-weighted Class Activation Mapping (Group-CAM), which adopts a “split-transform-merge” strategy to generate saliency maps.
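The split-transform-merge strategy can be sketched under several simplifying assumptions. Feature maps are assumed to already match the input resolution, so no upsampling is done. Masks are min-max normalized rather than denoised. A constant image at the input's mean stands in for a blurred baseline. `score_fn` is a hypothetical stand-in for the network's class confidence. Names and the `groups` parameter are illustrative only.

```python
import numpy as np


def group_cam(activations, x, score_fn, groups=2):
    """Split K maps into groups, turn each group into a mask, score it, then merge."""
    baseline = np.full_like(x, x.mean())         # stand-in for the blurred image
    base_score = score_fn(baseline)
    masks, weights = [], []
    for chunk in np.array_split(activations, groups, axis=0):    # split
        m = chunk.sum(axis=0)
        m = (m - m.min()) / (m.max() - m.min() + 1e-8)            # normalize to [0, 1]
        masked = m * x + (1.0 - m) * baseline                     # transform
        weights.append(score_fn(masked) - base_score)             # score gain over baseline
        masks.append(m)
    cam = np.tensordot(np.array(weights), np.array(masks), axes=1)  # merge
    return np.maximum(cam, 0.0)


# Hypothetical scorer: a fixed linear template over the input.
rng = np.random.default_rng(0)
template = rng.standard_normal((6, 6))


def toy_score(img):
    return float((template * img).sum())


x = rng.random((6, 6))
A = rng.random((4, 6, 6))
cam = group_cam(A, x, toy_score)
```

Grouping channels trades resolution of the explanation for speed: only `groups` forward passes are needed instead of one per channel.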

Review of white box methods for explanations of convolutional neural networks in image classification tasks

This work aims to provide a comprehensive and detailed overview of methods for creating explanation maps for a particular image. These maps assign each pixel an importance score based on its contribution to the network's decision.
...

References

Showing 1-10 of 76 references

Grad-CAM: Why did you say that? Visual Explanations from Deep Networks via Gradient-based Localization

The paper shows that Guided Grad-CAM helps untrained users discern a "stronger" deep network from a "weaker" one even when both networks make identical predictions. It also reveals the somewhat surprising finding that common CNN + LSTM models can be good at localizing discriminative input image regions despite not being trained on grounded image-text pairs.

Fully convolutional networks for semantic segmentation

The key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning.

Network Dissection: Quantifying Interpretability of Deep Visual Representations

This work uses the proposed Network Dissection method to test the hypothesis that interpretability is an axis-independent property of the representation space, then applies the method to compare the latent representations of various networks when trained to solve different classification problems.

Choose Your Neuron: Incorporating Domain Knowledge through Neuron-Importance

This work learns to map domain knowledge about novel “unseen” classes onto a dictionary of concepts learned by the network's neurons. It then optimizes for network parameters that can effectively combine these concepts, essentially learning classifiers by discovering and composing learned semantic concepts in deep networks.

Visualizing Deep Convolutional Neural Networks Using Natural Pre-images

This paper studies several landmark representations, both shallow and deep, using a number of complementary visualization techniques based on the concept of a “natural pre-image”. It shows that several layers in CNNs retain photographically accurate information about the image, with different degrees of geometric and photometric invariance.

Striving for Simplicity: The All Convolutional Net

It is found that max-pooling can simply be replaced by a convolutional layer with increased stride without loss in accuracy on several image recognition benchmarks.

Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps

This paper addresses the visualisation of image classification models, learnt using deep Convolutional Networks (ConvNets), and establishes the connection between the gradient-based ConvNet visualisation methods and deconvolutional networks.
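The image-specific saliency map in this line of work is the magnitude of the gradient of the class score with respect to the input pixels, with the maximum taken over colour channels. The sketch assumes a hypothetical toy score S_c(x) = sum(w * tanh(x)) whose gradient is written out analytically in place of backpropagation. All names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal((3, 4, 4))          # hypothetical class-score weights (C, H, W)


def class_score(x):
    return float((w * np.tanh(x)).sum())


def score_gradient(x):
    # dS_c/dx for the toy score above (stands in for backpropagation).
    return w * (1.0 - np.tanh(x) ** 2)


def saliency_map(x):
    # Magnitude of the input gradient, maximum over the colour-channel axis.
    return np.abs(score_gradient(x)).max(axis=0)


x = rng.random((3, 4, 4))
sal = saliency_map(x)
```

For a real network, `score_gradient` would be a single backward pass from the unnormalized class score to the input.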

Learning Deep Features for Discriminative Localization

In this work, we revisit the global average pooling layer proposed in [13] and shed light on how it explicitly enables the convolutional neural network (CNN) to have remarkable localization ability.
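Consider a network with global average pooling followed by a single linear layer. For class c, the class activation map is M_c(x, y) = sum_k w_k^c f_k(x, y), where f_k is the k-th feature map and w_k^c is the corresponding linear-layer weight. The sketch computes this map for hypothetical random weights and features. It also checks two consequences for this architecture. First, the spatial mean of the map equals the class score when the layer has no bias. Second, Grad-CAM's pooled gradients are w_k^c / (H W), so the Grad-CAM map is the ReLU of the CAM scaled by 1 / (H W).

```python
import numpy as np

rng = np.random.default_rng(0)
K, H, W, num_classes = 5, 7, 7, 3
features = rng.random((K, H, W))                      # f_k(x, y) from the last conv layer
fc_weights = rng.standard_normal((num_classes, K))    # hypothetical w_k^c (no bias)


def class_scores(f):
    # Global average pooling followed by the linear layer.
    return fc_weights @ f.mean(axis=(1, 2))


def cam(f, c):
    # M_c(x, y) = sum_k w_k^c * f_k(x, y)
    return np.tensordot(fc_weights[c], f, axes=1)


c = int(np.argmax(class_scores(features)))
m = cam(features, c)
# Grad-CAM on the same head: dS_c/df_k(x, y) = w_k^c / (H * W) at every position,
# so the spatially pooled gradient is that same constant.
alphas = fc_weights[c] / (H * W)
grad_cam_map = np.maximum(np.tensordot(alphas, features, axes=1), 0.0)
```

CAM needs this specific GAP-plus-linear architecture. Grad-CAM only needs gradients, so it extends the same idea to arbitrary heads.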

Self-taught object localization with deep networks

This paper introduces self-taught object localization, a novel approach that leverages deep convolutional networks trained for whole-image recognition to localize objects in images without additional

Hierarchical Question-Image Co-Attention for Visual Question Answering

This paper presents a novel co-attention model for VQA that jointly reasons about image and question attention in a hierarchical fashion via a novel one-dimensional convolutional neural network (CNN).
...