Corpus ID: 12993310

Grad-CAM: Why did you say that?

Ramprasaath R. Selvaraju, Abhishek Das, Ramakrishna Vedantam, Michael Cogswell, Devi Parikh, Dhruv Batra
We propose a technique for making Convolutional Neural Network (CNN)-based models more transparent by visualizing the input regions that are 'important' for their predictions, i.e. visual explanations. Our approach, called Gradient-weighted Class Activation Mapping (Grad-CAM), uses class-specific gradient information to localize important regions. These localizations are combined with existing pixel-space visualizations to create a novel high-resolution and class-discriminative visualization called…
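
The localization step can be sketched in NumPy as follows. This is a minimal sketch that assumes two things have already been obtained, for example from an autodiff framework: the activations of one convolutional layer, and the gradients of the target class score with respect to those activations. It follows the usual Grad-CAM formulation. Each feature map's weight is its spatially averaged gradient, and the map is a ReLU of the weighted sum. The names `grad_cam` and `fuse_with_pixel_map` are hypothetical. The fusion step assumes nearest-neighbour upsampling by an integer factor, which simplifies whatever resizing a real implementation uses.

```python
import numpy as np


def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Coarse class-discriminative map from one convolutional layer.

    activations: feature maps A^k, shape (K, H, W)
    gradients:   d(y^c)/d(A^k) for the target class c, same shape
    """
    if activations.shape != gradients.shape:
        raise ValueError("activations and gradients must have the same shape")
    # Importance weight of each feature map: spatial average of its gradient.
    alphas = gradients.mean(axis=(1, 2))                  # shape (K,)
    # Weighted sum of the feature maps over the channel axis.
    cam = np.tensordot(alphas, activations, axes=1)       # shape (H, W)
    # ReLU keeps only regions with a positive influence on the class score.
    cam = np.maximum(cam, 0.0)
    # Rescale to [0, 1] for display.
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def fuse_with_pixel_map(cam: np.ndarray, pixel_map: np.ndarray) -> np.ndarray:
    """Upsample the coarse map and multiply it into a high-resolution
    single-channel pixel-space map (e.g. a guided-backpropagation image)."""
    fy, ry = divmod(pixel_map.shape[0], cam.shape[0])
    fx, rx = divmod(pixel_map.shape[1], cam.shape[1])
    if fy == 0 or fx == 0 or ry or rx:
        raise ValueError("pixel map size must be an integer multiple of the map size")
    # Nearest-neighbour upsampling by repeating rows and columns.
    upsampled = np.repeat(np.repeat(cam, fy, axis=0), fx, axis=1)
    return upsampled * pixel_map


if __name__ == "__main__":
    A = np.zeros((2, 2, 2))
    A[0, 0, 0] = 1.0            # channel 0 fires in the top-left cell
    A[1, 1, 1] = 1.0            # channel 1 fires in the bottom-right cell
    # Positive gradient for channel 0, negative for channel 1.
    G = np.stack([np.full((2, 2), 0.5), np.full((2, 2), -0.5)])
    cam = grad_cam(A, G)
    print(cam)                  # only the top-left cell is non-zero
    print(fuse_with_pixel_map(cam, np.ones((4, 4))))
```

Flipping the sign of the gradients moves the highlighted cell to the bottom-right, which is the class-discriminative behaviour the abstract describes. An RGB pixel-space map of shape (H, W, 3) would need the upsampled map broadcast over the channel axis (`upsampled[..., None]`). That case is left out to keep the sketch short.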

Citations
Bi-gradient Verification for Grad-CAM Towards Accurate Visual Explanation for Remote Sensing Images
A new strategy, bidirectional gradient verification (BiGradV), is proposed to rectify the visual explanations produced by Grad-CAM, based on the fact that both positive and negative gradients can be sensitive to the class discrimination of remote sensing images.
Sanity Checks for Saliency Maps
It is shown that some existing saliency methods are independent both of the model and of the data-generating process, and that methods that fail the proposed tests are inadequate for tasks that are sensitive to either the data or the model.
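
One of these tests, the model-randomization check, compares two saliency maps. The first is produced for a trained model, the second for the same architecture with randomized weights. If the two are nearly identical, the method is not explaining the model. The sketch below is an illustration under stated assumptions. It uses a saliency function with the hypothetical signature `saliency_fn(model, x)` and measures similarity with a Spearman rank correlation, ignoring ties for simplicity. The toy "models" are weight vectors of a linear score w·x, chosen only to keep the example self-contained.

```python
import numpy as np


def spearman(a: np.ndarray, b: np.ndarray) -> float:
    """Spearman rank correlation of two maps (assumes no tied values)."""
    ra = np.argsort(np.argsort(a.ravel())).astype(float)
    rb = np.argsort(np.argsort(b.ravel())).astype(float)
    return float(np.corrcoef(ra, rb)[0, 1])


def model_randomization_similarity(saliency_fn, trained_model, random_model, x) -> float:
    """Similarity between explanations of a trained and a randomized model.

    Values near 1 suggest the explanation barely depends on the learned weights.
    """
    return spearman(saliency_fn(trained_model, x), saliency_fn(random_model, x))


rng = np.random.default_rng(0)
x = rng.normal(size=16)
w_trained = rng.normal(size=16)   # stand-in for learned weights
w_random = rng.normal(size=16)    # stand-in for re-initialized weights


def gradient(w, x):
    return w                      # gradient of the score w . x with respect to x


def input_only(w, x):
    return x                      # ignores the model entirely


print(model_randomization_similarity(gradient, w_trained, w_random, x))    # changes with the weights
print(model_randomization_similarity(input_only, w_trained, w_random, x))  # about 1: fails the check
```

The input-only "explanation" returns the same map whatever the weights are. It therefore scores a perfect similarity, which is exactly the failure mode the test is designed to expose.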
Towards Learning Spatially Discriminative Feature Representations
Chaofei Wang, Jiayu Xiao, Yizeng Han, Qisen Yang, Shiji Song, Gao Huang. 2021
The backbone of a traditional CNN classifier is generally considered a feature extractor, followed by a linear layer that performs the classification. We propose a novel loss function, termed…
Towards Automated Concept-based Decision Tree Explanations for CNNs
This work proposes Automated Concept-based Decision Tree Explanations (ACDTE), a novel local explanation framework that provides human-understandable, concept-based explanations for classification networks. It also demonstrates that a shallow decision tree of this kind is faithful to the original neural network at low tree depth.
Explaining Image Classifiers by Adaptive Dropout and Generative In-filling
This work marginalizes out masked regions of the input by conditioning a generative model on the rest of the image. It produces realistic explanations by generating plausible inputs that would have caused the model to classify differently.
Grid Saliency for Context Explanations of Semantic Segmentation
The results show that grid saliency can be successfully used to provide easily interpretable context explanations and, moreover, can be employed for detecting and localizing contextual biases present in the data.
Explaining Image Classifiers by Counterfactual Generation
This work can sample plausible image in-fills by conditioning a generative model on the rest of the image, and optimize to find the image regions that most change the classifier's decision after in-fill.
Robust Decoy-enhanced Saliency Maps
Experimental results suggest that the aggregated saliency map not only captures inter-feature dependence but also makes interpretation robust against previously described adversarial perturbation methods. It also outperforms existing methods both qualitatively and quantitatively.
Using KL-divergence to focus Deep Visual Explanation
A method for explaining the image classification predictions of deep convolutional neural networks by highlighting the pixels in the image that influence the final class prediction, using Kullback-Leibler divergence to provide this focus.
A study of interpretability mechanisms for deep networks
This thesis proposes a framework to test interpretation algorithms under model and data perturbations. It also introduces a new interpretability technique, the “Forward-Backward Interpretability algorithm”, that provides a systematic framework for visualizing information flow in deep networks.

References
Visualizing and Understanding Convolutional Networks
A novel visualization technique is introduced that gives insight into the function of intermediate feature layers and the operation of the classifier in large Convolutional Network models. Used in a diagnostic role, it helps find model architectures that outperform Krizhevsky et al. on the ImageNet classification benchmark.
Hierarchical Question-Image Co-Attention for Visual Question Answering
This paper presents a novel co-attention model for VQA that jointly reasons about image and question attention in a hierarchical fashion via a novel one-dimensional convolutional neural network (CNN).
Learning Deep Features for Discriminative Localization
In this work, we revisit the global average pooling layer proposed in [13] and shed light on how it explicitly enables the convolutional neural network (CNN) to have remarkable localization ability…
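
The mechanism can be sketched as follows. The sketch assumes a network whose last convolutional maps f_k are global-average-pooled and fed to a single linear layer with weights w^c_k, with the bias omitted. The class activation map is M_c(x, y) = Σ_k w^c_k f_k(x, y). Because pooling and the linear layer commute, the class score equals the spatial mean of M_c, so the map shows where the evidence for that score sits. The function names are hypothetical.

```python
import numpy as np


def class_activation_map(features: np.ndarray, fc_weights: np.ndarray, class_idx: int) -> np.ndarray:
    """M_c(x, y) = sum_k w^c_k f_k(x, y).

    features:   last-conv maps f_k, shape (K, H, W)
    fc_weights: weights of the linear layer after pooling, shape (num_classes, K)
    """
    return np.tensordot(fc_weights[class_idx], features, axes=1)   # shape (H, W)


def gap_logits(features: np.ndarray, fc_weights: np.ndarray) -> np.ndarray:
    """Class scores of a GAP + linear head (bias omitted)."""
    pooled = features.mean(axis=(1, 2))   # global average pooling, shape (K,)
    return fc_weights @ pooled            # shape (num_classes,)


rng = np.random.default_rng(1)
f = rng.random((8, 7, 7))                 # 8 feature maps on a 7x7 grid
W = rng.normal(size=(3, 8))               # 3 classes
cam = class_activation_map(f, W, class_idx=2)
print(cam.shape, np.isclose(cam.mean(), gap_logits(f, W)[2]))  # (7, 7) True
```

For this architecture, the gradient of the class score with respect to each f_k(x, y) is the constant w^c_k / (HW). Grad-CAM's averaged-gradient weights are therefore proportional to the CAM weights, and Grad-CAM reduces to CAM up to a positive scale and its ReLU.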
Striving for Simplicity: The All Convolutional Net
It is found that max-pooling can simply be replaced by a convolutional layer with increased stride without loss in accuracy on several image recognition benchmarks.
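
The replacement leaves the spatial bookkeeping unchanged. A k x k window with stride s maps an n x n input to floor((n - k) / s) + 1 outputs per side, whether the window takes a maximum or a learned weighted sum. The single-channel, unpadded sketch below illustrates only that shape equivalence. Whether accuracy is preserved is the empirical finding summarized above. The function names and the fixed averaging kernel, which stands in for learned weights, are assumptions for illustration.

```python
import numpy as np


def out_size(n: int, k: int, s: int) -> int:
    """Output length of a valid (unpadded) window of size k and stride s."""
    return (n - k) // s + 1


def max_pool2d(x: np.ndarray, k: int = 2, s: int = 2) -> np.ndarray:
    h, w = out_size(x.shape[0], k, s), out_size(x.shape[1], k, s)
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            # Fixed operation: maximum over the window.
            out[i, j] = x[i * s:i * s + k, j * s:j * s + k].max()
    return out


def strided_conv2d(x: np.ndarray, kernel: np.ndarray, s: int = 2) -> np.ndarray:
    k = kernel.shape[0]
    h, w = out_size(x.shape[0], k, s), out_size(x.shape[1], k, s)
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            # Learnable operation: weighted sum over the same window
            # (cross-correlation, as in CNN layers).
            out[i, j] = np.sum(x[i * s:i * s + k, j * s:j * s + k] * kernel)
    return out


x = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.full((2, 2), 0.25)   # stands in for learned weights
print(max_pool2d(x).shape, strided_conv2d(x, kernel).shape)   # (3, 3) (3, 3)
```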
Deep Visual-Semantic Alignments for Generating Image Descriptions
A. Karpathy, Li Fei-Fei. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017
A model that generates natural language descriptions of images and their regions based on a novel combination of Convolutional Neural Networks over image regions, bidirectional Recurrent Neural Networks over sentences, and a structured objective that aligns the two modalities through a multimodal embedding is presented.
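
A simplified alignment objective of this kind can be sketched as follows. The sketch assumes image regions and words are already embedded in the shared space and omits the CNN and bidirectional RNN encoders. Each word is scored against its best-matching region, S_kl = Σ_t max_i v_i · s_t. A max-margin ranking loss then asks every true image/sentence pair to outscore the mismatched pairs by a margin, in both directions. This is a sketch modeled on the simplified score described for this model, not the authors' code. The function names and the margin of 1 are illustrative assumptions.

```python
import numpy as np


def alignment_score(regions: np.ndarray, words: np.ndarray) -> float:
    """S_kl: each word s_t picks its best region v_i; scores are summed.

    regions: (R, D) region embeddings of image k
    words:   (T, D) word embeddings of sentence l
    """
    sims = words @ regions.T                 # (T, R) dot products v_i . s_t
    return float(sims.max(axis=1).sum())


def ranking_loss(images, sentences, margin: float = 1.0) -> float:
    """Max-margin loss over a batch where images[k] matches sentences[k]."""
    n = len(images)
    S = np.array([[alignment_score(images[k], sentences[l]) for l in range(n)]
                  for k in range(n)])
    loss = 0.0
    for k in range(n):
        for l in range(n):
            if l == k:
                continue  # the diagonal term would only add a constant
            loss += max(0.0, S[k, l] - S[k, k] + margin)   # rank sentences for image k
            loss += max(0.0, S[l, k] - S[k, k] + margin)   # rank images for sentence k
    return loss


images = [np.array([[10.0, 0.0]]), np.array([[0.0, 10.0]])]
sentences = [np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]])]
print(ranking_loss(images, sentences))        # 0.0: matched pairs win by the margin
print(ranking_loss(images, sentences[::-1]))  # positive: pairs are mismatched
```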
DenseCap: Fully Convolutional Localization Networks for Dense Captioning
A Fully Convolutional Localization Network (FCLN) architecture is proposed that processes an image with a single, efficient forward pass, requires no external region proposals, and can be trained end-to-end with a single round of optimization.
"Why Should I Trust You?": Explaining the Predictions of Any Classifier
LIME is proposed, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner by learning an interpretable model locally around the prediction.
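
The core of this idea fits in a few lines. Sample perturbations that switch interpretable features on or off (for images these are typically superpixels) and query the black-box classifier. Weight each sample by its proximity to the original instance, then fit a weighted linear model whose coefficients serve as the explanation. This is a minimal sketch, not the `lime` package's API. Several choices are illustrative assumptions: the distance (fraction of features switched off), the exponential kernel and its width, and the function name `lime_explain`.

```python
import numpy as np


def lime_explain(predict, num_features: int, num_samples: int = 500,
                 kernel_width: float = 0.25, seed: int = 0) -> np.ndarray:
    """Local linear explanation of a black-box score.

    predict: maps an (N, num_features) 0/1 matrix (1 = feature kept) to (N,) scores
    Returns one weight per interpretable feature.
    """
    rng = np.random.default_rng(seed)
    Z = rng.integers(0, 2, size=(num_samples, num_features)).astype(float)
    Z[0] = 1.0                                   # include the unperturbed instance
    y = predict(Z)
    distance = 1.0 - Z.mean(axis=1)              # fraction of features switched off
    weights = np.exp(-distance ** 2 / kernel_width ** 2)   # proximity kernel
    X = np.hstack([Z, np.ones((num_samples, 1))])           # add an intercept column
    sw = np.sqrt(weights)
    # Weighted least squares: scale rows by sqrt(weight), then solve ordinary least squares.
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef[:-1]                             # drop the intercept


def black_box(Z):
    # Toy classifier score; feature 1 is ignored.
    return 3.0 * Z[:, 0] - 2.0 * Z[:, 2] + 0.5 * Z[:, 3]


print(np.round(lime_explain(black_box, num_features=4), 3))  # approximately [3, 0, -2, 0.5]
```

Because the toy classifier is already linear in the interpretable features, the local fit recovers its coefficients. For a genuinely nonlinear classifier, the coefficients describe only its behaviour near the instance being explained.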
HOGgles: Visualizing Object Detection Features
Algorithms to visualize feature spaces used by object detectors allow a human to put on 'HOG goggles' and perceive the visual world as a HOG-based object detector sees it, and allow us to analyze object detection systems in new ways and gain new insight into the detector's failures.
ImageNet: A large-scale hierarchical image database
A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.
CloudCV: Large-Scale Distributed Computer Vision as a Cloud Service
The goal is to democratize computer vision; one should not have to be a computer vision, big data and distributed computing expert to have access to state-of-the-art distributed computer vision algorithms.