Crowdsourcing Evaluation of Saliency-based XAI Methods

Xiaotian Lu, Arseny Tolmachev, Tatsuya Yamamoto, Koh Takeuchi, Seiji Okajima, Tomoyoshi Takebayashi, Koji Maruhashi, Hisashi Kashima
Understanding the reasons behind the predictions made by deep neural networks is critical for gaining human trust in many important applications, which is reflected in the increasing demand for explainable AI (XAI) in recent years. Saliency-based feature attribution methods, which highlight important parts of images that contribute to decisions by classifiers, are often used as XAI methods, especially in the field of computer vision. In order to compare various saliency-based XAI methods…
EXP-Crowd: A Gamified Crowdsourcing Framework for Explainability
This research frames the explainability problem from the crowd's point of view and engages both users and AI researchers through a gamified crowdsourcing framework named EXP-Crowd to improve the crowd's understanding of black-box models and the quality of the crowdsourced content.
The Role of Human Knowledge in Explainable AI
This article aims to present a literature overview on collecting and employing human knowledge to improve and evaluate the understandability of machine learning models through human-in-the-loop approaches.


Sanity Checks for Saliency Maps
It is shown that some existing saliency methods are independent both of the model and of the data generating process, and methods that fail the proposed tests are inadequate for tasks that are sensitive to either data or model.
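The model-randomization test at the heart of these sanity checks can be sketched in a few lines: compute a saliency map, randomize the model's weights, recompute, and compare. This is an illustrative sketch, not the paper's code; `saliency_fn` and the parameter-dictionary layout are assumptions for the example.

```python
import numpy as np

def randomization_sanity_check(saliency_fn, model_params, x, rng=None):
    """Model-randomization sanity check (sketch): a saliency method that is
    faithful to the model should produce a clearly different map once the
    weights are randomized. Returns the correlation between the two maps;
    a value near 1.0 means the method ignores the model and fails the check."""
    rng = rng or np.random.default_rng(0)
    original = saliency_fn(model_params, x)
    # Destroy the learned weights by shuffling each parameter tensor.
    shuffled = {k: rng.permutation(v.ravel()).reshape(v.shape)
                for k, v in model_params.items()}
    randomized = saliency_fn(shuffled, x)
    return np.corrcoef(original.ravel(), randomized.ravel())[0, 1]
```

A gradient-style explanation (which uses the weights) scores low on this correlation, while an edge-detector-like explanation (which never looks at the model) scores 1.0 and is flagged as model-independent.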
Ambiance in Social Media Venues: Visual Cue Interpretation by Machines and Crowds
The results show that paintings, photos, and decorative items are strong cues for artsy ambiance, whereas type of utensils, type of lamps and presence of flowers may indicate formal ambiances, and the crowd-based assessment approach may motivate other studies on subjective perception of place attributes.
Evaluating the Visualization of What a Deep Neural Network Has Learned
A general methodology based on region perturbation for evaluating ordered collections of pixels such as heatmaps and shows that the recently proposed layer-wise relevance propagation algorithm qualitatively and quantitatively provides a better explanation of what made a DNN arrive at a particular classification decision than the sensitivity-based approach or the deconvolution method.
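The region-perturbation idea (most-relevant-first deletion, summarized by an area-over-the-perturbation-curve score) can be sketched as below. This is a toy illustration under simplifying assumptions, not the paper's protocol: `model_score` stands in for any classifier confidence, and patches are perturbed with uniform noise.

```python
import numpy as np

def aopc_curve(image, heatmap, model_score, steps=4, patch=2, rng=None):
    """Most-relevant-first region perturbation (sketch). Patches are ranked
    by summed heatmap relevance and perturbed in order; the mean drop in
    `model_score` is higher when the heatmap marks truly important regions."""
    rng = rng or np.random.default_rng(0)
    img = image.copy()
    base = model_score(img)
    h, w = heatmap.shape
    patches = [(i, j) for i in range(0, h, patch) for j in range(0, w, patch)]
    patches.sort(key=lambda p: -heatmap[p[0]:p[0]+patch, p[1]:p[1]+patch].sum())
    drops = []
    for i, j in patches[:steps]:
        img[i:i+patch, j:j+patch] = rng.uniform(size=(patch, patch))
        drops.append(base - model_score(img))  # cumulative score degradation
    return float(np.mean(drops))
```

A heatmap that points at the pixels the score actually depends on yields a large mean drop; a heatmap pointing elsewhere yields a drop near zero.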
Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization
This work proposes a technique for producing ‘visual explanations’ for decisions from a large class of Convolutional Neural Network (CNN)-based models, making them more transparent and explainable, and shows that even non-attention based models learn to localize discriminative regions of input image.
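The core Grad-CAM computation is compact: global-average-pool the gradients of the class score with respect to the last convolutional feature maps to get per-channel weights, then take a ReLU of the weighted sum of those maps. A minimal sketch, assuming the feature maps and gradients are supplied directly rather than extracted from a network:

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Grad-CAM sketch. `feature_maps` and `gradients` are (K, H, W) arrays
    for the target class at the last conv layer."""
    weights = gradients.mean(axis=(1, 2))              # alpha_k: GAP of dy/dA^k
    cam = np.tensordot(weights, feature_maps, axes=1)  # sum_k alpha_k * A^k
    cam = np.maximum(cam, 0)                           # ReLU keeps positive evidence
    return cam / cam.max() if cam.max() > 0 else cam
```

In practice the (H, W) map is then upsampled to the input resolution and overlaid on the image.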
3D Convolutional Neural Networks for Human Action Recognition
A novel 3D CNN model for action recognition that extracts features from both the spatial and the temporal dimensions by performing 3D convolutions, thereby capturing the motion information encoded in multiple adjacent frames.
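The key operation, a convolution whose kernel also slides along the time axis so that adjacent frames are mixed, can be illustrated directly (a naive valid-padding sketch, not the paper's architecture):

```python
import numpy as np

def conv3d_valid(clip, kernel):
    """Minimal single-channel 3D convolution (valid padding). The kernel
    spans time as well as height/width, so the output at each position
    aggregates motion information from several adjacent frames."""
    kt, kh, kw = kernel.shape
    T, H, W = clip.shape
    out = np.empty((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[t, i, j] = (clip[t:t+kt, i:i+kh, j:j+kw] * kernel).sum()
    return out
```

For example, a temporal-difference kernel responds to frame-to-frame change and is exactly zero on a static clip.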
Learning Deep Features for Discriminative Localization
In this work, we revisit the global average pooling layer proposed in [13], and shed light on how it explicitly enables the convolutional neural network (CNN) to have remarkable localization ability.
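The reason global average pooling yields localization is a simple identity: with GAP followed by a linear classifier, the class score equals the spatial average of a class activation map built from the same weights. A sketch of that identity (illustrative variable names, not the paper's code):

```python
import numpy as np

def class_score_and_cam(feature_maps, w_c):
    """With global average pooling, score_c = w_c . GAP(F) equals the spatial
    mean of the class activation map M_c = sum_k w_c[k] * F[k]."""
    gap = feature_maps.mean(axis=(1, 2))               # GAP over each channel
    score = w_c @ gap                                  # classifier output for class c
    cam = np.tensordot(w_c, feature_maps, axes=1)      # class activation map (H, W)
    return score, cam
```

Because the score is literally the mean of the map, the map shows which spatial locations drive the class decision.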
"Why Should I Trust You?": Explaining the Predictions of Any Classifier
LIME is proposed, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction.
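The local-surrogate recipe can be sketched for a tabular instance: perturb the input by switching features on and off, weight each perturbed sample by its proximity to the original, and fit a simple weighted linear model whose coefficients serve as the explanation. This is a simplified sketch of the idea (ridge surrogate, exponential kernel), not the LIME library's implementation.

```python
import numpy as np

def lime_explain(x, predict, n_samples=500, kernel_width=0.75, rng=None):
    """LIME-style sketch. `predict` maps a batch (n, d) -> scores (n,).
    Returns surrogate coefficients over the binary on/off representation."""
    rng = rng or np.random.default_rng(0)
    d = len(x)
    masks = rng.integers(0, 2, size=(n_samples, d))    # 1 = keep feature
    samples = masks * x                                # 0 = feature switched off
    y = predict(samples)
    dist = 1 - masks.mean(axis=1)                      # fraction of features removed
    w = np.exp(-(dist ** 2) / kernel_width ** 2)       # proximity kernel
    # Weighted ridge regression (normal equations) on the mask representation.
    A = masks.T @ (w[:, None] * masks) + 1e-3 * np.eye(d)
    return np.linalg.solve(A, masks.T @ (w * y))
```

For a model that is already linear in the features, the surrogate recovers the true coefficients almost exactly.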
Striving for Simplicity: The All Convolutional Net
It is found that max-pooling can simply be replaced by a convolutional layer with increased stride without loss in accuracy on several image recognition benchmarks.
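The replacement is mechanical: a convolution with stride 2 downsamples exactly like a 2x2 pooling layer, but with learnable weights. A naive single-channel sketch makes the equivalence concrete (with an averaging kernel, the strided conv reproduces average pooling):

```python
import numpy as np

def strided_conv(x, kernel, stride=2):
    """Strided 2D convolution (valid padding): the all-convolutional
    replacement for a pooling layer. Here the kernel weights are fixed,
    but in a network they would be learned."""
    kh, kw = kernel.shape
    h = (x.shape[0] - kh) // stride + 1
    w = (x.shape[1] - kw) // stride + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            patch = x[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = (patch * kernel).sum()
    return out
```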
Visualizing Higher-Layer Features of a Deep Network
This paper contrasts and compares several techniques applied on Stacked Denoising Autoencoders and Deep Belief Networks, trained on several vision datasets, and shows that good qualitative interpretations of high-level features represented by such models are possible at the unit level.
A Benchmark for Interpretability Methods in Deep Neural Networks
An empirical measure of the approximate accuracy of feature importance estimates in deep neural networks is proposed, and it is shown that some approaches do no better than the underlying method but carry a far higher computational burden.