• Corpus ID: 235253892

The effectiveness of feature attribution methods and its correlation with automatic evaluation scores

Giang Nguyen, Daeyoung Kim, Anh M. Nguyen
Explaining the decisions of an Artificial Intelligence (AI) model is increasingly critical in many real-world, high-stakes applications. Hundreds of papers have either proposed new feature attribution methods or discussed and harnessed these tools in their work. However, despite humans being the target end-users, most attribution methods were only evaluated on proxy automatic-evaluation metrics [60, 78, 80]. In this paper, we conduct the first user study to measure attribution map effectiveness in…

Visual correspondence-based explanations improve AI robustness and human-AI team accuracy

This work proposes two novel architectures of self-interpretable image classifiers that first explain, and then predict by harnessing the visual correspondences between a query image and exemplars, and shows that it is possible to achieve complementary human-AI team accuracy higher than either AI-alone or human-alone, in ImageNet and CUB image classification tasks.

What I Cannot Predict, I Do Not Understand: A Human-Centered Evaluation Framework for Explainability Methods

It is demonstrated that theoretical measures used to score explainability methods poorly reflect the practical usefulness of individual attribution methods in real-world scenarios, suggesting a critical need to develop better explainable methods and to deploy human-centered evaluation approaches.

On the Effect of Information Asymmetry in Human-AI Teams

It is demonstrated that, when humans have access to different contextual information, as in many real-world situations, they can use it to adjust the AI’s decision, resulting in complementary team performance (CTP).

Explaining Latent Representations with a Corpus of Examples

SimplEx is a user-centred method that provides example-based explanations with reference to a freely selected set of examples, called the corpus, and improves the user’s understanding of the latent space with post-hoc explanations.

This paper belongs to a line of work on assessing the effectiveness of post hoc explanation methods.

In a broad experimental sweep across datasets, models, and spurious signals, it is found that the post hoc explanations tested can be used to identify a model’s reliance on a visible spurious signal provided the spurious signal is known ahead of time by the practitioner using the explanation method.


It is found that the post hoc explanation methods tested are ineffective when the spurious artifact is unknown at test time, especially for non-visible artifacts like a background blur.

Interpretable deep learning models for better clinician-AI communication in clinical mammography

An interpretable deep-learning network is presented that explains its predictions in terms of the BI-RADS features mass shape and mass margin, then uses the logits from those interpretable models to predict malignancy, also using an interpretable model.

HIVE: Evaluating the Human Interpretability of Visual Explanations

HIVE (Human Interpretability of Visual Explanations) is introduced: a novel human evaluation framework that assesses the utility of explanations to human users in AI-assisted decision-making scenarios and enables falsifiable hypothesis testing, cross-method comparison, and human-centered evaluation of visual interpretability methods.

A Meta-Analysis of the Utility of Explainable Artificial Intelligence in Human-AI Decision-Making

An initial synthesis of existing research on XAI studies, using a statistical meta-analysis to derive implications across existing research, finds a statistically significant positive impact of XAI on users' performance and indicates that human-AI decision-making tends to yield better task performance on text data.



SAM: The Sensitivity of Attribution Methods to Hyperparameters

Attribution methods can provide powerful insights into the reasons for a classifier’s decision. We argue that a key desideratum of an explanation method is its robustness to input hyperparameters.

Challenging common interpretability assumptions in feature attribution explanations

It is found that feature attribution explanations provide marginal utility in the authors' task for a human decision maker and in certain cases result in worse decisions due to cognitive and contextual confounders.

Interpretation of Neural Networks is Fragile

This paper systematically characterizes the fragility of several widely-used feature-importance interpretation methods (saliency maps, relevance propagation, and DeepLIFT) on ImageNet and CIFAR-10, and extends these results to show that interpretations based on exemplars (e.g., influence functions) are similarly fragile.
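As a minimal sketch of the kind of gradient-based feature attribution whose fragility is studied here, consider a "gradient × input" attribution for a toy one-neuron tanh model, computed in plain Python. The model, weights, and inputs below are invented purely for illustration and are not taken from the paper:

```python
import math

def model(x, w):
    # Toy one-neuron model: f(x) = tanh(w . x)
    return math.tanh(sum(wi * xi for wi, xi in zip(w, x)))

def gradient_saliency(x, w):
    # Analytic gradient of tanh(w . x): df/dx_i = (1 - tanh(z)^2) * w_i
    z = sum(wi * xi for wi, xi in zip(w, x))
    g = 1.0 - math.tanh(z) ** 2
    return [g * wi for wi in w]

def gradient_times_input(x, w):
    # "Gradient x input" attribution: importance of feature i is grad_i * x_i
    return [gi * xi for gi, xi in zip(gradient_saliency(x, w), x)]

x = [1.0, -2.0, 0.5]
w = [0.3, 0.1, -0.4]
attr = gradient_times_input(x, w)
```

For this smooth toy model the attributions are stable; the fragility result above concerns deep networks, where small input perturbations can change such gradient-based maps drastically while leaving the prediction unchanged.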

Natural Adversarial Examples

This work introduces two challenging datasets that reliably cause machine learning model performance to substantially degrade, and curates an adversarial out-of-distribution detection dataset called IMAGENET-O, the first out-of-distribution detection dataset created for ImageNet models.

On Human Predictions with Explanations and Predictions of Machine Learning Models: A Case Study on Deception Detection

This paper uses deception detection as a testbed and investigates how to harness explanations and predictions of machine learning models to improve human performance while retaining human agency, and demonstrates a tradeoff between human performance and human agency.

Explainable AI for Natural Adversarial Images

It is found that both saliency maps and examples facilitate catching AI errors, but their effects are not additive, and saliency maps are more effective than examples.

"Why is 'Chicago' deceptive?" Towards Building Model-Driven Tutorials for Humans

It is found that tutorials indeed improve human performance, with and without real-time assistance, and that although deep learning provides superior predictive performance to simple models, tutorials and explanations from simple models are more useful to humans.

How Useful Are the Machine-Generated Interpretations to General Users? A Human Evaluation on Guessing the Incorrectly Predicted Labels

An investigation of whether showing machine-generated visual interpretations helps users understand the incorrectly predicted labels produced by image classifiers demonstrates that displaying the visual interpretations did not increase, but rather decreased, the average guessing accuracy by roughly 10%.

Quantifying Interpretability and Trust in Machine Learning Systems

A trust metric is derived that identifies when human decisions are overly biased towards ML predictions and demonstrates the value of interpretability for ML assisted human decision making.

A Unified Approach to Interpreting Model Predictions

A unified framework for interpreting predictions, SHAP (SHapley Additive exPlanations), which unifies six existing methods and presents new methods that show improved computational performance and/or better consistency with human intuition than previous approaches.
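Since SHAP is grounded in Shapley values from cooperative game theory, the quantity it estimates can be illustrated with a brute-force exact computation. This is a sketch only: it is exponential in the number of features (so only viable for toy models), and the feature names and set-function "model" below are invented for illustration, not the SHAP library's API:

```python
from itertools import combinations
from math import factorial

def shapley_values(f, features):
    """Exact Shapley values for a set function f over a list of features.
    f maps a frozenset of feature names to a model output (a float)."""
    n = len(features)
    values = {}
    for i in features:
        others = [j for j in features if j != i]
        phi = 0.0
        for k in range(n):  # size of the coalition S not containing i
            for S in combinations(others, k):
                S = frozenset(S)
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi += weight * (f(S | {i}) - f(S))  # marginal contribution
        values[i] = phi
    return values

def f(S):
    # Toy "model": per-feature effects plus one interaction term
    out = 0.0
    if "age" in S:
        out += 2.0
    if "income" in S:
        out += 1.0
    if "age" in S and "income" in S:
        out += 0.5
    return out

phi = shapley_values(f, ["age", "income"])
```

Note that the 0.5 interaction effect is split evenly between the two features, and that the values satisfy the efficiency property: they sum to f(all features) − f(∅), the additivity that gives SHapley Additive exPlanations their name.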