Corpus ID: 240354648

A Survey on the Robustness of Feature Importance and Counterfactual Explanations

@article{Mishra2021ASO,
  title={A Survey on the Robustness of Feature Importance and Counterfactual Explanations},
  author={Saumitra Mishra and Sanghamitra Dutta and Jason Long and Daniele Magazzeni},
  journal={ArXiv},
  year={2021},
  volume={abs/2111.00358}
}
There exist several methods that aim to address the crucial task of understanding the behaviour of AI/ML models. Arguably, the most popular among them are local explanations, which focus on investigating model behaviour for individual instances. Several methods have been proposed for local analysis, but relatively less effort has gone into understanding whether the explanations are robust and accurately reflect the behaviour of the underlying models. In this work, we present a survey of the works that…
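
As an illustration of the kind of robustness question the survey studies, the sketch below (an assumption-laden example, not a method from the paper) slightly perturbs an input and checks how much a simple occlusion-style local feature-importance explanation changes; the helper occlusion_importance and all parameter choices are hypothetical.

# Minimal robustness probe for a local explanation: compare attributions for
# an instance and a slightly perturbed copy of it. Illustrative only.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

def occlusion_importance(model, x, baseline):
    """Importance of feature j = drop in class-1 probability when feature j
    is replaced by a baseline value (here, the training mean)."""
    p_ref = model.predict_proba(x.reshape(1, -1))[0, 1]
    scores = np.zeros_like(x)
    for j in range(x.shape[0]):
        x_occ = x.copy()
        x_occ[j] = baseline[j]
        scores[j] = p_ref - model.predict_proba(x_occ.reshape(1, -1))[0, 1]
    return scores

X, y = load_breast_cancer(return_X_y=True)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

rng = np.random.default_rng(0)
x = X[0]
x_pert = x + 0.01 * X.std(axis=0) * rng.normal(size=x.shape)   # small perturbation

baseline = X.mean(axis=0)
imp = occlusion_importance(model, x, baseline)
imp_pert = occlusion_importance(model, x_pert, baseline)

# A robust explanation should rank features similarly for both inputs.
top = set(np.argsort(-np.abs(imp))[:5])
top_pert = set(np.argsort(-np.abs(imp_pert))[:5])
print("top-5 feature overlap:", len(top & top_pert) / 5)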

Citations

Robust Counterfactual Explanations for Tree-Based Ensembles

TLDR
The results demonstrate that the proposed strategy RobX generates counterfactuals that are significantly more robust (nearly 100% validity after actual model changes) and more realistic than existing state-of-the-art methods.

A survey of algorithmic recourse: contrastive explanations and consequential recommendations

TLDR
This work focuses on algorithmic recourse, which is concerned with providing explanations and recommendations to individuals who are unfavorably treated by automated decision-making systems, and performs an extensive literature review.

Robustness of Explanation Methods for NLP Models

TLDR
This is the first attempt to evaluate the adversarial robustness of an explanation method in the context of the text modality, and initial insights and results are provided towards devising a successful adversarial attack against text explanations.

RESHAPE: Explaining Accounting Anomalies in Financial Statement Audits by enhancing SHapley Additive exPlanations

TLDR
This work proposes the Reconstruction Error SHapley Additive exPlanations Extension (RESHAPE), which explains the model output at an aggregated attribute level, and introduces an evaluation framework to compare the versatility of XAI methods in auditing.

Let Users Decide: Navigating the Trade-offs between Costs and Robustness in Algorithmic Recourse

TLDR
A novel framework is proposed, EXPECTing noisy responses (EXPECT), which addresses the problem of recourse invalidation in the face of noisy human responses by explicitly minimizing the probability of recourse invalidation in a noisy world.

Algorithmic Recourse in the Face of Noisy Human Responses

TLDR
This work theoretically and empirically analyzes the behavior of state-of-the-art algorithms, and demonstrates that the recourses generated by these algorithms are very likely to be invalidated if small changes are made to them, and proposes a novel framework, EXPECTing noisy responses (EXPECT), which addresses the problem of recourse invalidation in the face of noisy responses.

Framework for Testing Robustness of Machine Learning-Based Classifiers

TLDR
This paper proposes a framework to evaluate already-developed classifiers with regard to their robustness by focusing on the variability of the classifier’s performance and changes in the classifier’s parameter values, using factor analysis and Monte Carlo simulations.

Manipulating SHAP via Adversarial Data Perturbations (Student Abstract)

TLDR
A model-agnostic algorithm for manipulating SHapley Additive exPlanations (SHAP) through perturbation of tabular data is introduced, to support checking the stability of the explanations used by the various stakeholders in the domain of responsible AI.
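
The sketch below shows one way to check the stability of SHAP attributions under small tabular perturbations, assuming the shap package is installed; it is a hypothetical stability check, not the manipulation algorithm from the abstract above.

# Hedged sketch: compare SHAP attributions on original vs. slightly perturbed
# tabular data. Assumes `shap` is installed; not the paper's attack algorithm.
import numpy as np
import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=500, n_features=8, noise=0.1, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
rng = np.random.default_rng(0)
X_pert = X + 0.01 * X.std(axis=0) * rng.normal(size=X.shape)   # small perturbation

phi = explainer.shap_values(X)          # shape (n_samples, n_features)
phi_pert = explainer.shap_values(X_pert)

# Compare the global feature rankings induced by the two attribution sets.
rank = np.argsort(-np.abs(phi).mean(axis=0))
rank_pert = np.argsort(-np.abs(phi_pert).mean(axis=0))
print("ranking unchanged:", np.array_equal(rank, rank_pert))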

Robust Bayesian Recourse

TLDR
The Bayesian recourse, a model-agnostic recourse that minimizes the posterior probability odds ratio, is introduced, and its min-max robust counterpart is presented with the goal of hedging against future changes in the machine learning model parameters.

Explainable AI for clinical and remote health applications: a survey on tabular and time series data

TLDR
Clinical validation, consistency assessment, objective and standardised quality evaluation, and human-centered quality assessment are identified as key features to ensure effective explanations for the end users in the healthcare domain.

References

SHOWING 1-10 OF 40 REFERENCES

Multi-Objective Counterfactual Explanations

TLDR
The Multi-Objective Counterfactuals (MOC) method is proposed, which translates the counterfactual search into a multi-objective optimization problem and returns a diverse set of counterfactuals with different trade-offs between the proposed objectives, while also maintaining diversity in feature space.

CARLA: A Python Library to Benchmark Algorithmic Recourse and Counterfactual Explanation Algorithms

TLDR
CARLA (Counterfactual And Recourse LibrAry) is presented: a Python library for benchmarking counterfactual explanation methods across both different data sets and different machine learning models, together with a standardized set of integrated evaluation measures and data sets for transparent and extensive comparisons.

Robustness in machine learning explanations: does it matter?

TLDR
It is argued that robustness is desirable to the extent that one is concerned about finding real patterns in the world, and that it can also determine whether the Rashomon Effect is a boon or a bane.

Explaining Explanations: An Overview of Interpretability of Machine Learning

There has recently been a surge of work in explanatory artificial intelligence (XAI). This research area tackles the important problem that complex machines and algorithms often cannot provide…

Counterfactual Explanations Without Opening the Black Box: Automated Decisions and the GDPR

TLDR
It is suggested data controllers should offer a particular type of explanation, unconditional counterfactual explanations, to support these three aims, which describe the smallest change to the world that can be made to obtain a desirable outcome, or to arrive at the closest possible world, without needing to explain the internal logic of the system.
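
For reference, Wachter et al. formalise such a counterfactual x' for an input x as a distance-regularised search toward a desired outcome y', commonly written as the relaxed objective

\arg\min_{x'} \; \max_{\lambda} \; \lambda \bigl(f_w(x') - y'\bigr)^2 + d(x, x')

where f_w is the trained model and d is a distance in feature space (in the original paper, an L1 distance weighted by each feature's median absolute deviation).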

SAM: The Sensitivity of Attribution Methods to Hyperparameters

Attribution methods can provide powerful insights into the reasons for a classifier’s decision. We argue that a key desideratum of an explanation method is its robustness to input hyperparameters…

Learning Model-Agnostic Counterfactual Explanations for Tabular Data

TLDR
A framework called C-CHVAE is developed, drawing ideas from the manifold learning literature, that generates faithful counterfactuals, and it is suggested to complement the catalog of counterfactual quality measures with a criterion to quantify the degree of difficulty of a certain counterfactual suggestion.

Counterfactual Explanations Can Be Manipulated

TLDR
This work introduces the first framework that describes the vulnerabilities of counterfactual explanations and shows how they can be manipulated, and introduces a novel objective to train seemingly fair models where counterfactual explanations find much lower cost recourse under a slight perturbation.

"Why Should I Trust You?": Explaining the Predictions of Any Classifier

TLDR
LIME is proposed, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner by learning an interpretable model locally around the prediction.
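
A brief usage sketch of the lime package follows (assuming it is installed); the dataset, model, and parameter choices are illustrative, not taken from the paper.

# Fit a sparse local surrogate around one instance with LIME and read off
# the resulting feature weights. Illustrative usage; assumes `lime` is installed.
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(data.data, data.target)

explainer = LimeTabularExplainer(
    data.data,
    feature_names=list(data.feature_names),
    class_names=list(data.target_names),
    mode="classification",
)
exp = explainer.explain_instance(data.data[0], model.predict_proba, num_features=5)
print(exp.as_list())   # [(feature description, local weight), ...]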

Why should you trust my interpretation? Understanding uncertainty in LIME predictions

TLDR
This work demonstrates the presence of two sources of uncertainty in the method “Local Interpretable Model-agnostic Explanations” (LIME): the randomness in its sampling procedure and the variation of interpretation quality across different input data points.