• Corpus ID: 235795402

Reliable Post hoc Explanations: Modeling Uncertainty in Explainability
Dylan Slack, Sophie Hilgard, Sameer Singh, Himabindu Lakkaraju
As black box explanations are increasingly being employed to establish model credibility in high stakes settings, it is important to ensure that these explanations are accurate and reliable. However, prior work demonstrates that explanations generated by state-of-the-art techniques are inconsistent, unstable, and provide very little insight into their correctness and reliability. In addition, these methods are also computationally inefficient, and require significant hyper-parameter tuning. In… 
More Than Words: Towards Better Quality Interpretations of Text Classifiers
Higher-level feature attributions offer several advantages and are more intelligible to humans in situations where linguistic coherence resides at a higher granularity level. Token-based interpretability, while a convenient first choice given the input interfaces of ML models, is not the most effective one in all situations.
Uncertainty Quantification of Surrogate Explanations: an Ordinal Consensus Approach
This paper produces estimates of the uncertainty of a given explanation by measuring the ordinal consensus among a set of diverse bootstrapped surrogate explainers, and proposes and analyses metrics that aggregate the information contained within the set of explainers through a rating scheme.
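The core idea above can be illustrated with a minimal sketch: fit several bootstrapped linear surrogates around one instance, rank features in each, and treat low rank spread as high consensus. The `black_box` function, `x0`, and the perturbation scale are all toy assumptions, not the paper's setup.

```python
import numpy as np

def black_box(X):
    # Toy stand-in for any opaque model: nonlinear in three features.
    return 2.0 * X[:, 0] + X[:, 1] ** 2 - 0.5 * X[:, 2]

x0 = np.array([1.0, 0.5, -1.0])  # instance whose explanation we assess

def surrogate_importances(seed):
    """Fit one linear surrogate on perturbations around x0 (LIME-style)."""
    rng = np.random.default_rng(seed)
    X = x0 + rng.normal(scale=0.3, size=(200, 3))
    coef, *_ = np.linalg.lstsq(np.c_[X, np.ones(len(X))],
                               black_box(X), rcond=None)
    return np.abs(coef[:3])  # importance = magnitude of local weight

# Bootstrap an ensemble of surrogates and rank features in each one.
ranks = np.array([np.argsort(np.argsort(-surrogate_importances(s)))
                  for s in range(20)])

# Low rank spread across the ensemble = high ordinal consensus,
# suggesting the explanation is more reliable.
print("mean rank per feature:", ranks.mean(axis=0))
print("rank spread (std):    ", ranks.std(axis=0))
```

The paper's actual metrics are more refined, but this captures the mechanism: disagreement among bootstrapped explainers is itself a measurable signal of explanation uncertainty.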
What will it take to generate fairness-preserving explanations?
It is suggested that explanations do not necessarily preserve the fairness properties of the black-box algorithm, and explanation algorithms can ignore or obscure critical relevant properties, creating incorrect or misleading explanations.


MNIST handwritten digit database
ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist, 2010
ImageNet: A large-scale hierarchical image database
A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.
A Unified Approach to Interpreting Model Predictions
A unified framework for interpreting predictions, SHAP (SHapley Additive exPlanations), which unifies six existing methods and presents new methods that show improved computational performance and/or better consistency with human intuition than previous approaches.
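SHAP's foundation is the Shapley value from cooperative game theory: each feature's attribution is its average marginal contribution over all coalitions. A minimal exact computation (exponential in the number of features, so only for tiny games; the additive `contrib` game is an illustrative assumption) looks like this:

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, n_features):
    """Exact Shapley values for a set function over n_features players.

    value_fn(frozenset of indices) -> payoff of that coalition.
    Cost is exponential in n_features, so only feasible for small games;
    KernelSHAP approximates this via weighted linear regression instead.
    """
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for r in range(len(others) + 1):
            for S in combinations(others, r):
                weight = (factorial(len(S)) * factorial(n_features - len(S) - 1)
                          / factorial(n_features))
                phi[i] += weight * (value_fn(frozenset(S) | {i})
                                    - value_fn(frozenset(S)))
    return phi

# Toy additive game: a coalition's value is the sum of fixed contributions.
contrib = {0: 3.0, 1: -1.0, 2: 0.5}
phi = shapley_values(lambda S: sum(contrib[j] for j in S), 3)
print(phi)  # additive game: each Shapley value recovers its contribution
```

In SHAP, `value_fn` is the model's expected output when only the coalition's features are observed; the additivity here is what gives the "consistency with human intuition" the abstract refers to.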
"Why Should I Trust You?": Explaining the Predictions of Any Classifier
LIME is proposed, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction.
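The mechanism behind LIME can be sketched in a few lines: sample perturbations around the instance, weight them by proximity, and fit a weighted linear surrogate whose coefficients are the explanation. The `black_box` sigmoid model, `x0`, and kernel width below are illustrative assumptions, not LIME's exact defaults.

```python
import numpy as np

rng = np.random.default_rng(0)

def black_box(X):
    # Stand-in for any classifier's probability output.
    return 1.0 / (1.0 + np.exp(-(X[:, 0] - 2.0 * X[:, 1])))

x0 = np.array([0.5, -0.5])                       # instance to explain
X = x0 + rng.normal(scale=0.5, size=(500, 2))    # perturb around x0
y = black_box(X)

# Proximity kernel: perturbations closer to x0 get more weight.
d2 = ((X - x0) ** 2).sum(axis=1)
w = np.exp(-d2 / 0.25)

# Weighted least squares -> local linear surrogate (the explanation).
A = np.c_[X, np.ones(len(X))]
sw = np.sqrt(w)
coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
print("local feature weights:", coef[:2])
```

The sign and magnitude of `coef` give the locally faithful explanation: here the first feature pushes the prediction up and the second pushes it down, matching the toy model.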
Towards the Unification and Robustness of Perturbation and Gradient Based Explanations
This work derives explicit closed form expressions for the explanations output by these two post hoc interpretation techniques: SmoothGrad and a variant of LIME, and derives finite sample complexity bounds for the number of perturbations required for these methods to converge to their expected explanation.
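SmoothGrad, one of the two techniques unified above, simply averages the gradient over Gaussian-perturbed copies of the input. A minimal sketch with an analytically differentiable toy function (the function, `sigma`, and sample count are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    return np.sin(x[0]) + x[1] ** 2

def grad_f(x):
    # Analytic gradient of f; in practice this comes from autodiff.
    return np.array([np.cos(x[0]), 2.0 * x[1]])

x0 = np.array([1.0, 0.5])
sigma, n = 0.1, 1000

# SmoothGrad: average the gradient over Gaussian-perturbed copies of x0.
noise = rng.normal(scale=sigma, size=(n, 2))
sg = np.mean([grad_f(x0 + e) for e in noise], axis=0)
print("plain gradient:", grad_f(x0))
print("SmoothGrad:    ", sg)
```

The paper's closed-form analysis makes the connection precise: this Gaussian averaging and LIME's proximity-weighted sampling both smooth the function before differentiating it, which is why the two families of explanations can be unified.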
BayLIME: Bayesian Local Interpretable Model-Agnostic Explanations
A novel XAI technique, BayLIME, is introduced, which is a Bayesian modification of the widely used XAI approach LIME, which exploits prior knowledge to improve the consistency in repeated explanations of a single prediction and also the robustness to kernel settings.
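The Bayesian modification described above amounts to fitting the local surrogate with conjugate Bayesian linear regression, so a prior over the weights stabilises repeated explanations. This is a generic sketch of that regression, not BayLIME's exact implementation; the data, `alpha`, and `beta` values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))                    # perturbation samples
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=100) # black-box responses

alpha, beta = 2.0, 100.0   # prior precision, noise precision
mu0 = np.zeros(3)          # prior mean; this is where prior knowledge enters

# Conjugate Bayesian linear regression: posterior over surrogate weights.
S_inv = alpha * np.eye(3) + beta * X.T @ X
mu_post = np.linalg.solve(S_inv, alpha * mu0 + beta * X.T @ y)
print("posterior mean weights:", mu_post)
```

Because the posterior mean is shrunk toward `mu0` rather than refit from scratch, explanations of the same prediction vary less across repeated sampling runs, which is the consistency gain the abstract claims.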
Improving KernelSHAP: Practical Shapley Value Estimation via Linear Regression
This work presents a version of KernelSHAP for stochastic cooperative games that yields fast new estimators for two global explanation methods, along with a variance reduction technique that further accelerates the convergence of both estimators.
Manipulating and Measuring Model Interpretability
A sequence of pre-registered experiments showed participants functionally identical models that varied only in two factors commonly thought to make machine learning models more or less interpretable: the number of features and the transparency of the model (i.e., whether the model internals are clear or black box).
Probabilistic Sufficient Explanations
Probabilistic sufficient explanations are introduced, which formulate explaining an instance of classification as choosing the “simplest” subset of features such that only observing those features is “sufficient” to explain the classification.
"How do I fool you?": Manipulating User Trust via Misleading Black Box Explanations
This work proposes a novel theoretical framework for understanding and generating misleading explanations, and carries out a user study with domain experts to demonstrate how these explanations can be used to mislead users.