• Corpus ID: 219530870

Evaluation Criteria for Instance-based Explanation

@article{Hanawa2020EvaluationCF,
  title={Evaluation Criteria for Instance-based Explanation},
  author={Kazuaki Hanawa and Sho Yokoi and Satoshi Hara and Kentaro Inui},
  journal={ArXiv},
  year={2020},
  volume={abs/2006.04528}
}
Explaining predictions made by complex machine learning models helps users understand and accept the predicted outputs with confidence. Instance-based explanation provides such help by identifying relevant instances as evidence to support a model's prediction result. To find relevant instances, several relevance metrics have been proposed. In this study, we ask the following research question: "Do the metrics actually work in practice?" To address this question, we propose two sanity check… 
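
To make the setting concrete, the sketch below implements one widely used kind of relevance metric of the sort the paper examines: score each training instance by the cosine similarity of its hidden representation to that of the test instance. The representation source (e.g. penultimate-layer activations) and array shapes are assumptions for illustration, not the authors' exact protocol.

import numpy as np

def cosine_relevance(test_repr, train_reprs, eps=1e-12):
    """Relevance of each training instance to a test instance, measured as the
    cosine similarity of their (e.g. penultimate-layer) representations."""
    t = test_repr / (np.linalg.norm(test_repr) + eps)
    T = train_reprs / (np.linalg.norm(train_reprs, axis=1, keepdims=True) + eps)
    return T @ t  # one score per training instance; higher means more relevant

# Toy usage with random vectors standing in for model representations.
scores = cosine_relevance(np.random.randn(64), np.random.randn(1000, 64))
top5 = np.argsort(scores)[::-1][:5]  # indices of the five most relevant training instances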

On Sample Based Explanation Methods for NLP: Faithfulness, Efficiency and Semantic Evaluation

TLDR
This work can improve the interpretability of explanations by allowing arbitrary text sequences as the explanation unit, and it proposes a semantic-based evaluation metric that aligns better with humans’ judgment of explanations than the widely adopted diagnostic or re-training measures.

Neural Random Projection: From the Initial Task To the Input Similarity Problem

TLDR
A novel approach to implicit data representation is proposed for evaluating the similarity of input data with a trained neural network; it uses only the outputs of the last hidden layer and does not require a backward pass.
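
A minimal sketch of the general recipe described above, comparing inputs through last-hidden-layer activations obtained with forward passes only; the toy architecture and hook placement are placeholders, not the paper's construction.

import torch
import torch.nn as nn

# Toy classifier; any trained network with an identifiable last hidden layer would do.
model = nn.Sequential(
    nn.Linear(20, 32), nn.ReLU(),
    nn.Linear(32, 16), nn.ReLU(),   # last hidden layer
    nn.Linear(16, 3),
)

captured = {}
model[3].register_forward_hook(lambda mod, inp, out: captured.update(h=out.detach()))

def last_hidden_similarity(x1, x2):
    """Cosine similarity of last-hidden-layer activations; no backward pass needed."""
    with torch.no_grad():
        model(x1.unsqueeze(0)); h1 = captured["h"].squeeze(0)
        model(x2.unsqueeze(0)); h2 = captured["h"].squeeze(0)
    return torch.nn.functional.cosine_similarity(h1, h2, dim=0).item()

print(last_hidden_similarity(torch.randn(20), torch.randn(20)))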

References

"Why Should I Trust You?": Explaining the Predictions of Any Classifier

TLDR
LIME is proposed, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner by learning an interpretable model locally around the prediction.
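
A toy sketch of the local-surrogate recipe behind LIME for tabular inputs: sample perturbations around the instance, weight them by proximity, and fit a simple linear model whose coefficients serve as feature attributions. This illustrates the idea rather than reproducing the lime package; the perturbation scale and ridge surrogate are arbitrary choices.

import numpy as np
from sklearn.linear_model import Ridge

def local_linear_explanation(predict_proba, x, n_samples=500, scale=0.5, seed=0):
    """Fit a proximity-weighted linear surrogate around x; returns per-feature attributions."""
    rng = np.random.default_rng(seed)
    Z = x + rng.normal(0.0, scale, size=(n_samples, x.shape[0]))  # local perturbations
    y = predict_proba(Z)[:, 1]                                    # black-box predictions
    dist = np.linalg.norm(Z - x, axis=1)
    weights = np.exp(-dist ** 2 / (2 * scale ** 2))               # closer samples count more
    surrogate = Ridge(alpha=1.0).fit(Z, y, sample_weight=weights)
    return surrogate.coef_

# Usage with any fitted binary classifier exposing predict_proba (e.g. scikit-learn):
# attributions = local_linear_explanation(clf.predict_proba, X_test[0])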

A Unified Approach to Interpreting Model Predictions

TLDR
SHAP (SHapley Additive exPlanations), a unified framework for interpreting predictions, is presented; it unifies six existing methods and adds new methods that show improved computational performance and/or better consistency with human intuition than previous approaches.
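
For readers unfamiliar with the underlying quantity, the sketch below estimates a single Shapley value by permutation sampling, filling "absent" features from a background dataset; real SHAP implementations use much more efficient estimators, and the predict function and background data here are assumptions.

import numpy as np

def shapley_value_mc(predict, x, background, feature, n_perm=200, seed=0):
    """Monte Carlo estimate of the Shapley value of `feature` for instance x.
    Features absent from a coalition are filled in from a random background row."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_perm):
        order = rng.permutation(x.shape[0])
        pos = int(np.where(order == feature)[0][0])
        ref = background[rng.integers(len(background))]
        with_f, without_f = ref.copy(), ref.copy()
        with_f[order[: pos + 1]] = x[order[: pos + 1]]   # coalition that includes `feature`
        without_f[order[:pos]] = x[order[:pos]]          # same coalition without it
        total += float(predict(with_f[None, :])[0] - predict(without_f[None, :])[0])
    return total / n_perm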

Interpreting Black Box Predictions using Fisher Kernels

TLDR
This work takes a novel look at black box interpretation of test predictions in terms of training examples, making use of Fisher kernels as the defining feature embedding of each data point, combined with Sequential Bayesian Quadrature (SBQ) for efficient selection of examples.
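
The central object here is a per-example gradient embedding and a kernel built from it. A simplified sketch for a logistic-regression-style model follows; it uses the plain gradient inner product and omits the inverse Fisher information preconditioner of the full Fisher kernel, and all names are illustrative.

import numpy as np

def loglik_gradient(w, x, y):
    """Gradient of the log-likelihood of a logistic model at a single example (x, y)."""
    p = 1.0 / (1.0 + np.exp(-x @ w))
    return (y - p) * x

def gradient_kernel_scores(w, X_train, y_train, x_test, y_test):
    """Similarity of each training example to the test example in gradient space;
    large values indicate training points that 'pull' the parameters the same way."""
    g_test = loglik_gradient(w, x_test, y_test)
    g_train = np.stack([loglik_gradient(w, x, y) for x, y in zip(X_train, y_train)])
    return g_train @ g_test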

Examples are not enough, learn to criticize! Criticism for Interpretability

TLDR
Motivated by the Bayesian model criticism framework, MMD-critic is developed, which efficiently learns prototypes and criticism, designed to aid human interpretability.
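
A compact sketch of the prototype half of this idea: greedily pick points that minimise the maximum mean discrepancy (MMD) between the data and the prototype set under an RBF kernel. The criticism step (points where the prototypes fit worst) is omitted, and the kernel width and plain greedy search are simplifications.

import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def greedy_prototypes(X, m, gamma=1.0):
    """Greedily select m prototypes approximately minimising MMD^2(data, prototypes)."""
    K = rbf_kernel(X, X, gamma)
    data_term = K.mean(axis=0)          # average similarity of each candidate to the data
    selected = []
    for _ in range(m):
        best, best_gain = None, -np.inf
        for j in range(len(X)):
            if j in selected:
                continue
            S = selected + [j]
            # Only the terms of MMD^2 that depend on S matter for the comparison.
            gain = 2 * data_term[S].mean() - K[np.ix_(S, S)].mean()
            if gain > best_gain:
                best, best_gain = j, gain
        selected.append(best)
    return selected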

A Survey of Methods for Explaining Black Box Models

TLDR
A classification of the main problems addressed in the literature, with respect to the notion of explanation and the type of black box system, is provided to help researchers find the proposals most useful for their own work.

Prototype selection for interpretable classification

TLDR
This paper discusses a method for selecting prototypes in the classification setting (in which the samples fall into known discrete categories), demonstrates the interpretative value of producing prototypes on the well-known USPS ZIP code digits data set, and shows that, as a classifier, the method performs reasonably well.

Input Similarity from the Neural Network Perspective

TLDR
The mathematical properties of this similarity measure are studied, and it is shown how to estimate sample density with it, in low complexity, enabling new types of statistical analysis for neural networks.
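
One elementary way to read a density estimate out of a pairwise similarity, in the spirit of the claim above: treat the density at a sample as its mean similarity to all other samples. The RBF kernel below merely stands in for whatever network-derived similarity is available.

import numpy as np

def density_from_similarity(S):
    """Relative density at each sample: mean similarity to every other sample,
    given an n-by-n similarity (kernel) matrix S."""
    n = S.shape[0]
    return (S.sum(axis=1) - np.diag(S)) / (n - 1)

# Toy usage with an RBF kernel standing in for a learned similarity.
X = np.random.randn(200, 2)
S = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
print(density_from_similarity(S)[:5])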

Interpretable Machine Learning

TLDR
This project introduces Robust TCAV, which builds on TCAV and experimentally determines best practices for the method; it is a step toward making TCAV, an already impactful algorithm in interpretability, more reliable and useful for practitioners.

Data Cleansing for Models Trained with SGD

TLDR
This paper proposes an algorithm that can suggest influential instances without using any domain knowledge; it infers the influential instances by retracing the steps of SGD while incorporating the intermediate models computed at each step.
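
A rough sketch of the retracing idea for a logistic model trained with plain SGD: record which example produced each update, then counterfactually re-walk the recorded steps while skipping the target example's updates and compare validation loss. Re-using the recorded gradients is a first-order shortcut, and the model, loss, and learning-rate choices are placeholders rather than the paper's estimator.

import numpy as np

def sgd_with_trace(X, y, lr=0.1, epochs=5, seed=0):
    """Train a logistic model with SGD, recording (example index, gradient) per step."""
    rng = np.random.default_rng(seed)
    w, trace = np.zeros(X.shape[1]), []
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            p = 1.0 / (1.0 + np.exp(-X[i] @ w))
            g = (p - y[i]) * X[i]
            trace.append((i, g))
            w = w - lr * g
    return w, trace

def influence_by_retracing(trace, target, lr, dim, val_loss, w_final):
    """Re-walk the recorded steps without `target`'s updates (re-using the stored
    gradients, a first-order approximation) and report the change in validation loss."""
    w = np.zeros(dim)
    for i, g in trace:
        if i != target:
            w = w - lr * g
    return val_loss(w) - val_loss(w_final)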