ELUDE: Generating interpretable explanations via a decomposition into labelled and unlabelled features

@article{Ramaswamy2022ELUDEGI,
  title={ELUDE: Generating interpretable explanations via a decomposition into labelled and unlabelled features},
  author={Vikram V. Ramaswamy and Sunnie Kim and Nicole Meister and Ruth C. Fong and Olga Russakovsky},
  journal={ArXiv},
  year={2022},
  volume={abs/2206.07690}
}
Deep learning models have achieved remarkable success in different areas of machine learning over the past decade; however, the size and complexity of these models make them difficult to understand. In an effort to make them more interpretable, several recent works focus on explaining parts of a deep neural network through human-interpretable, semantic attributes. However, it may be impossible to completely explain complex models using only semantic attributes. In this work, we propose to… 
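The abstract is truncated before the method is described, but going by the title, the decomposition separates what labelled semantic attributes can explain from a residual captured by unlabelled features of the model's own representation. A minimal sketch of that idea is below; the function and variable names (e.g., fit_elude_style_decomposition) are illustrative assumptions, and the actual ELUDE training procedure likely differs in detail.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Lasso, LinearRegression

def fit_elude_style_decomposition(logits, attributes, features,
                                  alpha=0.01, k_unlabelled=8):
    """Illustrative sketch (not the authors' code): approximate a model's
    logits as a sparse linear function of labelled semantic attributes,
    then account for whatever remains with a small number of 'unlabelled'
    directions drawn from the model's own feature space."""
    # 1) Labelled part: sparse linear model over semantic attributes.
    labelled = Lasso(alpha=alpha).fit(attributes, logits)
    residual = logits - labelled.predict(attributes)

    # 2) Unlabelled part: a low-rank linear map from the network's features
    #    onto the residual the attributes could not explain.
    reduced = PCA(n_components=k_unlabelled).fit_transform(features)
    unlabelled = LinearRegression().fit(reduced, residual)

    explained_by_attributes = 1.0 - np.var(residual) / np.var(logits)
    return labelled, unlabelled, explained_by_attributes
```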
1 Citation
Overlooked factors in concept-based explanations: Dataset choice, concept salience, and human capability
TLDR
It is argued that only visually salient concepts should be used in concept-based explanations, and a much stricter upper bound of 32 concepts or fewer is revealed, beyond which the explanations are much less practically useful.

References

Interpretable Basis Decomposition for Visual Explanation
TLDR
A new framework called Interpretable Basis Decomposition for providing visual explanations for classification networks is proposed, decomposing the neural activations of the input image into semantically interpretable components pre-trained from a large concept corpus.
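As a rough illustration of the decomposition idea summarised above, the sketch below greedily expresses a class weight vector as a combination of concept vectors and reports the unexplained residual; names and the greedy selection rule are simplifications, not the paper's exact algorithm.

```python
import numpy as np

def decompose_onto_concepts(w, concept_vectors, concept_names, max_components=5):
    """Illustrative IBD-style sketch: greedily express a class weight vector
    `w` as a non-negative combination of rows of `concept_vectors`,
    reporting the residual that no concept accounts for."""
    residual = w.copy()
    explanation = []
    for _ in range(max_components):
        # pick the concept most aligned with what is still unexplained
        scores = concept_vectors @ residual
        idx = int(np.argmax(scores))
        coeff = max(scores[idx] / np.dot(concept_vectors[idx], concept_vectors[idx]), 0.0)
        if coeff == 0.0:
            break
        explanation.append((concept_names[idx], coeff))
        residual = residual - coeff * concept_vectors[idx]
    return explanation, np.linalg.norm(residual) / np.linalg.norm(w)
```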
Contextual Semantic Interpretability
TLDR
A two-layer semantic bottleneck that gathers attributes into interpretable, sparse groups is proposed, allowing them to contribute differently to the final output depending on the context; the semantic bottleneck is trained using an auxiliary database.
Net2Vec: Quantifying and Explaining How Concepts are Encoded by Filters in Deep Neural Networks
  • Ruth Fong, A. Vedaldi
    2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
  • 2018
TLDR
The Net2Vec framework is introduced, in which semantic concepts are mapped to vectorial embeddings based on corresponding filter responses, and it is shown that in most cases, multiple filters are required to code for a concept, and that filter embeddings are able to better characterize the meaning of a representation and its relationship to other concepts.
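A minimal sketch of the embedding idea: one learned weight per filter combines a layer's activation maps into a concept mask. The class and parameter names are illustrative, not the paper's code.

```python
import torch
import torch.nn as nn

class Net2VecConceptEmbedding(nn.Module):
    """Illustrative Net2Vec-style sketch: a single learned weight per filter
    combines one layer's filter activation maps into a mask trained to
    match a concept's segmentation."""
    def __init__(self, num_filters):
        super().__init__()
        # one weight per filter = the concept's vectorial embedding
        self.combine = nn.Conv2d(num_filters, 1, kernel_size=1, bias=True)

    def forward(self, filter_activations):       # (B, F, H, W)
        return self.combine(filter_activations)  # (B, 1, H, W) concept logits

# Training would minimise a pixel-wise loss against concept masks, e.g.:
# loss = nn.functional.binary_cross_entropy_with_logits(pred, concept_mask)
```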
This looks like that: deep learning for interpretable image recognition
TLDR
A deep network architecture, the prototypical part network (ProtoPNet), is proposed that reasons in a way similar to how ornithologists, physicians, and others would explain how to solve challenging image classification tasks, providing a level of interpretability that is absent in other interpretable deep models.
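The "this looks like that" reasoning can be sketched as follows: compare every spatial patch of a conv feature map against learned prototypes, keep the best match per prototype, and classify from the similarity scores. Shapes and the log-similarity form loosely follow the paper; details are simplified.

```python
import torch
import torch.nn as nn

class PrototypeSimilarityHead(nn.Module):
    """Illustrative ProtoPNet-style sketch: classify from the best patch
    similarity to each learned prototype."""
    def __init__(self, num_prototypes, proto_dim, num_classes):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(num_prototypes, proto_dim))
        self.classifier = nn.Linear(num_prototypes, num_classes, bias=False)

    def forward(self, feature_map):                     # (B, D, H, W)
        B, D, H, W = feature_map.shape
        patches = feature_map.permute(0, 2, 3, 1).reshape(B, H * W, D)
        # squared L2 distance between every patch and every prototype
        dists = torch.cdist(patches, self.prototypes.unsqueeze(0).expand(B, -1, -1)) ** 2
        min_dists = dists.min(dim=1).values             # (B, num_prototypes)
        similarity = torch.log((min_dists + 1) / (min_dists + 1e-4))
        return self.classifier(similarity)              # class logits
```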
Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)
TLDR
Concept Activation Vectors (CAVs) are introduced, which provide an interpretation of a neural net's internal state in terms of human-friendly concepts, and may be used to explore hypotheses and generate insights for a standard image classification network as well as a medical application.
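A compact sketch of the CAV/TCAV idea: the CAV is the normal of a linear classifier separating a layer's activations on concept examples from random examples, and the TCAV score is the fraction of class examples whose logit increases along that direction. Function names here are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def concept_activation_vector(concept_acts, random_acts):
    """Illustrative sketch: normal of a linear boundary between activations
    on concept examples and on random examples."""
    X = np.vstack([concept_acts, random_acts])
    y = np.r_[np.ones(len(concept_acts)), np.zeros(len(random_acts))]
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    cav = clf.coef_.ravel()
    return cav / np.linalg.norm(cav)

def tcav_score(gradients_of_class_logit, cav):
    """Fraction of examples whose class logit has a positive directional
    derivative along the CAV."""
    return float(np.mean(gradients_of_class_logit @ cav > 0))
```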
Neural Prototype Trees for Interpretable Fine-grained Image Recognition
TLDR
The Neural Prototype Tree (ProtoTree), an intrinsically interpretable deep learning method for fine-grained image recognition that combines prototype learning with decision trees, and thus results in a globally interpretable model by design.
"Why Should I Trust You?": Explaining the Predictions of Any Classifier
TLDR
LIME is proposed, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction.
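A simplified tabular variant of the local-surrogate idea is sketched below: perturb the instance, weight samples by proximity, and fit a weighted linear model whose coefficients serve as the local explanation. The real LIME uses interpretable binary representations and sparse regression; names and kernel choices here are assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge

def lime_style_explanation(predict_fn, x, num_samples=1000, kernel_width=0.75, rng=None):
    """Illustrative LIME-style sketch for a tabular instance `x`."""
    rng = np.random.default_rng(rng)
    # perturb features with Gaussian noise around the instance
    Z = x + rng.normal(scale=1.0, size=(num_samples, x.shape[0]))
    preds = predict_fn(Z)                               # model outputs on perturbations
    dists = np.linalg.norm(Z - x, axis=1)
    weights = np.exp(-(dists ** 2) / kernel_width ** 2) # proximity kernel
    surrogate = Ridge(alpha=1.0).fit(Z, preds, sample_weight=weights)
    return surrogate.coef_                              # local feature importances
```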
Network Dissection: Quantifying Interpretability of Deep Visual Representations
TLDR
This work uses the proposed Network Dissection method to test the hypothesis that interpretability is an axis-independent property of the representation space, then applies the method to compare the latent representations of various networks when trained to solve different classification problems.
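At its core, Network Dissection scores how well a filter detects a concept by thresholding its activation maps and comparing them with concept segmentations via IoU; a minimal sketch (assuming activation maps are already resized to the mask resolution, which the real method handles by upsampling) is:

```python
import numpy as np

def filter_concept_iou(activation_maps, concept_masks, quantile=0.995):
    """Illustrative Network-Dissection-style sketch: binarise one filter's
    activation maps at a high quantile and score overlap with a concept's
    segmentation masks via IoU."""
    threshold = np.quantile(activation_maps, quantile)
    detector = activation_maps > threshold          # (N, H, W) boolean
    inter = np.logical_and(detector, concept_masks).sum()
    union = np.logical_or(detector, concept_masks).sum()
    return inter / union if union > 0 else 0.0
```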
Understanding Deep Networks via Extremal Perturbations and Smooth Masks
TLDR
Some of the shortcomings of existing approaches to perturbation analysis are discussed, and the concept of extremal perturbations is introduced; extremal perturbations are theoretically grounded and interpretable, and allow all tunable weighting factors to be removed from the optimization problem.
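The gist is to find a smooth mask of (approximately) fixed area that maximally preserves the target score when everything outside it is blurred away. The sketch below uses a soft area penalty and simple pooling for smoothness, which are simplifications of the paper's constrained formulation; all constants are illustrative.

```python
import torch
import torch.nn.functional as F

def extremal_perturbation_sketch(model, image, target, area=0.1, steps=200, lr=0.05):
    """Illustrative sketch of the extremal-perturbation idea for an
    image tensor of shape (1, C, H, W)."""
    mask_param = torch.zeros(1, 1, *image.shape[-2:], requires_grad=True)
    blurred = F.avg_pool2d(image, 11, stride=1, padding=5)   # crude 'deleted' reference
    opt = torch.optim.Adam([mask_param], lr=lr)
    for _ in range(steps):
        mask = torch.sigmoid(mask_param)
        mask = F.avg_pool2d(mask, 9, stride=1, padding=4)    # keep the mask smooth
        perturbed = mask * image + (1 - mask) * blurred
        score = model(perturbed)[0, target]
        area_penalty = (mask.mean() - area) ** 2
        loss = -score + 100.0 * area_penalty
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.sigmoid(mask_param).detach()
```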
RISE: Randomized Input Sampling for Explanation of Black-box Models
TLDR
The problem of Explainable AI for deep neural networks that take images as input and output a class probability is addressed, and an approach called RISE is proposed that generates an importance map indicating how salient each pixel is for the model's prediction.
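The mechanism is simple to sketch: average many random binary masks, each weighted by the model's score on the correspondingly masked image. Nearest-neighbour upsampling of the mask grid below is a simplification (the paper uses smooth bilinear upsampling with random shifts), and all names are illustrative.

```python
import numpy as np

def rise_saliency(predict_fn, image, target, num_masks=2000, grid=7, p_keep=0.5, rng=None):
    """Illustrative RISE-style sketch; `predict_fn` is assumed to map a
    (H, W, C) image to a vector of class probabilities."""
    rng = np.random.default_rng(rng)
    H, W = image.shape[:2]
    saliency = np.zeros((H, W))
    for _ in range(num_masks):
        # random binary grid, upsampled to image size
        small = (rng.random((grid, grid)) < p_keep).astype(float)
        cell_h, cell_w = int(np.ceil(H / grid)), int(np.ceil(W / grid))
        mask = np.kron(small, np.ones((cell_h, cell_w)))[:H, :W]
        score = predict_fn(image * mask[..., None])[target]
        saliency += score * mask
    return saliency / (num_masks * p_keep)
```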
...