Publications
SmoothGrad: removing noise by adding noise
TLDR
SmoothGrad, a simple method that can visually sharpen gradient-based sensitivity maps, is introduced, and lessons learned in visualizing these maps are discussed.
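As the title suggests, the sharpening comes from averaging the gradient saliency map over several noise-perturbed copies of the input. A minimal NumPy sketch of that idea, assuming a hypothetical `grad_fn` that returns the gradient of the class score with respect to the input:

```python
import numpy as np

def smoothgrad(x, grad_fn, n_samples=50, noise_level=0.15, rng=None):
    """Average gradient saliency over noisy copies of the input x.

    grad_fn(x) is assumed to return d(score)/dx with the same shape as x;
    noise_level is the noise standard deviation as a fraction of the input range.
    """
    rng = np.random.default_rng() if rng is None else rng
    sigma = noise_level * (x.max() - x.min())
    total = np.zeros_like(x, dtype=float)
    for _ in range(n_samples):
        noisy = x + rng.normal(0.0, sigma, size=x.shape)
        total += grad_fn(noisy)
    return total / n_samples
```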
Towards A Rigorous Science of Interpretable Machine Learning
TLDR
This position paper defines interpretability, describes when it is needed (and when it is not), suggests a taxonomy for rigorous evaluation, and exposes open questions toward a more rigorous science of interpretable machine learning.
Sanity Checks for Saliency Maps
TLDR
It is shown that some existing saliency methods are independent of both the model and the data-generating process, and that methods which fail the proposed tests are inadequate for tasks sensitive to either the data or the model.
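One of the proposed checks is a model parameter randomization test: an explanation that actually depends on what the model has learned should change when the model's weights are replaced with random ones. A rough sketch, where `saliency_fn` and `randomize_weights` are hypothetical helpers supplied by the user:

```python
import numpy as np

def rank_correlation(a, b):
    """Spearman-style rank correlation between two flattened maps (ties ignored)."""
    ra = np.argsort(np.argsort(a.ravel()))
    rb = np.argsort(np.argsort(b.ravel()))
    return np.corrcoef(ra, rb)[0, 1]

def parameter_randomization_check(x, model, saliency_fn, randomize_weights):
    """Compare a saliency map from the trained model with one from a
    weight-randomized copy; values near 1 mean the explanation barely
    changed, i.e. the method fails this sanity check."""
    trained_map = saliency_fn(model, x)
    random_map = saliency_fn(randomize_weights(model), x)
    return rank_correlation(np.abs(trained_map), np.abs(random_map))
```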
Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)
TLDR
Concept Activation Vectors (CAVs) are introduced, which provide an interpretation of a neural net's internal state in terms of human-friendly concepts, and may be used to explore hypotheses and generate insights for a standard image classification network as well as a medical application.
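A CAV can be sketched as the normal to a linear classifier that separates a concept's examples from random examples in a layer's activation space; the TCAV score is then the fraction of class examples whose class-score gradient at that layer points along the CAV. A minimal sketch with scikit-learn, assuming the activation and gradient arrays are precomputed:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def compute_cav(concept_acts, random_acts):
    """Fit a linear classifier separating concept activations from random
    activations; the (unit-normalized) CAV is the normal to its boundary."""
    X = np.vstack([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(random_acts))])
    cav = LogisticRegression(max_iter=1000).fit(X, y).coef_.ravel()
    return cav / np.linalg.norm(cav)

def tcav_score(layer_grads, cav):
    """Fraction of examples whose class-score gradient at the layer has a
    positive directional derivative along the concept direction."""
    return float(np.mean(layer_grads @ cav > 0))
```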
A Benchmark for Interpretability Methods in Deep Neural Networks
TLDR
An empirical measure of the approximate accuracy of feature importance estimates in deep neural networks is proposed, and it is shown that some approaches do no better than the underlying method but carry a far higher computational burden.
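The core of the benchmark is a remove-and-retrain loop: mask out the features an estimator ranks as most important, retrain a fresh model on the degraded data, and measure how much accuracy drops. A simplified sketch of the masking step only (retraining and evaluation are left out); importance maps are assumed to have the same shape as the images:

```python
import numpy as np

def mask_top_features(images, importance, fraction=0.3, fill_value=None):
    """Replace the top `fraction` most-important pixels of each image with an
    uninformative value (the per-image mean by default)."""
    masked = images.astype(float).copy()
    k = int(fraction * importance[0].size)
    for img, imp, out in zip(images, importance, masked):
        top = np.argsort(imp.ravel())[::-1][:k]   # indices of the k most important pixels
        out.reshape(-1)[top] = img.mean() if fill_value is None else fill_value
    return masked
```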
Examples are not enough, learn to criticize! Criticism for Interpretability
TLDR
Motivated by the Bayesian model criticism framework, MMD-critic is developed, which efficiently learns prototypes and criticisms designed to aid human interpretability.
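A rough sketch of the selection idea under an RBF kernel: prototypes are added greedily so as to minimize the MMD between the data and the prototype set, and criticisms are the points where the witness function between the two distributions is largest in magnitude (the diversity regularizer used in the paper is omitted here):

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def select_prototypes(X, m, gamma=1.0):
    """Greedily pick m prototype indices minimizing MMD^2(data, prototypes);
    the data-data term of the MMD is constant and dropped."""
    K = rbf_kernel(X, X, gamma)
    colmean = K.mean(axis=0)                     # (1/n) * sum_i k(x_i, x_j)
    selected = []
    for _ in range(m):
        best_j, best_cost = None, np.inf
        for j in range(len(X)):
            if j in selected:
                continue
            S = selected + [j]
            cost = K[np.ix_(S, S)].mean() - 2.0 * colmean[S].mean()
            if cost < best_cost:
                best_j, best_cost = j, cost
        selected.append(best_j)
    return selected

def select_criticisms(X, prototypes, c, gamma=1.0):
    """Pick c criticism indices where the witness function (data mean embedding
    minus prototype mean embedding) has the largest magnitude."""
    K = rbf_kernel(X, X, gamma)
    witness = K.mean(axis=1) - K[:, prototypes].mean(axis=1)
    order = np.argsort(-np.abs(witness))
    return [i for i in order if i not in prototypes][:c]
```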
Towards Automatic Concept-based Explanations
TLDR
This work proposes principles and desiderata for concept-based explanation, which goes beyond per-sample features to identify higher-level, human-understandable concepts that apply across the entire dataset.
Visualizing and Measuring the Geometry of BERT
TLDR
This paper describes qualitative and quantitative investigations of one particularly effective model, BERT, and finds evidence of a fine-grained geometric representation of word senses in both attention matrices and individual word embeddings.
Concept Bottleneck Models
TLDR
On x-ray grading and bird identification, concept bottleneck models achieve competitive accuracy with standard end-to-end models while enabling interpretation in terms of high-level clinical concepts (“bone spurs”) or bird attributes (“wing color”).
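The architecture itself can be sketched as a two-stage network: an encoder predicts the concept values, and the label is predicted from those concept predictions alone, so both stages can be supervised and the bottleneck can be inspected or intervened on. A minimal PyTorch sketch with arbitrarily chosen layer sizes:

```python
import torch
import torch.nn as nn

class ConceptBottleneck(nn.Module):
    """Input -> predicted concepts -> label; the label head sees only the concepts."""

    def __init__(self, in_dim, n_concepts, n_classes):
        super().__init__()
        self.concept_net = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, n_concepts)
        )
        self.label_net = nn.Linear(n_concepts, n_classes)

    def forward(self, x):
        concepts = torch.sigmoid(self.concept_net(x))  # concept probabilities (supervised with concept labels)
        logits = self.label_net(concepts)              # class logits computed only from the concepts
        return concepts, logits
```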
Learning how to explain neural networks: PatternNet and PatternAttribution
TLDR
This work argues that explanation methods for neural nets should work reliably in the simplest setting, linear models, and proposes a generalization that yields two explanation techniques (PatternNet and PatternAttribution) that are theoretically sound for linear models and produce improved explanations for deep networks.
...