Corpus ID: 203610196

Leveraging Model Interpretability and Stability to increase Model Robustness

Fei Wu, Thomas Michel, Alexandre Briot
State-of-the-art Deep Neural Networks (DNNs) can now achieve above-human-level accuracy on image classification tasks. However, their outstanding performance comes with a complex inference mechanism that makes them difficult to interpret. In order to understand the underlying prediction rules of DNNs, Dhamdhere et al. propose an interpretability method that breaks down a DNN prediction score into the sum of its hidden-unit contributions, in the form of a metric called conductance. Analyzing…
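As a minimal sketch of the conductance idea (a toy one-hidden-layer network with made-up weights, not the paper's setup), the contribution of each hidden unit can be approximated by integrating the gradient flowing through it along a straight-line path from a baseline to the input; summed over a layer, the conductances recover the change in the prediction score:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy model: F(x) = w2 . tanh(W1 @ x)
W1 = rng.normal(size=(4, 3))
w2 = rng.normal(size=4)

def forward(x):
    return w2 @ np.tanh(W1 @ x)

def conductance(x, baseline, steps=1000):
    """Riemann-sum approximation of per-hidden-unit conductance
    along the straight line from `baseline` to `x`."""
    diff = x - baseline
    cond = np.zeros(4)
    for alpha in (np.arange(steps) + 0.5) / steps:  # midpoint rule
        h = np.tanh(W1 @ (baseline + alpha * diff))
        # dF/dh_j = w2[j];  sum_k dh_j/dx_k * diff_k = (1 - h_j^2) * (W1 @ diff)_j
        cond += w2 * (1 - h**2) * (W1 @ diff)
    return cond / steps

x = np.array([1.0, -0.5, 2.0])
baseline = np.zeros(3)
cond = conductance(x, baseline)

# Completeness: conductances over a layer sum to F(x) - F(baseline)
print(cond.sum(), forward(x) - forward(baseline))
```

The completeness check in the last line is what makes conductance a decomposition of the prediction score rather than just a saliency heuristic.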


A Unified Approach to Interpreting Model Predictions
SHAP (SHapley Additive exPlanations) is a unified framework for interpreting predictions that unifies six existing methods and presents new methods with improved computational performance and/or better consistency with human intuition than previous approaches.
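The Shapley values underlying SHAP can be computed exactly for small models by enumerating all feature coalitions, with absent features set to a baseline (one common SHAP convention). The toy model below is hypothetical; the sketch illustrates the efficiency property, i.e. that attributions sum to f(x) − f(baseline):

```python
import itertools
import math

def shapley_values(f, x, baseline):
    """Exact Shapley values by enumerating all feature coalitions.
    Absent features are replaced by the baseline value."""
    n = len(x)

    def value(S):
        z = [x[i] if i in S else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in range(n):
            for S in itertools.combinations(others, r):
                # Shapley kernel weight: |S|! (n - |S| - 1)! / n!
                w = math.factorial(r) * math.factorial(n - r - 1) / math.factorial(n)
                phi[i] += w * (value(set(S) | {i}) - value(set(S)))
    return phi

# Hypothetical toy model with an interaction term
f = lambda z: 2 * z[0] + z[1] * z[2]
x, base = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
phi = shapley_values(f, x, base)
print(phi, sum(phi), f(x) - f(base))
```

The interaction term z[1]·z[2] is split evenly between features 1 and 2, which is the symmetry axiom in action; the exponential enumeration is why SHAP's approximation methods matter in practice.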
Explaining Explanations: An Approach to Evaluating Interpretability of Machine Learning
Provides a definition of explainability, shows how it can be used to classify the existing literature, and discusses best practices and open challenges in explanatory artificial intelligence.
Learning Important Features Through Propagating Activation Differences
DeepLIFT (Deep Learning Important FeaTures), a method for decomposing the output prediction of a neural network on a specific input by backpropagating the contributions of all neurons in the network to every feature of the input, is presented.
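A minimal sketch of the contribution-backpropagation idea, using DeepLIFT's Rescale rule on a hypothetical linear→ReLU→linear toy net with made-up weights (not the authors' implementation). Each nonlinearity gets a multiplier Δoutput/Δinput relative to a reference activation, multipliers are chained back to the inputs, and the resulting contributions satisfy the summation-to-delta property:

```python
import numpy as np

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
w2 = rng.normal(size=4)
relu = lambda z: np.maximum(z, 0.0)

def forward(x):
    return w2 @ relu(W1 @ x + b1)

def deeplift_rescale(x, ref, eps=1e-9):
    """Rescale-rule contributions of each input feature,
    relative to the reference input `ref`."""
    z, z0 = W1 @ x + b1, W1 @ ref + b1
    # per-unit multiplier: delta-output / delta-input of the ReLU
    m = (relu(z) - relu(z0)) / (z - z0 + eps)
    # chain the multipliers back through the linear layers
    m_input = (w2 * m) @ W1
    return m_input * (x - ref)

x, ref = np.array([1.0, -2.0, 0.5]), np.zeros(3)
contribs = deeplift_rescale(x, ref)

# Summation-to-delta: contributions sum to F(x) - F(ref)
print(contribs.sum(), forward(x) - forward(ref))
```

Using finite differences Δh/Δz instead of instantaneous gradients is what lets DeepLIFT propagate importance through saturated or zero-gradient units.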
Interpreting CNNs via Decision Trees
Proposes a decision tree that clarifies the specific reason for each prediction made by the CNN at the semantic level, organizing all potential decision modes in a coarse-to-fine manner to explain CNN predictions at different levels of granularity.
Influence-Directed Explanations for Deep Convolutional Networks
Evaluation demonstrates that influence-directed explanations identify influential concepts that generalize across instances, can be used to extract the “essence” of what the network learned about a class, and isolate individual features the network uses to make decisions and distinguish related classes.
How Important Is a Neuron?
The notion of conductance is introduced to extend attribution to the understanding of the importance of hidden units in a deep network, either for a single input or over a set of inputs.
Detecting Adversarial Samples from Artifacts
This paper investigates model confidence on adversarial samples by looking at Bayesian uncertainty estimates, available in dropout neural networks, and by performing density estimation in the subspace of deep features learned by the model; the results show a method for implicit adversarial detection that is oblivious to the attack algorithm.
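The dropout-based uncertainty estimate can be sketched as follows: dropout is kept active at inference time, and the spread of the sampled predictions serves as the uncertainty signal (toy NumPy model with made-up weights; the paper additionally combines this with kernel density estimation in deep-feature space):

```python
import numpy as np

rng = np.random.default_rng(2)
W1 = rng.normal(size=(32, 8))
w2 = rng.normal(size=32)

def mc_dropout_predict(x, n_samples=200, p_drop=0.5):
    """Monte Carlo dropout: sample random dropout masks at test
    time and return the mean and spread of the predictions."""
    h = np.maximum(W1 @ x, 0.0)
    outs = []
    for _ in range(n_samples):
        mask = rng.random(32) >= p_drop          # sample a dropout mask
        outs.append(w2 @ (h * mask) / (1 - p_drop))  # inverted-dropout scaling
    outs = np.array(outs)
    return outs.mean(), outs.std()

mean, std = mc_dropout_predict(rng.normal(size=8))
print(mean, std)
```

The detection idea is that adversarial inputs tend to land in regions where the sampled predictions disagree, so a threshold on the standard deviation flags suspicious inputs.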
Going deeper with convolutions
We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge.
Neural Network Interpretation via Fine Grained Textual Summarization
This paper introduces the novel task of interpreting classification models using fine-grained textual summarization, and shows that the summaries faithfully reflect the features learned by the model through rigorous applications such as attribute-based image retrieval and unsupervised text grounding.
Deep Residual Learning for Image Recognition
This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
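The core idea can be sketched in a few lines: a residual block computes y = x + F(x), so the layers only have to learn the residual F, and the block reduces to the identity mapping when F is zero (toy NumPy sketch with made-up weights, not the paper's full architecture):

```python
import numpy as np

rng = np.random.default_rng(3)
d = 8
W1 = rng.normal(size=(d, d)) * 0.1
W2 = rng.normal(size=(d, d)) * 0.1

def residual_block(x):
    """y = x + F(x): the identity shortcut carries the input
    through unchanged, so gradients always have a direct path."""
    f = W2 @ np.maximum(W1 @ x, 0.0)   # two-layer residual branch F(x)
    return x + f

x = rng.normal(size=d)
y = residual_block(x)

# With zero weights the block degenerates to the identity mapping,
# which is why very deep stacks of such blocks remain trainable.
W1[:], W2[:] = 0.0, 0.0
print(np.allclose(residual_block(x), x))
```

The easy fallback to the identity is the intuition behind why adding more residual blocks does not degrade optimization the way adding plain layers does.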