Corpus ID: 3976789

Distilling a Neural Network Into a Soft Decision Tree

@article{Frosst2017DistillingAN,
  title={Distilling a Neural Network Into a Soft Decision Tree},
  author={Nicholas Frosst and Geoffrey E. Hinton},
  journal={ArXiv},
  year={2017},
  volume={abs/1711.09784}
}
Deep neural networks have proved to be a very effective way to perform classification tasks. They excel when the input data is high dimensional, the relationship between the input and the output is complicated, and the number of labeled training examples is large. But it is hard to explain why a learned network makes a particular classification decision on a particular test case. This is due to their reliance on distributed hierarchical representations. If we could take the knowledge acquired… 
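The abstract's contrast between distributed representations and hierarchical decisions can be made concrete with a minimal soft-decision-tree sketch: an inner node routes each input probabilistically through a learned sigmoid filter, and the prediction is a path-probability-weighted mixture of leaf class distributions. All names, shapes, and random parameters below are illustrative assumptions, not the paper's trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class SoftTree:
    """Depth-1 soft decision tree: one inner node routes inputs to two
    leaf class distributions (a sketch, not the paper's architecture)."""

    def __init__(self, n_features, n_classes):
        self.w = rng.normal(size=n_features)  # inner-node filter
        self.b = 0.0
        # each leaf holds a fixed class distribution (softmax of random logits)
        logits = rng.normal(size=(2, n_classes))
        self.leaves = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

    def predict_proba(self, x):
        p_right = sigmoid(self.w @ x + self.b)  # soft routing probability
        # mixture of leaf distributions, weighted by path probability
        return (1 - p_right) * self.leaves[0] + p_right * self.leaves[1]

tree = SoftTree(n_features=4, n_classes=3)
probs = tree.predict_proba(np.ones(4))
```

Because routing is soft, the output is differentiable in the filter weights, which is what lets such a tree be trained by gradient descent on a neural net's soft targets.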


A Set Membership Approach to Discovering Feature Relevance and Explaining Neural Classifier Decisions
TLDR
The proposed methodology builds on sound mathematical approaches, and the results obtained constitute a reliable estimation of the classifier’s decision premises, thus providing an explanation of its decision.
Learning Decision Trees Recurrently Through Communication
TLDR
This model generates human interpretable binary decision sequences explaining the predictions of the network while maintaining state-of-the-art accuracy on three benchmark image classification datasets, including the large-scale ImageNet.
Adaptive Neural Trees
TLDR
Adaptive neural trees (ANTs) incorporate representation learning into the edges, routing functions, and leaf nodes of a decision tree, along with a backpropagation-based training algorithm that adaptively grows the architecture from primitive modules (e.g., convolutional layers).
Layerwise Knowledge Extraction from Deep Convolutional Networks
TLDR
A novel layerwise knowledge extraction method using M-of-N rules which seeks to obtain the best trade-off between the complexity and accuracy of rules describing the hidden features of a deep network and it is shown empirically that this approach produces rules close to an optimal complexity-error tradeoff.
Distilling a Deep Neural Network into a Takagi-Sugeno-Kang Fuzzy Inference System
TLDR
Knowledge distillation (KD) is applied to create a TSK-type FIS that generalizes better than one learned directly from the training data, as demonstrated through experiments in this paper.
A Survey on the Explainability of Supervised Machine Learning
TLDR
This survey paper provides essential definitions, an overview of the different principles and methodologies of explainable Supervised Machine Learning, and a state-of-the-art survey that reviews past and recent explainable SML approaches and classifies them according to the introduced definitions.
Interpreting Deep Neural Networks Through Backpropagation
TLDR
This thesis explores a generic method for creating explanations for the decisions of any neural network using backpropagation, an internal algorithm common across all Neural Network architectures, to understand the correlation between the input to a network and the network’s output.
Near-Optimal Sparse Neural Trees for Supervised Learning
TLDR
This work aims to build a mathematical formulation of neural trees and gain the complementary benefits of both sparse optimal decision trees and neural trees, and proposes near-optimal sparse neural trees (NSNT) that is shown to be asymptotically consistent and robust in nature.
How to Explain Neural Networks: A perspective of data space division
TLDR
The principle of complete local interpretable model-agnostic explanations (CLIMEP) is proposed in this paper, and it is the first time that the complete decision boundary of FCNNs has been obtained.

References

SHOWING 1-10 OF 15 REFERENCES
Distilling the Knowledge in a Neural Network
TLDR
This work shows that it can significantly improve the acoustic model of a heavily used commercial system by distilling the knowledge in an ensemble of models into a single model and introduces a new type of ensemble composed of one or more full models and many specialist models which learn to distinguish fine-grained classes that the full models confuse.
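The distillation recipe summarized above trains a student on temperature-softened teacher outputs. A minimal NumPy sketch of the soft targets and the matching cross-entropy term, with logits and temperature chosen purely for illustration:

```python
import numpy as np

def softmax(z, T=1.0):
    """Softmax at temperature T; higher T flattens the distribution."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_targets(teacher_logits, T=2.0):
    """Soft targets: the teacher's temperature-scaled output distribution."""
    return softmax(teacher_logits, T)

def cross_entropy(p_target, q_pred, eps=1e-12):
    return -np.sum(p_target * np.log(q_pred + eps))

# toy logits, not from any real model
teacher_logits = np.array([4.0, 1.0, 0.5])
student_logits = np.array([3.0, 1.5, 0.2])
T = 2.0
soft = distillation_targets(teacher_logits, T)
loss = cross_entropy(soft, softmax(student_logits, T))
```

Raising T spreads probability onto the "wrong" classes, exposing the teacher's learned similarity structure; this is the signal the soft decision tree in the main paper is trained to mimic.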
Speech recognition with deep recurrent neural networks
TLDR
This paper investigates deep recurrent neural networks, which combine the multiple levels of representation that have proved so effective in deep networks with the flexible use of long range context that empowers RNNs.
Going deeper with convolutions
We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
Visualizing Higher-Layer Features of a Deep Network
TLDR
This paper contrast and compare several techniques applied on Stacked Denoising Autoencoders and Deep Belief Networks, trained on several vision datasets, and shows that good qualitative interpretations of high level features represented by such models are possible at the unit level.
Generative Adversarial Nets
We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G.
Efficient Non-greedy Optimization of Decision Trees
TLDR
It is shown that the problem of finding optimal linear-combination splits for decision trees is related to structured prediction with latent variables; a convex-concave upper bound on the tree's empirical loss is formed, and stochastic gradient descent enables effective training with large datasets.
Exploring the Limits of Language Modeling
TLDR
This work explores recent advances in Recurrent Neural Networks for large-scale Language Modeling, and extends current models to deal with two key challenges present in this task: corpora and vocabulary sizes, and the complex, long-term structure of language.
"Why Should I Trust You?": Explaining the Predictions of Any Classifier
TLDR
LIME is proposed, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction.
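LIME's idea of "an interpretable model locally around the prediction" can be sketched as a weighted least-squares fit to a black-box classifier near one input. The stand-in classifier, Gaussian sampling scheme, and proximity kernel below are simplified assumptions, not the paper's exact method:

```python
import numpy as np

rng = np.random.default_rng(1)

def black_box(x):
    """Stand-in nonlinear classifier returning a score in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-(x[..., 0] * x[..., 1] + x[..., 0])))

def local_linear_explanation(f, x0, n_samples=500, scale=0.1):
    """Fit a proximity-weighted linear surrogate to f around x0."""
    X = x0 + rng.normal(scale=scale, size=(n_samples, x0.size))
    y = f(X)
    # proximity kernel: closer perturbations get higher weight
    w = np.exp(-np.sum((X - x0) ** 2, axis=1) / (2 * scale ** 2))
    A = np.hstack([X, np.ones((n_samples, 1))])  # add intercept column
    W = np.sqrt(w)[:, None]
    coef, *_ = np.linalg.lstsq(A * W, y * W[:, 0], rcond=None)
    return coef[:-1]  # per-feature local importance (intercept dropped)

x0 = np.array([1.0, 0.5])
importances = local_linear_explanation(black_box, x0)
```

The surrogate's coefficients approximate the classifier's local gradient, so a feature that appears in two terms of the toy score (here the first one) comes out with the larger weight.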
Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps
TLDR
This paper addresses the visualisation of image classification models, learnt using deep Convolutional Networks (ConvNets), and establishes the connection between the gradient-based ConvNet visualisation methods and deconvolutional networks.
Auto-Encoding Variational Bayes
TLDR
A stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case is introduced.