# Distilling a Neural Network Into a Soft Decision Tree

@article{Frosst2017DistillingAN, title={Distilling a Neural Network Into a Soft Decision Tree}, author={Nicholas Frosst and Geoffrey E. Hinton}, journal={ArXiv}, year={2017}, volume={abs/1711.09784} }

Deep neural networks have proved to be a very effective way to perform classification tasks. They excel when the input data is high dimensional, the relationship between the input and the output is complicated, and the number of labeled training examples is large. But it is hard to explain why a learned network makes a particular classification decision on a particular test case. This is due to their reliance on distributed hierarchical representations. If we could take the knowledge acquired…
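The paper's core idea — a tree whose inner nodes learn soft (sigmoid) routing decisions and whose leaves hold learned class distributions — can be sketched as follows. This is a hypothetical minimal implementation for illustration, not the authors' code; the class name, random initialisation, and the mixture-over-leaves prediction rule are assumptions (the paper also discusses predicting from the single most probable leaf):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class SoftDecisionTree:
    """Minimal soft decision tree sketch. Each inner node routes an input
    right with probability sigmoid(w.x + b); each leaf holds a class
    distribution. The prediction is the leaf-distribution mixture weighted
    by the probability of reaching each leaf."""

    def __init__(self, depth, n_features, n_classes, seed=0):
        rng = np.random.default_rng(seed)
        self.depth = depth
        n_inner = 2 ** depth - 1          # inner (routing) nodes
        self.w = rng.normal(scale=0.1, size=(n_inner, n_features))
        self.b = np.zeros(n_inner)
        # Unnormalised leaf logits; softmax gives each leaf's distribution.
        self.leaf_logits = rng.normal(scale=0.1, size=(2 ** depth, n_classes))

    def predict_proba(self, x):
        # Path probability of reaching each node, computed level by level.
        path = np.ones(1)
        node = 0  # index of the first inner node at the current level
        for level in range(self.depth):
            n = 2 ** level
            p_right = sigmoid(self.w[node:node + n] @ x + self.b[node:node + n])
            # Each parent's mass splits into (left, right) children.
            path = np.stack([path * (1 - p_right), path * p_right],
                            axis=1).reshape(-1)
            node += n
        leaf = np.exp(self.leaf_logits - self.leaf_logits.max(axis=1, keepdims=True))
        leaf /= leaf.sum(axis=1, keepdims=True)
        return path @ leaf  # mixture of leaf distributions

tree = SoftDecisionTree(depth=2, n_features=4, n_classes=3)
print(tree.predict_proba(np.ones(4)))  # a length-3 distribution over classes
```

Because every routing decision is soft, the whole tree is differentiable and can be trained by gradient descent on the soft targets produced by the neural network being distilled.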

## 391 Citations

DecisioNet: A Binary-Tree Structured Neural Network

- Computer Science, ArXiv
- 2022

This paper proposes a systematic way to convert an existing deep neural network (DNN) into a DecisioNet (DN) to create a lightweight version of the original model, and presents DecisioNet, a binary-tree structured neural network that takes the best of both worlds.

A Set Membership Approach to Discovering Feature Relevance and Explaining Neural Classifier Decisions

- Computer Science, ArXiv
- 2022

The proposed methodology builds on sound mathematical approaches, and the results obtained constitute a reliable estimation of the classifier's decision premises, thus yielding an explanation of its decision.

Learning Decision Trees Recurrently Through Communication

- Computer Science, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2021

This model generates human interpretable binary decision sequences explaining the predictions of the network while maintaining state-of-the-art accuracy on three benchmark image classification datasets, including the large-scale ImageNet.

Adaptive Neural Trees

- Computer Science, ICML
- 2019

Proposes adaptive neural trees (ANTs), which incorporate representation learning into the edges, routing functions, and leaf nodes of a decision tree, along with a backpropagation-based training algorithm that adaptively grows the architecture from primitive modules (e.g., convolutional layers).

NN2Rules: Extracting Rule List from Neural Networks

- Computer Science, ArXiv
- 2022

A key contribution of NN2Rules is that it allows hidden neuron behavior to be either soft-binary (e.g., sigmoid activation) or rectified linear (ReLU), as opposed to existing decompositional approaches that were developed under the assumption of soft-binary activation.

Layerwise Knowledge Extraction from Deep Convolutional Networks

- Computer Science, ArXiv
- 2020

A novel layerwise knowledge extraction method using M-of-N rules which seeks to obtain the best trade-off between the complexity and accuracy of rules describing the hidden features of a deep network, and it is shown empirically that this approach produces rules close to an optimal complexity-error trade-off.

Distilling a Deep Neural Network into a Takagi-Sugeno-Kang Fuzzy Inference System

- Computer Science, ArXiv
- 2020

Knowledge distillation (KD) is applied to create a TSK-type fuzzy inference system (FIS) that generalizes better than one built directly from the training data, as demonstrated by the experiments in this paper.

A Survey on the Explainability of Supervised Machine Learning

- Computer Science, J. Artif. Intell. Res.
- 2021

This survey paper provides essential definitions, an overview of the different principles and methodologies of explainable Supervised Machine Learning, and a state-of-the-art survey that reviews past and recent explainable SML approaches and classifies them according to the introduced definitions.

Interpreting Deep Neural Networks Through Backpropagation

- Computer Science
- 2019

This thesis explores a generic method for creating explanations for the decisions of any neural network using backpropagation, an internal algorithm common across all Neural Network architectures, to understand the correlation between the input to a network and the network’s output.

Near-Optimal Sparse Neural Trees for Supervised Learning

- Computer Science
- 2021

This work aims to build a mathematical formulation of neural trees and gain the complementary benefits of both sparse optimal decision trees and neural trees, and proposes near-optimal sparse neural trees (NSNT), which are shown to be asymptotically consistent and robust.

## References

Showing 1-10 of 15 references

Distilling the Knowledge in a Neural Network

- Computer Science, ArXiv
- 2015

This work shows that it can significantly improve the acoustic model of a heavily used commercial system by distilling the knowledge in an ensemble of models into a single model and introduces a new type of ensemble composed of one or more full models and many specialist models which learn to distinguish fine-grained classes that the full models confuse.
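The distillation recipe summarised above can be sketched as a loss function: soften the teacher's logits with a temperature T, then train the student on a blend of soft-target cross-entropy and ordinary hard-label cross-entropy. A minimal NumPy sketch for a single example — the function names and the `alpha` blending weight are illustrative choices, not the paper's code:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax; higher T gives softer distributions."""
    z = np.asarray(z, dtype=float) / T
    z -= z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, onehot, T=2.0, alpha=0.5):
    """Cross-entropy against the teacher's temperature-softened targets,
    blended with ordinary cross-entropy on the hard labels."""
    soft_targets = softmax(teacher_logits, T)
    student_soft = softmax(student_logits, T)
    student_hard = softmax(student_logits)
    soft_ce = -np.sum(soft_targets * np.log(student_soft))
    hard_ce = -np.sum(onehot * np.log(student_hard))
    # T**2 rescales the soft-target gradients so the two terms stay balanced.
    return alpha * T**2 * soft_ce + (1 - alpha) * hard_ce

teacher = np.array([5.0, 1.0, 0.5])   # teacher logits
student = np.array([3.0, 1.5, 0.2])   # student logits
label = np.array([1.0, 0.0, 0.0])     # one-hot hard label
print(distillation_loss(student, teacher, label, T=2.0))
```

The soft targets carry the teacher's "dark knowledge" about which wrong classes are plausible, which is what a smaller student model can exploit.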

Speech recognition with deep recurrent neural networks

- Computer Science, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing
- 2013

This paper investigates deep recurrent neural networks, which combine the multiple levels of representation that have proved so effective in deep networks with the flexible use of long range context that empowers RNNs.

Going deeper with convolutions

- Computer Science, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2015

We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition…

Visualizing Higher-Layer Features of a Deep Network

- Computer Science
- 2009

This paper contrasts and compares several techniques applied to Stacked Denoising Autoencoders and Deep Belief Networks, trained on several vision datasets, and shows that good qualitative interpretations of the high-level features represented by such models are possible at the unit level.

Generative Adversarial Nets

- Computer Science, NIPS
- 2014

We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a…

Efficient Non-greedy Optimization of Decision Trees

- Computer Science, NIPS
- 2015

It is shown that finding optimal linear-combination splits for decision trees is related to structured prediction with latent variables; a convex-concave upper bound on the tree's empirical loss is formed, and optimizing it with stochastic gradient descent enables effective training on large datasets.

Exploring the Limits of Language Modeling

- Computer Science, ArXiv
- 2016

This work explores recent advances in Recurrent Neural Networks for large-scale Language Modeling, and extends current models to deal with two key challenges present in this task: corpora and vocabulary sizes, and the complex, long-term structure of language.

"Why Should I Trust You?": Explaining the Predictions of Any Classifier

- Computer Science, HLT-NAACL Demos
- 2016

LIME is proposed, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction.
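The local-surrogate idea behind LIME can be sketched in a few lines: perturb the instance, query the black-box model, weight the perturbed samples by proximity, and fit a weighted linear model whose coefficients rank feature importance. This is a hypothetical illustration of the idea for tabular inputs, not the `lime` library's API; the Gaussian perturbation and kernel width are assumptions:

```python
import numpy as np

def lime_explain(predict_fn, x, n_samples=500, sigma=1.0, seed=0):
    """Minimal LIME-style local explanation: returns a per-feature
    importance vector (coefficients of a proximity-weighted linear fit)."""
    rng = np.random.default_rng(seed)
    # Perturb the instance with Gaussian noise around x.
    Z = x + rng.normal(scale=sigma, size=(n_samples, x.size))
    y = predict_fn(Z)  # black-box scores for the class of interest
    # Proximity kernel: nearby perturbations get higher weight.
    w = np.exp(-np.sum((Z - x) ** 2, axis=1) / (2 * sigma ** 2))
    # Weighted least squares with an intercept column.
    A = np.hstack([Z, np.ones((n_samples, 1))])
    W = np.sqrt(w)[:, None]
    coef, *_ = np.linalg.lstsq(A * W, y * W.ravel(), rcond=None)
    return coef[:-1]  # per-feature local importance

# Sanity check on a linear black box.
f = lambda Z: Z @ np.array([2.0, 0.0, -1.0]) + 0.5
print(lime_explain(f, np.zeros(3)))  # recovers approximately [2, 0, -1]
```

For a genuinely nonlinear model the coefficients describe only the local behaviour around `x`, which is exactly the faithfulness trade-off the paper discusses.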

Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps

- Computer Science, ICLR
- 2014

This paper addresses the visualisation of image classification models, learnt using deep Convolutional Networks (ConvNets), and establishes the connection between the gradient-based ConvNet visualisation methods and deconvolutional networks.

Auto-Encoding Variational Bayes

- Computer Science, ICLR
- 2014

A stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case is introduced.