BranchyNet: Fast inference via early exiting from deep neural networks

Surat Teerapittayanon, Bradley McDanel, H. T. Kung. 2016 23rd International Conference on Pattern Recognition (ICPR), 2016.
Deep neural networks are state-of-the-art methods for many learning tasks due to their ability to extract increasingly better features at each network layer. However, the improved performance of additional layers in a deep network comes at the cost of added latency and energy usage in feedforward inference. As networks continue to get deeper and larger, these costs become more prohibitive for real-time and energy-sensitive applications. To address this issue, we present BranchyNet, a novel deep…
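BranchyNet exits early at a side branch when the entropy of that branch's softmax output falls below a threshold, which is the confidence criterion the paper describes. A minimal NumPy sketch of that rule follows; the `branchy_infer` name, the branch callables, and the threshold values are illustrative assumptions, not the paper's API.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def entropy(probs, eps=1e-12):
    # Shannon entropy of a probability vector; low entropy = confident.
    return -np.sum(probs * np.log(probs + eps), axis=-1)

def branchy_infer(x, branches, thresholds):
    # Run each exit branch in order; return the first prediction whose
    # softmax entropy falls below that branch's threshold. The final
    # branch always answers if no earlier exit is confident enough.
    for branch, t in zip(branches[:-1], thresholds):
        p = softmax(branch(x))
        if entropy(p) < t:
            return int(p.argmax())
    return int(softmax(branches[-1](x)).argmax())
```

In practice the thresholds are tuned per branch on a validation set to trade accuracy against average inference cost.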


PTEENet: Post-Trained Early-Exit Neural Networks Augmentation for Inference Cost Optimization

This work describes a method for introducing “shortcuts” into the DNN feedforward inference process by skipping costly feedforward computations whenever possible and suggests a new branch architecture based on convolutional building blocks to allow enough training capacity when applied on large DNNs.

Dynamic Representations Toward Efficient Inference on Deep Neural Networks by Decision Gates

This study introduces the simple yet effective concept of decision gates (d-gate), modules trained to decide whether a sample needs to be projected into a deeper embedding or if an early prediction can be made at the d-gate, thus enabling the computation of dynamic representations at different depths.

Learning to Weight Samples for Dynamic Early-exiting Networks

This work proposes to adopt a weight prediction network to weight the loss of different training samples at each exit of multi-exit networks, jointly optimized under a meta-learning framework with a novel optimization objective.

Differentiable Branching In Deep Networks for Fast Inference

This paper proposes a way to jointly optimize this strategy together with the branches, providing an end-to-end trainable algorithm for this emerging class of neural networks, by replacing the original output of the branches with a ‘soft’, differentiable approximation.

Efficient Inference on Deep Neural Networks by Dynamic Representations and Decision Gates

This study introduces the concept of decision gates (d-gate), modules trained to decide whether a sample needs to be projected into a deeper embedding or if an early prediction can be made at the d-gate, thus enabling the computation of dynamic representations at different depths.

BlockDrop: Dynamic Inference Paths in Residual Networks

BlockDrop, an approach that learns to dynamically choose which layers of a deep network to execute during inference so as to best reduce total computation without degrading prediction accuracy, is introduced.

SRCNN-PIL: Side Road Convolution Neural Network Based on Pseudoinverse Learning Algorithm

The Side Road Network (SRN) is proposed, a deep network structure augmented with auxiliary side road (SR) classifiers that allow a major portion of test samples to exit the network early via these SR classifiers once they can be inferred with sufficient certainty.

Why Should We Add Early Exits to Neural Networks?

This paper provides a comprehensive introduction to this family of neural networks, by describing in a unified fashion the way these architectures can be designed, trained, and actually deployed in time-constrained scenarios.

Dynamic Early Exit Scheduling for Deep Neural Network Inference through Contextual Bandits

This paper proposes Dynamic Early Exit (DEE), a real-time online learning algorithm based on contextual bandit analysis that can improve the overall performance by up to 98.1% compared to the best benchmark scheme.

Deep Networks with Stochastic Depth

Stochastic depth is proposed, a training procedure that enables the seemingly contradictory setup to train short networks and use deep networks at test time and reduces training time substantially and improves the test error significantly on almost all data sets that were used for evaluation.
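Stochastic depth drops whole residual blocks at random during training (leaving only the identity shortcut) and scales each block by its survival probability at test time. A minimal NumPy sketch under those assumptions; the function name and toy block callables are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_depth_forward(x, blocks, survival_probs, train=True):
    # Residual network in which each block f is skipped at random during
    # training (identity shortcut only) and scaled by its survival
    # probability at test time, matching the expected training activation.
    for f, p in zip(blocks, survival_probs):
        if train:
            if rng.random() < p:   # block survives this forward pass
                x = x + f(x)
            # else: the block is skipped entirely (identity shortcut)
        else:
            x = x + p * f(x)       # expected value at test time
    return x
```

The paper additionally decays the survival probability linearly with depth, so later (more redundant) blocks are dropped more often.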

Conditional Deep Learning for energy-efficient and enhanced pattern recognition

This work proposes Conditional Deep Learning (CDL), where convolutional layer features are used to identify the variability in the difficulty of input instances and to conditionally activate the deeper layers of the network, enabling the network to dynamically adjust its computational effort to the complexity of the input data.

Learning both Weights and Connections for Efficient Neural Network

A method is presented that reduces the storage and computation required by neural networks by an order of magnitude without affecting their accuracy, by learning only the important connections and pruning redundant ones in a three-step process: train, prune, and retrain.
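The prune step removes the smallest-magnitude weights and keeps a mask so the pruned connections stay zero during retraining. A minimal NumPy sketch of magnitude pruning; the `prune_by_magnitude` name and the sparsity parameterization are illustrative assumptions.

```python
import numpy as np

def prune_by_magnitude(weights, sparsity):
    # Zero out the smallest-magnitude fraction of weights, returning a
    # binary mask so pruned connections can be held at zero when the
    # surviving weights are retrained.
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy(), np.ones_like(weights, dtype=bool)
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask, mask
```

Retraining after pruning is what recovers the accuracy lost by removing connections; pruning alone typically degrades the model.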

Dropout: a simple way to prevent neural networks from overfitting

It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
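Dropout randomly zeroes units during training so the network cannot rely on any single co-adapted feature. A minimal NumPy sketch of the common "inverted" variant, which rescales surviving units at training time rather than scaling weights at test time as the original paper does; the function name is illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def dropout(x, rate, train=True):
    # Inverted dropout: zero each unit with probability `rate` during
    # training and rescale the survivors by 1/(1-rate) so expected
    # activations match test time, where the layer is an identity.
    if not train or rate == 0.0:
        return x
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)
```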

Deep Residual Learning for Image Recognition

This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.

Understanding the difficulty of training deep feedforward neural networks

The objective here is to understand better why standard gradient descent from random initialization is doing so poorly with deep neural networks, to better understand these recent relative successes and help design better algorithms in the future.

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
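Batch normalization standardizes each feature over the mini-batch and then applies a learned scale and shift. A minimal NumPy sketch of the training-time forward pass only (the running statistics used at inference are omitted); the function name is illustrative.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # Normalize each feature (column) over the mini-batch to zero mean
    # and unit variance, then apply the learned scale (gamma) and
    # shift (beta). eps guards against division by zero.
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta
```

At inference, the batch statistics are replaced by running averages accumulated during training, so single examples can be normalized deterministically.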

Improving the speed of neural networks on CPUs

This paper uses speech recognition as an example task, and shows that a real-time hybrid hidden Markov model / neural network (HMM/NN) large vocabulary system can be built with a 10× speedup over an unoptimized baseline and a 4× speed up over an aggressively optimized floating-point baseline at no cost in accuracy.

Towards Open Set Deep Networks

Abhijit Bendale, T. Boult. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

The proposed OpenMax model significantly outperforms the open set recognition accuracy of basic deep networks as well as deep networks with thresholding of SoftMax probabilities, and it is proved that the OpenMax concept provides bounded open space risk, thereby formally providing an open set recognition solution.

Going deeper with convolutions

We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).