Dynamic Deep Neural Networks: Optimizing Accuracy-Efficiency Trade-offs by Selective Execution

@inproceedings{Liu2017DynamicDN,
  title={Dynamic Deep Neural Networks: Optimizing Accuracy-Efficiency Trade-offs by Selective Execution},
  author={Lanlan Liu and Jia Deng},
  booktitle={AAAI Conference on Artificial Intelligence},
  year={2017}
}
  • Lanlan Liu, Jia Deng
  • Published in AAAI Conference on Artificial Intelligence, 2 January 2017
  • Computer Science
We introduce Dynamic Deep Neural Networks (D2NN), a new type of feed-forward deep neural network that allows selective execution. [...] To achieve dynamic selective execution, a D2NN augments a feed-forward deep neural network (a directed acyclic graph of differentiable modules) with controller modules. Each controller module is a sub-network whose output is a decision that controls whether other modules can execute. A D2NN is trained end to end.
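
As a rough sketch of the controller idea (not the authors' implementation: the module sizes, the sigmoid gate, and the hard 0.5 threshold at test time are assumptions, and the paper trains the discrete decisions end to end rather than with a soft gate), a controller sub-network can emit a per-sample decision that gates whether a downstream module runs:

import torch
import torch.nn as nn

class GatedBlock(nn.Module):
    """Toy D2NN-style node: a small controller decides whether the main module runs."""
    def __init__(self, dim=64):
        super().__init__()
        self.controller = nn.Sequential(nn.Linear(dim, 16), nn.ReLU(), nn.Linear(16, 1))
        self.module = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())

    def forward(self, x):
        gate = torch.sigmoid(self.controller(x))   # (batch, 1) soft decision
        if not self.training:
            gate = (gate > 0.5).float()             # hard execute/skip decision at inference
        # When the gate is off, the input passes through unchanged (identity bypass).
        return gate * self.module(x) + (1 - gate) * x

block = GatedBlock().eval()
print(block(torch.randn(8, 64)).shape)              # torch.Size([8, 64])

Note that this sketch still evaluates the gated module before multiplying by the decision; a real implementation would branch on the hard decision so skipped modules are never executed, which is where the computational savings come from.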

Citations

BlockDrop: Dynamic Inference Paths in Residual Networks

BlockDrop, an approach that learns to dynamically choose which layers of a deep network to execute during inference so as to best reduce total computation without degrading prediction accuracy, is introduced.

Learning Time-Efficient Deep Architectures with Budgeted Super Networks

This work proposes a new family of models called Budgeted Super Networks that are learned using reinforcement-learning-inspired techniques applied to a budgeted learning objective function that includes the computation (and memory) cost incurred at inference, as sketched below.
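
At a schematic level (the symbols below are assumptions, not the papers' notation), both Budgeted Super Network entries in this list optimise a task loss under a cost budget, commonly relaxed into a penalised objective:

\min_{\theta,\,a}\; \mathbb{E}_{(x,y)}\big[\ell\big(f(x;\theta,a),\,y\big)\big]
\quad \text{s.t.} \quad C(a) \le C_{\max}
\qquad\Longrightarrow\qquad
\min_{\theta,\,a}\; \mathbb{E}_{(x,y)}\big[\ell\big(f(x;\theta,a),\,y\big)\big] + \lambda\, C(a),

where a denotes the discrete architecture-selection variables, C(a) the predicted computation (or memory) cost, and \lambda trades accuracy against cost; the discrete choice of a is handled with the reinforcement-learning-inspired or gradient-based estimators described in the respective papers.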

Routing Networks: Adaptive Selection of Non-linear Functions for Multi-Task Learning

A collaborative multi-agent reinforcement learning (MARL) approach is employed to jointly train the router and function blocks of a routing network, a kind of self-organizing neural network consisting of a router and a set of one or more function blocks.

Dynamically throttleable neural networks

This work presents a runtime dynamically throttleable neural network (DTNN) that can self-regulate its own performance target and computing resources by dynamically activating neurons in response to a single control signal, called utilization.

Dynamic Representations Toward Efficient Inference on Deep Neural Networks by Decision Gates

This study introduces the simple yet effective concept of decision gates (d-gate), modules trained to decide whether a sample needs to be projected into a deeper embedding or if an early prediction can be made at the d-gate, thus enabling the computation of dynamic representations at different depths.
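
A minimal early-exit sketch of the d-gate idea (the two-stage split, the confidence threshold, and all layer sizes are assumptions; the same mechanism appears in the related d-gate entry further down this list):

import torch
import torch.nn as nn

class EarlyExitNet(nn.Module):
    """Toy d-gate: exit early when the shallow prediction is confident enough."""
    def __init__(self, dim=32, num_classes=10, threshold=0.9):
        super().__init__()
        self.shallow = nn.Sequential(nn.Linear(dim, 64), nn.ReLU())
        self.exit_head = nn.Linear(64, num_classes)   # decision-gate classifier
        self.deep = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, num_classes))
        self.threshold = threshold

    def forward(self, x):                             # x: (1, dim), one sample at a time
        h = self.shallow(x)
        early = self.exit_head(h).softmax(dim=-1)
        if early.max().item() >= self.threshold:      # confident: make the early prediction
            return early
        return self.deep(h).softmax(dim=-1)           # otherwise compute the deeper path

net = EarlyExitNet().eval()
print(net(torch.randn(1, 32)).argmax(dim=-1))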

Learning Time/Memory-Efficient Deep Architectures with Budgeted Super Networks

A novel family of models called Budgeted Super Networks (BSN) is proposed, learned using gradient-descent techniques applied to a budgeted learning objective function that integrates a maximum authorized cost while making no assumption about the nature of that cost.

Online Learning to Accelerate Neural Network Inference with Traveling Classifiers

This paper proposes traveling classifiers that continuously learn from the activations of two consecutive network layers to accelerate inference in real time, and demonstrates that this method significantly outperforms baseline approaches.

Fully Dynamic Inference With Deep Neural Networks

A fully dynamic paradigm is proposed that imparts deep convolutional neural networks with hierarchical inference dynamics at the level of layers and individual convolutional filters/channels, and it consistently outperforms state-of-the-art dynamic frameworks with respect to both efficiency and classification accuracy.

Efficient Inference on Deep Neural Networks by Dynamic Representations and Decision Gates

This study introduces the concept of decision gates (d-gate), modules trained to decide whether a sample needs to be projected into a deeper embedding or if an early prediction can be made at the d-gate, thus enabling the computation of dynamic representations at different depths.

Conditional Information Gain Networks

Conditional Information Gain Networks are proposed, which allow feed-forward deep neural networks to execute conditionally, skipping parts of the model based on the sample and on decision mechanisms inserted in the architecture; the information-gain-based conditional execution approach achieves better or comparable classification results with significantly fewer parameters than standard convolutional neural network baselines.
...

References

Showing 1-10 of 45 references

Learning both Weights and Connections for Efficient Neural Network

A method that reduces the storage and computation required by neural networks by an order of magnitude without affecting their accuracy, by learning only the important connections and pruning redundant connections with a three-step method.
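
A minimal magnitude-pruning sketch of the prune-and-retrain step (the 90% sparsity level, the single dense layer, and the mask-reapplication scheme are illustrative assumptions):

import torch
import torch.nn as nn

def magnitude_prune(layer: nn.Linear, sparsity: float = 0.9) -> torch.Tensor:
    """Zero the smallest-magnitude weights and return the keep-mask."""
    w = layer.weight.data
    k = int(w.numel() * sparsity)                     # number of weights to prune
    threshold = w.abs().flatten().kthvalue(k).values  # k-th smallest magnitude
    mask = (w.abs() > threshold).float()
    layer.weight.data *= mask                         # prune in place
    return mask

layer = nn.Linear(256, 256)
mask = magnitude_prune(layer, sparsity=0.9)
print(f"kept {int(mask.sum())} of {mask.numel()} weights")
# During retraining, reapply the mask after each optimizer step so pruned
# connections stay zero:
#     layer.weight.data *= mask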

Deep Sequential Neural Network

A new neural network model in which each layer is associated with a set of candidate mappings, enabling data with different characteristics to be processed through specific sequences of such local transformations and increasing the expressive power of the model relative to a classical multilayered network.

Conditional Computation in Neural Networks for faster models

This paper applies a policy gradient algorithm to learn policies that optimize a conditional-computation loss function, proposes a regularization mechanism that encourages diversification of the dropout policy, and presents encouraging empirical results showing that the approach improves the speed of computation without impacting the quality of the approximation.

Deep Networks with Internal Selective Attention through Feedback Connections

DasNet harnesses the power of sequential processing to improve classification performance, by allowing the network to iteratively focus its internal attention on some of its convolutional filters.

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

This work introduces a Sparsely-Gated Mixture-of-Experts layer (MoE), consisting of up to thousands of feed-forward sub-networks, and applies the MoE to the tasks of language modeling and machine translation, where model capacity is critical for absorbing the vast quantities of knowledge available in the training corpora.
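
A stripped-down top-k gating sketch of the MoE idea (the expert count, k, and layer sizes are assumptions; the paper's layer additionally uses noisy gating and load-balancing losses):

import torch
import torch.nn as nn

class SparseMoE(nn.Module):
    """Toy sparsely-gated MoE: score all experts, evaluate only the top-k per input."""
    def __init__(self, dim=32, num_experts=8, k=2):
        super().__init__()
        self.experts = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_experts)])
        self.gate = nn.Linear(dim, num_experts)
        self.k = k

    def forward(self, x):                              # x: (batch, dim)
        scores = self.gate(x)                          # (batch, num_experts)
        topv, topi = scores.topk(self.k, dim=-1)
        weights = topv.softmax(dim=-1)                 # renormalise over the chosen experts
        out = torch.zeros_like(x)
        for b in range(x.size(0)):
            for slot in range(self.k):
                e = topi[b, slot].item()
                out[b] += weights[b, slot] * self.experts[e](x[b])
        return out

moe = SparseMoE()
print(moe(torch.randn(4, 32)).shape)                   # torch.Size([4, 32])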

Learning Structured Sparsity in Deep Neural Networks

The results show that, for CIFAR-10, regularization on layer depth can reduce a 20-layer Deep Residual Network to 18 layers while improving the accuracy from 91.25% to 92.60%, which is still slightly higher than that of the original 32-layer ResNet.

Recurrent Models of Visual Attention

A novel recurrent neural network model that is capable of extracting information from an image or video by adaptively selecting a sequence of regions or locations and only processing the selected regions at high resolution is presented.

Dynamic Capacity Networks

The Dynamic Capacity Network is introduced, a neural network that can adaptively assign its capacity across different portions of the input data by combining modules of two types, low-capacity sub-networks and high-capacity sub-networks; the results indicate that DCNs are able to drastically reduce the number of computations.
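
A toy sketch of the capacity-allocation idea (the norm-based salience proxy, k, and the layer sizes are assumptions; the paper itself uses a gradient-based attention mechanism over image patches):

import torch
import torch.nn as nn

dim, seq, k = 16, 20, 4
low = nn.Linear(dim, dim)                                           # low-capacity, applied everywhere
high = nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))  # expensive

x = torch.randn(seq, dim)                  # one input split into `seq` positions/patches
coarse = low(x)                            # (seq, dim) cheap features for every position
salience = coarse.norm(dim=-1)             # cheap proxy for where to spend capacity
top = salience.topk(k).indices
features = coarse.clone()
features[top] = high(x[top])               # refine only the k most salient positions
print(features.shape)                      # torch.Size([20, 16])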

Learning the Number of Neurons in Deep Networks

This paper proposes to make use of a group sparsity regularizer on the parameters of the network, where each group is defined to act on a single neuron, and shows that this approach can reduce the number of parameters by up to 80% while retaining or even improving the network accuracy.
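
A small sketch of a per-neuron group-sparsity penalty (defining each group as an output neuron's fan-in weights plus its bias, weighting by group size, and the lambda value are standard group-lasso choices assumed here, not details taken from the paper):

import math
import torch
import torch.nn as nn

def neuron_group_lasso(layer: nn.Linear) -> torch.Tensor:
    # One group per output neuron: row i of the weight matrix plus its bias.
    groups = torch.cat([layer.weight, layer.bias.unsqueeze(1)], dim=1)  # (out, in + 1)
    # Sum of per-group L2 norms, scaled by sqrt(group size): drives whole neurons to zero.
    return math.sqrt(groups.size(1)) * groups.norm(dim=1).sum()

layer = nn.Linear(128, 64)
lam = 1e-3                                    # regularisation strength (assumed)
reg = lam * neuron_group_lasso(layer)         # add this term to the task loss
print(reg.item())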

Speeding-up Convolutional Neural Networks Using Fine-tuned CP-Decomposition

A simple two-step approach for speeding up convolution layers within large convolutional neural networks, based on tensor decomposition and discriminative fine-tuning, is proposed; it yields sizable CPU speedups at the cost of small accuracy drops for the smaller of the two networks evaluated.
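
Schematically (the notation below follows standard CP-based convolution factorisations and is assumed here rather than copied from the paper), the 4-way kernel K \in \mathbb{R}^{d \times d \times S \times T} of a convolution layer is replaced by a rank-R approximation

K(i, j, s, t) \;\approx\; \sum_{r=1}^{R} K^{x}_{r}(i)\, K^{y}_{r}(j)\, K^{\mathrm{in}}_{r}(s)\, K^{\mathrm{out}}_{r}(t),

so one d \times d convolution with S input and T output channels becomes a chain of four cheap convolutions (1\times1, d\times1, 1\times d, 1\times1) with R intermediate channels; per output location the cost drops from roughly d^2 S T multiplications to about R(S + 2d + T), and the discriminative fine-tuning step recovers most of the accuracy lost to the approximation.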