Corpus ID: 170079280

Toward Runtime-Throttleable Neural Networks

  title={Toward Runtime-Throttleable Neural Networks},
  author={Jesse Hostetler},
As deep neural network (NN) methods have matured, there has been increasing interest in deploying NN solutions to "edge computing" platforms such as mobile phones or embedded controllers. These platforms are often resource-constrained, especially in energy storage and power, but state-of-the-art NN architectures are designed with little regard for resource use. Existing techniques for reducing the resource footprint of NN models produce static models that occupy a single point in the trade… Expand
Edge Intelligence: The Confluence of Edge Computing and Artificial Intelligence
The former focuses on providing more optimal solutions to key problems in edge computing with the help of popular and effective AI technologies while the latter studies how to carry out the entire process of building AI models, i.e., model training and inference, on the edge. Expand


BinaryConnect: Training Deep Neural Networks with binary weights during propagations
BinaryConnect is introduced, a method which consists in training a DNN with binary weights during the forward and backward propagations, while retaining precision of the stored weights in which gradients are accumulated, and near state-of-the-art results with BinaryConnect are obtained on the permutation-invariant MNIST, CIFAR-10 and SVHN. Expand
BlockDrop: Dynamic Inference Paths in Residual Networks
BlockDrop, an approach that learns to dynamically choose which layers of a deep network to execute during inference so as to best reduce total computation without degrading prediction accuracy, is introduced. Expand
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
This work introduces a Sparsely-Gated Mixture-of-Experts layer (MoE), consisting of up to thousands of feed-forward sub-networks, and applies the MoE to the tasks of language modeling and machine translation, where model capacity is critical for absorbing the vast quantities of knowledge available in the training corpora. Expand
DeepX: A Software Accelerator for Low-Power Deep Learning Inference on Mobile Devices
  • N. Lane, S. Bhattacharya, +4 authors F. Kawsar
  • Computer Science
  • 2016 15th ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN)
  • 2016
Experiments show, DeepX can allow even large-scale deep learning models to execute efficently on modern mobile processors and significantly outperform existing solutions, such as cloud-based offloading. Expand
BranchyNet: Fast inference via early exiting from deep neural networks
The BranchyNet architecture is presented, a novel deep network architecture that is augmented with additional side branch classifiers that can both improve accuracy and significantly reduce the inference time of the network. Expand
Learning both Weights and Connections for Efficient Neural Network
A method to reduce the storage and computation required by neural networks by an order of magnitude without affecting their accuracy by learning only the important connections, and prunes redundant connections using a three-step method. Expand
Binarized Neural Networks
A binary matrix multiplication GPU kernel is written with which it is possible to run the MNIST BNN 7 times faster than with an unoptimized GPU kernel, without suffering any loss in classification accuracy. Expand
Efficient Neural Architecture Search via Parameter Sharing
Efficient Neural Architecture Search is a fast and inexpensive approach for automatic model design that establishes a new state-of-the-art among all methods without post-training processing and delivers strong empirical performances using much fewer GPU-hours. Expand
Conditional Computation in Neural Networks for faster models
This paper applies a policy gradient algorithm for learning policies that optimize this loss function and proposes a regularization mechanism that encourages diversification of the dropout policy and presents encouraging empirical results showing that this approach improves the speed of computation without impacting the quality of the approximation. Expand
Deep Networks with Stochastic Depth
Stochastic depth is proposed, a training procedure that enables the seemingly contradictory setup to train short networks and use deep networks at test time and reduces training time substantially and improves the test error significantly on almost all data sets that were used for evaluation. Expand