Corpus ID: 47020554

Smallify: Learning Network Size while Training

Guillaume Leclerc, Manasi Vartak, Raul Castro Fernandez, Tim Kraska, Samuel Madden
As neural networks become widely deployed in different applications and on different hardware, it has become increasingly important to optimize inference time and model size along with model accuracy. Most current techniques optimize model size, model accuracy, and inference time in separate stages, resulting in suboptimal results and computational inefficiency. In this work, we propose a new technique called Smallify that optimizes all three of these metrics at the same time. Specifically, we… 
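The core mechanism behind Smallify is learning per-neuron on/off switches jointly with the weights. A minimal NumPy sketch of that idea follows; the variable names, penalty strength, and pruning threshold are illustrative assumptions, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))    # a layer with 8 neurons, 4 inputs
beta = rng.normal(size=8)      # one learnable on/off switch per neuron
lam, eps = 0.1, 0.05           # L1 strength and pruning threshold (illustrative)

def forward(x):
    # each neuron's output is scaled by its switch
    return beta * (x @ W.T)

def switch_penalty():
    # L1 term added to the task loss; drives unused switches toward zero
    return lam * np.abs(beta).sum()

# during training, neurons whose switch has collapsed are removed,
# shrinking the layer without a separate compression stage
keep = np.abs(beta) >= eps
W_small, beta_small = W[keep], beta[keep]
```

Because the switches are trained with the task loss, model size, accuracy, and (via the smaller layers) inference time are all optimized in the same training run.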

A "Network Pruning Network" Approach to Deep Model Compression

This work presents a filter-pruning approach to deep model compression that uses a multitask network to prune the network in one go; it does not require specifying the degree of pruning for each layer and can instead learn it.

Filter Distribution Templates in Convolutional Networks for Image Classification Tasks

This work presents a series of modifications to the distribution of filters in three popular neural network models and their effects on accuracy and resource consumption, showing that some models improve in accuracy by up to 8.9% while reducing parameters by up to 54%.

Learning Sparse Networks Using Targeted Dropout

Targeted dropout is introduced, a method for training a neural network so that it is robust to subsequent pruning; it improves upon more complicated sparsifying regularisers while being simple to implement and easy to tune.
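The mechanism can be sketched in NumPy: the lowest-magnitude fraction `gamma` of units are dropout candidates, each zeroed with probability `alpha`. The per-column granularity and names here are assumptions for illustration, not the paper's code:

```python
import numpy as np

def targeted_dropout(W, gamma=0.5, alpha=0.5, seed=0):
    """Sketch of targeted dropout: the gamma fraction of units (columns)
    with the smallest L2 norm are dropout candidates, and each candidate
    is zeroed with probability alpha."""
    rng = np.random.default_rng(seed)
    norms = np.linalg.norm(W, axis=0)          # per-unit magnitude
    k = int(gamma * W.shape[1])                # number of candidates
    candidates = np.argsort(norms)[:k]         # lowest-magnitude units
    drop = candidates[rng.random(k) < alpha]   # stochastic subset to zero
    W = W.copy()
    W[:, drop] = 0.0
    return W
```

Because only already-small units are ever dropped, the network learns to concentrate important information in the surviving units, which is why post-training magnitude pruning becomes far less damaging.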

Filter redistribution templates for iteration-less convolutional model reduction

A small set of templates is applied to make a one-shot redistribution of the number of filters in an existing neural network to find models with improved characteristics; the resulting architectures surpass the original accuracy even after being reduced to fit the original resource budget.

Towards Efficient Convolutional Network Models with Filter Distribution Templates

A small set of templates, consisting of easy-to-implement, intuitive, and aggressive variations of the original pyramidal distribution of filters in VGG and ResNet architectures, is introduced; models produced by these templates are more efficient in terms of parameter count and memory needs.

Compressibility Loss for Neural Network Weights

It is shown that minimizing the compressibility loss also enforces the non-zero parts of the signal to have very low entropy, thus making the entire signal more compressible.
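One common way to make such a loss concrete (the paper's exact formulation may differ) is to penalize the ratio of the L1 to the L2 norm of the weights, which is small when the energy concentrates in few entries:

```python
import numpy as np

def compressibility_loss(w, eps=1e-12):
    # ratio of L1 to L2 norm: equals 1 for a 1-sparse vector and
    # sqrt(n) for a uniform one, so minimizing it concentrates the
    # weight energy in a few entries
    w = np.asarray(w, dtype=float)
    return np.abs(w).sum() / (np.sqrt((w ** 2).sum()) + eps)
```

A sparse vector scores lower than a dense one of the same norm, which is the property a compressibility-style penalty exploits.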

The Generalization-Stability Tradeoff in Neural Network Pruning

This work analyzes the behavior of pruning over the course of training, finding that pruning's benefit to generalization increases with pruning's instability, and proposes a mechanism for this effect: pruning regularizes similarly to noise injection.

Applications of deep learning to speech enhancement

This work proposes a model that performs speech dereverberation by estimating the spectral magnitude from its reverberant counterpart, proposes a method to prune neurons from the model without impacting performance, and compares this method to others in the literature.

Neural Network Distiller: A Python Package For DNN Compression Research

Distiller is a library of DNN compression algorithms implementations, with tools, tutorials and sample applications for various learning tasks, and the rich content is complemented by a design-for-extensibility to facilitate new research.

Flexible, non-parametric modeling using regularized neural networks

This work proposes PrAda-net, a one-hidden-layer neural network, trained with proximal gradient descent and adaptive lasso, which requires no preliminary modeling to select the functional forms of the additive components, yet still results in an interpretable model representation.
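The proximal-gradient-with-lasso training step mentioned here can be sketched generically; this is an illustrative plain-lasso version (PrAda-net's adaptive lasso additionally uses per-weight penalty strengths):

```python
import numpy as np

def soft_threshold(w, t):
    # proximal operator of the L1 norm: shrink each weight toward zero by t
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def proximal_step(w, grad, lr=0.1, lam=0.05):
    # gradient step on the smooth loss, then the lasso proximal map;
    # weights whose update falls inside the shrinkage band become exactly zero
    return soft_threshold(w - lr * grad, lr * lam)
```

The exact zeros produced by the proximal map are what make the trained network interpretable: unused connections are removed rather than merely made small.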

References
Learning Efficient Convolutional Networks through Network Slimming

The approach is called network slimming, which takes wide and large networks as input models, but during training insignificant channels are automatically identified and pruned afterwards, yielding thin and compact models with comparable accuracy.
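The channel-selection step can be sketched in NumPy: an L1 penalty on the batch-norm scale factors drives unimportant channels toward zero, and channels below a percentile threshold are pruned. The seed, percentile, and penalty strength are illustrative choices, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(1)
gamma = np.abs(rng.normal(size=16))               # BN scale per channel
gamma[rng.choice(16, 6, replace=False)] *= 1e-3   # channels driven near zero by the L1 term

def slimming_penalty(gamma, lam=1e-4):
    # L1 on the BN scales, added to the training loss
    return lam * np.abs(gamma).sum()

# after training, prune the channels with the smallest scales
threshold = np.percentile(gamma, 40)
keep = gamma > threshold
```

Pruning whole channels (rather than individual weights) is what yields thin, dense models that run faster without sparse-kernel support.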

Compression-aware Training of Deep Networks

It is shown that accounting for compression during training allows us to learn much more compact, yet at least as effective, models than state-of-the-art compression techniques.

Less Is More: Towards Compact CNNs

This work shows that, by incorporating sparse constraints into the objective function, it is possible to decimate the number of neurons during the training stage, reducing the number of parameters and the memory footprint of the neural network, which is desirable at test time.

Learning the Number of Neurons in Deep Networks

This paper proposes to make use of a group sparsity regularizer on the parameters of the network, where each group is defined to act on a single neuron, and shows that this approach can reduce the number of parameters by up to 80% while retaining or even improving the network accuracy.
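With one group per neuron, the regularizer is a group lasso over the rows of each weight matrix; a minimal sketch (row-as-neuron grouping is an assumption about layer layout):

```python
import numpy as np

def group_lasso_penalty(W, lam=1e-3):
    # one group per neuron (row): sum of per-row L2 norms; the norm is
    # non-smooth at zero, so whole rows are driven to zero together and
    # entire neurons can be removed after training
    return lam * np.linalg.norm(W, axis=1).sum()
```

This differs from plain L1 in that sparsity appears at the granularity of neurons, which is what actually shrinks the architecture.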

FitNets: Hints for Thin Deep Nets

This paper extends the idea of a student network that could imitate the soft output of a larger teacher network or ensemble of networks, using not only the outputs but also the intermediate representations learned by the teacher as hints to improve the training process and final performance of the student.
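The "hint" objective can be sketched as an L2 loss between the teacher's intermediate features and a regression of the student's (the regressor `R` is a stand-in for the paper's learned convolutional regressor):

```python
import numpy as np

def hint_loss(student_feat, teacher_feat, R):
    # FitNets-style hint: a small regressor R maps the student's
    # intermediate features into the teacher's feature space, and an L2
    # loss pulls them together during a pre-training stage
    return np.mean((student_feat @ R - teacher_feat) ** 2)
```

Matching intermediate representations, not just soft outputs, is what lets a thinner-but-deeper student imitate a large teacher.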

Learning Structured Sparsity in Deep Neural Networks

The results show that, for CIFAR-10, regularization on layer depth can reduce a 20-layer Deep Residual Network to 18 layers while improving the accuracy from 91.25% to 92.60%, which is still slightly higher than that of the original ResNet with 32 layers.

Combined Group and Exclusive Sparsity for Deep Neural Networks

This work proposes an exclusive sparsity regularization based on the (1, 2)-norm, which promotes competition for features between different weights, enforcing them to fit disjoint sets of features, and combines the exclusive sparsity with group sparsity to promote both sharing of and competition for features when training a deep neural network.
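The two terms and their blend can be sketched as follows; grouping by rows and the blend weight `mu` are illustrative assumptions:

```python
import numpy as np

def exclusive_penalty(W):
    # (1,2)-style exclusive term: squared L1 norm per group (row);
    # grows quadratically with how many weights in a group are active,
    # so weights within a group compete for features
    return (np.abs(W).sum(axis=1) ** 2).sum()

def group_penalty(W):
    # standard group-lasso term: L2 norm per group, promoting sharing
    return np.linalg.norm(W, axis=1).sum()

def combined_penalty(W, mu=0.5):
    # blend of the two, as in the combined regularizer described above
    return mu * group_penalty(W) + (1.0 - mu) * exclusive_penalty(W)
```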

Neural Architecture Search with Reinforcement Learning

This paper uses a recurrent network to generate the model descriptions of neural networks and trains this RNN with reinforcement learning to maximize the expected accuracy of the generated architectures on a validation set.

Coordinating Filters for Faster Deep Neural Networks

Force Regularization, which uses attractive forces on filters to coordinate more weight information into a lower-rank space, is proposed; it is mathematically and empirically verified that, after applying this technique, standard LRA methods can reconstruct filters using a much lower-rank basis, resulting in faster DNNs.
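The LRA step that this technique makes more effective is ordinary truncated SVD; a generic sketch (the reshaping of 4-D convolution filters into a matrix is omitted):

```python
import numpy as np

def low_rank_approx(W, rank):
    # truncated SVD: keep only the top singular directions; force
    # regularization concentrates filter energy so fewer directions suffice
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]
```

By the Eckart–Young theorem this is the best rank-`rank` approximation in the Frobenius norm, so the reconstruction error can only shrink as the rank grows.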

Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding

This work introduces "deep compression", a three stage pipeline: pruning, trained quantization and Huffman coding, that work together to reduce the storage requirement of neural networks by 35x to 49x without affecting their accuracy.
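The first two stages of the pipeline can be illustrated on a toy weight matrix: magnitude pruning, then weight sharing by clustering the surviving weights into a small codebook. The threshold, codebook size, and simple 1-D k-means are illustrative; the Huffman-coding stage is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 16))

# Stage 1: prune weights below a magnitude threshold (illustrative value).
W_pruned = np.where(np.abs(W) > 0.5, W, 0.0)

# Stage 2: cluster the surviving weights into a k-entry codebook.
def kmeans_1d(x, k=4, iters=20):
    centers = np.linspace(x.min(), x.max(), k)
    for _ in range(iters):
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean()
    return centers, labels

nz = W_pruned[W_pruned != 0]
codebook, labels = kmeans_1d(nz)
W_shared = W_pruned.copy()
W_shared[W_pruned != 0] = codebook[labels]   # every surviving weight is one of k values
```

After these two stages only the sparse index structure and the small codebook need to be stored, which is where the large storage reductions come from.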