Corpus ID: 213729382

Dynamic Model Pruning with Feedback

@article{Lin2020DynamicMP,
  title={Dynamic Model Pruning with Feedback},
  author={Tao Lin and Sebastian U. Stich and Luis Barba and Daniil Dmitriev and Martin Jaggi},
  journal={ArXiv},
  year={2020},
  volume={abs/2006.07253}
}
Deep neural networks often have millions of parameters. This can hinder their deployment to low-end devices, not only due to high memory requirements but also because of increased latency at inference. We propose a novel model compression method that generates a sparse trained model without additional overhead: by allowing (i) dynamic allocation of the sparsity pattern and (ii) incorporating a feedback signal to reactivate prematurely pruned weights, we obtain a performant sparse model in one…
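The abstract describes the mechanism only at a high level. As a rough illustration, here is a minimal sketch, in PyTorch, of how dynamic pruning with a feedback signal could look: a dense copy of the weights keeps receiving the gradients computed on the masked model, so weights that were pruned prematurely can regain magnitude and re-enter the mask the next time the sparsity pattern is reallocated. The magnitude-based top-k masking, the single linear layer, and the names and defaults (topk_mask, train_step, sparsity=0.9, lr=0.1) are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of dynamic pruning with a feedback-style dense copy.
# Assumptions (not from this page): magnitude-based top-k masking, a fixed
# target sparsity, a plain SGD update, and a single linear layer as the model.
import torch


def topk_mask(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Keep the largest-magnitude entries; zero out a `sparsity` fraction."""
    k = max(1, int(weight.numel() * (1.0 - sparsity)))
    threshold = weight.abs().flatten().kthvalue(weight.numel() - k + 1).values
    return (weight.abs() >= threshold).float()


def train_step(dense_weight, x, y, loss_fn, lr=0.1, sparsity=0.9):
    # (i) dynamically reallocate the sparsity pattern from the current dense copy
    mask = topk_mask(dense_weight, sparsity)
    pruned_weight = (dense_weight * mask).detach().requires_grad_(True)

    # forward/backward pass through the *pruned* model
    loss = loss_fn(x @ pruned_weight, y)
    loss.backward()

    # (ii) feedback: the gradient computed on the pruned model updates the dense
    # copy, so prematurely pruned weights can grow back and re-enter the mask
    with torch.no_grad():
        dense_weight -= lr * pruned_weight.grad
    return loss.item()
```

In a realistic setup the mask would be recomputed from the dense copy only every few iterations and the target sparsity ramped up on a schedule; the masked weights are what would ultimately be deployed.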
Dynamic Pruning of a Neural Network via Gradient Signal-to-Noise Ratio
TLDR
This work proposes to use the gradient noise to make pruning decisions and demonstrates that the procedure enables us to automatically adjust the sparsity during training without imposing a hand-designed sparsity schedule, while at the same time being able to recover from previous pruning decisions by unpruning connections as necessary.
Cyclical Pruning for Sparse Neural Networks
TLDR
Experimental results on both linear models and large-scale deep neural networks show that cyclical pruning outperforms existing pruning algorithms, especially at high sparsity ratios.
Effective Model Sparsification by Scheduled Grow-and-Prune Methods
TLDR
A novel scheduled grow-and-prune (GaP) methodology that does not require pre-training a dense model is proposed; it addresses the shortcomings of previous work by repeatedly growing a subset of layers to dense and then pruning them back to sparse after some training.
Back to Basics: Efficient Network Compression via IMP
TLDR
It is found that basic IMP with SLR for retraining can outperform state-of-the-art pruning-during-training approaches with little or no computational overhead, that the global magnitude selection criterion is largely competitive with more complex approaches, and that only a few retraining epochs are needed in practice to achieve most of the sparsity-vs-performance tradeoff of IMP.
Simultaneous Training of Partially Masked Neural Networks
TLDR
It is shown that training a Transformer with a low-rank core gives a low-rank model with better performance than training the low-rank model alone.
Compression-aware Training of Neural Networks using Frank-Wolfe
TLDR
This work proposes leveraging k-support norm ball constraints and demonstrates improvements over the results of Miao et al.
Neural Pruning via Growing Regularization
TLDR
This work proposes an L2 regularization variant with rising penalty factors and shows it can bring significant accuracy gains compared with its one-shot counterpart, even when the same weights are removed.
Dynamic Collective Intelligence Learning: Finding Efficient Sparse Model via Refined Gradients for Pruned Weights
With the growth of deep neural networks (DNN), the number of DNN parameters has drastically increased. This makes DNN models hard to deploy on resource-limited embedded systems. To alleviate…
RED : Looking for Redundancies for Data-Free Structured Compression of Deep Neural Networks
TLDR
This paper presents RED, a data-free, unified approach to structured pruning of deep neural networks, and proposes a novel adaptive hashing of the scalar DNN weight distribution densities to increase the number of identical neurons represented by their weight vectors.
Masked Training of Neural Networks with Partial Gradients
TLDR
This work proposes a theoretical framework to study stochastic gradient descent variants, encompassing the aforementioned algorithms as well as a broad variety of methods used for communication-efficient training or model compression, and it can be used as a guide to improve the efficiency of such methods and to facilitate generalization to new applications.
...

References

Showing 1-10 of 50 references
Rethinking the Value of Network Pruning
TLDR
It is found that with an optimal learning rate, the "winning ticket" initialization as used in Frankle & Carbin (2019) does not bring improvement over random initialization, and the need for more careful baseline evaluations in future research on structured pruning methods is suggested.
Parameter Efficient Training of Deep Convolutional Neural Networks by Dynamic Sparse Reparameterization
TLDR
This work suggests that exploring structural degrees of freedom during training is more effective than adding extra parameters to the network, and outperforms previous static and dynamic reparameterization methods, yielding the best accuracy for a fixed parameter budget.
Compression-aware Training of Deep Networks
TLDR
It is shown that accounting for compression during training allows us to learn much more compact, yet at least as effective, models than state-of-the-art compression techniques.
To prune, or not to prune: exploring the efficacy of pruning for model compression
TLDR
Across a broad range of neural network architectures, large-sparse models are found to consistently outperform small-dense models and achieve up to 10x reduction in number of non-zero parameters with minimal loss in accuracy.
Dynamic Network Surgery for Efficient DNNs
TLDR
A novel network compression method called dynamic network surgery is proposed, which can remarkably reduce the network complexity through on-the-fly connection pruning, and it is shown to outperform a recent pruning method by considerable margins.
The State of Sparsity in Deep Neural Networks
TLDR
It is shown that unstructured sparse architectures learned through pruning cannot be trained from scratch to the same test set performance as a model trained with joint sparsification and optimization, and the need for large-scale benchmarks in the field of model compression is highlighted.
Rethinking the Smaller-Norm-Less-Informative Assumption in Channel Pruning of Convolution Layers
TLDR
This paper proposes a channel pruning technique for accelerating the computations of deep convolutional neural networks (CNNs) that focuses on direct simplification of the channel-to-channel computation graph of a CNN, without the need to perform a computationally difficult and not-always-useful task.
SNIP: Single-shot Network Pruning based on Connection Sensitivity
TLDR
This work presents a new approach that prunes a given network once at initialization prior to training, and introduces a saliency criterion based on connection sensitivity that identifies structurally important connections in the network for the given task (a hedged sketch of this criterion follows the reference list).
Exploring Sparsity in Recurrent Neural Networks
TLDR
This work proposes a technique to reduce the parameters of a network by pruning weights during the initial training of the network, which reduces the size of the model and can also help achieve significant inference time speed-up using sparse matrix multiply.
Deep Networks with Stochastic Depth
TLDR
Stochastic depth is proposed, a training procedure that enables the seemingly contradictory setup of training short networks while using deep networks at test time; it reduces training time substantially and significantly improves the test error on almost all data sets used for evaluation.
...
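The SNIP entry in the reference list above mentions a saliency criterion based on connection sensitivity; as flagged there, here is a minimal sketch of that criterion under stated assumptions. On a single mini-batch at initialization, each weight is scored by the magnitude of weight times gradient (the gradient of the loss with respect to an implicit per-connection mask), scores are normalized across the whole network, and only the highest-scoring fraction of connections is kept. The function name, the keep_ratio default, and the single-batch usage are illustrative assumptions rather than the authors' code.

```python
# Hedged sketch of a SNIP-style connection-sensitivity score.
import torch


def snip_masks(model, loss_fn, x, y, keep_ratio=0.1):
    """Score every weight by |w * dL/dw| on one batch and keep the top fraction."""
    params = [p for p in model.parameters() if p.requires_grad]
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, params)

    # connection sensitivity: |w * dL/dw|, normalized across the whole network
    scores = torch.cat([(p * g).abs().flatten() for p, g in zip(params, grads)])
    scores = scores / scores.sum()

    # keep the top keep_ratio fraction of connections, prune the rest
    k = max(1, int(scores.numel() * keep_ratio))
    threshold = torch.topk(scores, k).values.min()

    masks, offset = [], 0
    for p in params:
        n = p.numel()
        masks.append((scores[offset:offset + n] >= threshold).float().reshape(p.shape))
        offset += n
    return masks
```

The resulting masks would be applied once, before training starts, which is what distinguishes this single-shot criterion from the dynamic schemes surveyed above.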