• Corpus ID: 52978527

Rethinking the Value of Network Pruning

Zhuang Liu, Mingjie Sun, Tinghui Zhou, Gao Huang, Trevor Darrell
Network pruning is widely used for reducing the heavy inference cost of deep models in low-resource settings. [...] Key result: Our results suggest the need for more careful baseline evaluations in future research on structured pruning methods. We also compare with the "Lottery Ticket Hypothesis" (Frankle & Carbin 2019), and find that with an optimal learning rate, the "winning ticket" initialization as used in Frankle & Carbin (2019) does not bring improvement over random initialization.
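The two starting points compared in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' code: the layer shape, the 80% sparsity level, the helper `magnitude_mask`, and the noise used as a stand-in for training are all assumptions made for illustration. It only shows how a "winning ticket" initialization (original initial values under a pruning mask) differs from random re-initialization under the same mask; it does not train anything.

```python
import numpy as np

def magnitude_mask(weights, sparsity):
    """Return a 0/1 mask that keeps the largest-magnitude entries (hypothetical helper)."""
    k = int(round(sparsity * weights.size))  # number of weights to remove
    if k == 0:
        return np.ones_like(weights)
    # Magnitude of the k-th smallest weight; everything at or below it is pruned.
    threshold = np.sort(np.abs(weights), axis=None)[k - 1]
    return (np.abs(weights) > threshold).astype(weights.dtype)

rng = np.random.default_rng(0)
w_init = rng.normal(size=(64, 32))                    # original initialization
w_trained = w_init + 0.5 * rng.normal(size=(64, 32))  # stand-in for trained weights (assumption)

# The mask is derived from the "trained" weights.
mask = magnitude_mask(w_trained, sparsity=0.8)

# "Winning ticket": reuse the original initial values under the mask.
w_ticket = mask * w_init
# Random re-initialization: same mask, fresh random values.
w_random = mask * rng.normal(size=w_init.shape)
```

Both `w_ticket` and `w_random` share the same sparsity pattern, so any accuracy difference after training them would come from the initial values alone, which is the comparison the abstract describes.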
Pruning from Scratch
This work finds that pre-training an over-parameterized model is not necessary for obtaining the target pruned structure, and empirically shows that more diverse pruned structures can be directly pruned from randomly initialized weights, including potential models with better performance.
Paying more Attention to Snapshots of Iterative Pruning: Improving Model Compression via Ensemble Distillation
It is shown that strong ensembles can be constructed from snapshots of iterative pruning, which achieve competitive performance and vary in network structure. This work presents a simple, general and effective pipeline that generates strong ensembles of networks during pruning with large learning rate restarting.
Beyond Network Pruning: a Joint Search-and-Training Approach
It is possible to expand the search space of network pruning by associating each filter with a learnable weight, and joint search-and-training can be conducted iteratively to maximize the learning efficiency.
Activation Density Driven Efficient Pruning in Training
A novel pruning method that prunes a network in real time during training, reducing the overall training time needed to obtain an efficient compressed network, and that introduces an activation-density-based analysis to identify the optimal relative sizing or compression for each layer of the network.
Learning Pruned Structure and Weights Simultaneously from Scratch: an Attention based Approach
A novel unstructured pruning pipeline, Attention-based Simultaneous sparse structure and Weight Learning (ASWL), which achieves superior pruning results in terms of accuracy, pruning ratio and operating efficiency when compared with state-of-the-art network pruning methods.
Lost in Pruning: The Effects of Pruning Neural Networks beyond Test Accuracy
This work reassesses and evaluates whether the use of test accuracy alone in the terminating condition is sufficient to ensure that the resulting model performs well across a wide spectrum of "harder" metrics such as generalization to out-of-distribution data and resilience to noise.
Evaluating the Merits of Ranking in Structured Network Pruning
This work further explores the value of the ranking criteria in pruning to show that if channels are removed gradually and iteratively, alternating with fine-tuning on the target dataset, ranking criteria are indeed not necessary to select redundant channels, and proposes a GFLOPs-aware iterative pruning strategy that can further lead to lower inference time by 15% without sacrificing accuracy.
LSOP: Layer-Scaled One-shot Pruning A Simple and Effective Deep Pruning Framework for Neural Networks
A framework that can generalize the mechanism among various pruning techniques, which can be used to guide users to design better deep pruning methods in the future, and implies that polynomial decay and Low-Rank Matrix Approximation techniques from the field of data science can provide support for neural network pruning.
When to Prune? A Policy towards Early Structural Pruning
This work introduces an Early Pruning Indicator (EPI) that relies on sub-network architectural similarity and quickly triggers pruning when the sub-network’s architecture stabilizes, and offers a new efficiency-accuracy boundary for network pruning during training.
AutoPrune: Automatic Network Pruning by Regularizing Auxiliary Parameters
This work proposes AutoPrune, which prunes the network through optimizing a set of trainable auxiliary parameters instead of original weights, and can automatically eliminate network redundancy with recoverability, relieving the complicated prior knowledge required to design thresholding functions, and reducing the time for trial and error.


Pruning Convolutional Neural Networks for Resource Efficient Inference
It is shown that pruning can lead to more than 10x theoretical (5x practical) reduction in adapted 3D-convolutional filters with a small drop in accuracy in a recurrent gesture classifier.
Layer-compensated Pruning for Resource-constrained Convolutional Neural Networks
This work aims to improve the performance of resource-constrained filter pruning by merging two sub-problems commonly considered, i.e., how many filters to prune for each layer and which filters to prune given a per-layer pruning budget, into a global filter ranking problem.
Soft Taylor Pruning for Accelerating Deep Convolutional Neural Networks
A novel gradient-based method, Soft Taylor Pruning (STP), is proposed to reduce the network complexity in a dynamic way by allowing simultaneous pruning on multiple layers by controlling the opening and closing of multiple mask layers.
Efficient Neural Architecture Search via Parameter Sharing
Efficient Neural Architecture Search is a fast and inexpensive approach for automatic model design that establishes a new state-of-the-art among all methods without post-training processing and delivers strong empirical performance using far fewer GPU-hours.
Recovering from Random Pruning: On the Plasticity of Deep Convolutional Neural Networks
Counter-intuitive results are shown wherein, by randomly pruning 25-50% of the filters from deep CNNs, the authors are able to obtain the same performance as obtained by using state-of-the-art pruning methods.
Rethinking the Smaller-Norm-Less-Informative Assumption in Channel Pruning of Convolution Layers
This paper proposes a channel pruning technique for accelerating the computations of deep convolutional neural networks (CNNs) that focuses on direct simplification of the channel-to-channel computation graph of a CNN without the need of performing a computationally difficult and not-always-useful task.
Data-free Parameter Pruning for Deep Neural Networks
It is shown how similar neurons are redundant, and a systematic way to remove them is proposed, which can be applied on top of most networks with a fully connected layer to give a smaller network.
NISP: Pruning Networks Using Neuron Importance Score Propagation
  • Ruichi Yu, Ang Li, …, L. Davis
  • Computer Science
    2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
  • 2018
The Neuron Importance Score Propagation (NISP) algorithm is proposed to propagate the importance scores of final responses to every neuron in the network and is evaluated on several datasets with multiple CNN models and demonstrated to achieve significant acceleration and compression with negligible accuracy loss.
"Learning-Compression" Algorithms for Neural Net Pruning
This work formulates pruning as an optimization problem of finding the weights that minimize the loss while satisfying a pruning cost condition, and gives a generic algorithm to solve this which alternates "learning" steps that optimize a regularized, data-dependent loss and "compression" steps that mark weights for pruning in a data-independent way.
Neural Architecture Search with Reinforcement Learning
This paper uses a recurrent network to generate the model descriptions of neural networks and trains this RNN with reinforcement learning to maximize the expected accuracy of the generated architectures on a validation set.
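Several of the entries above concern structured (filter-level) pruning, where filters are either ranked by some criterion or, as in the random-pruning entry, picked at random. The following is a minimal NumPy sketch of that mechanic, not taken from any of the listed papers: the function names `select_filters` and `prune_pair`, the `(out, in, kh, kw)` weight layout, and the L1-norm criterion are assumptions chosen for illustration.

```python
import numpy as np

def select_filters(conv_w, keep_ratio, criterion="l1", rng=None):
    """Pick indices of output filters to keep in a (out, in, kh, kw) tensor."""
    n_out = conv_w.shape[0]
    n_keep = max(1, int(round(keep_ratio * n_out)))
    if criterion == "random":
        if rng is None:
            rng = np.random.default_rng()
        keep = rng.choice(n_out, size=n_keep, replace=False)
    else:
        # L1 norm of each filter; the largest-norm filters are kept.
        norms = np.abs(conv_w).reshape(n_out, -1).sum(axis=1)
        keep = np.argsort(norms)[-n_keep:]
    return np.sort(keep)

def prune_pair(conv_w, next_w, keep):
    """Drop filters from one layer and the matching input channels of the next."""
    return conv_w[keep], next_w[:, keep]

rng = np.random.default_rng(0)
w1 = rng.normal(size=(16, 3, 3, 3))   # layer with 16 filters
w2 = rng.normal(size=(32, 16, 3, 3))  # next layer consumes those 16 channels

keep_random = select_filters(w1, 0.5, criterion="random", rng=rng)
keep_l1 = select_filters(w1, 0.5, criterion="l1")
p1, p2 = prune_pair(w1, w2, keep_random)
print(p1.shape, p2.shape)
```

Removing a filter also removes the corresponding input channel of the following layer, which is why `prune_pair` slices both tensors; this is what makes structured pruning yield a genuinely smaller dense network rather than a sparse mask.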