• Corpus ID: 211678101

Learned Threshold Pruning

Kambiz Azarian, Yash Bhalgat, Jinwon Lee, Tijmen Blankevoort
This paper presents a novel differentiable method for unstructured weight pruning of deep neural networks. Our learned-threshold pruning (LTP) method enjoys a number of important advantages. First, it learns per-layer thresholds via gradient descent, unlike conventional methods, which require thresholds to be specified as inputs. Making the thresholds trainable also makes LTP computationally efficient and hence scalable to deeper networks; for example, LTP takes fewer than $30$ epochs to prune most networks on ImageNet…
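The threshold learning described in the abstract can be sketched as a differentiable soft mask through which gradients reach the per-layer threshold. The sigmoid form, the squared-magnitude comparison, and the temperature value below are illustrative assumptions for a minimal sketch, not the paper's exact formulation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def soft_pruned_weights(w, tau, temperature=1e-3):
    """Differentiable soft pruning: each weight is scaled by a sigmoid mask
    that compares its squared magnitude to a (learnable) threshold tau."""
    mask = sigmoid((w**2 - tau) / temperature)
    return w * mask

# Because the mask is differentiable in tau, the loss gradient flows through
# it and tau can be updated by ordinary gradient descent alongside the weights.
w = np.array([0.5, -0.01, 0.2, -0.003])
pruned = soft_pruned_weights(w, tau=0.01)  # large weights survive, tiny ones vanish
```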


Hierarchical Adaptive Lasso: Learning Sparse Neural Networks with Shrinkage via Single Stage Training
A novel penalty called Hierarchical Adaptive Lasso (HALO) is presented, which learns to adaptively sparsify the weights of a given network via trainable parameters, without learning a mask.
Unsupervised Adaptive Weight Pruning for Energy-Efficient Neuromorphic Systems
An unsupervised online adaptive weight pruning method that dynamically removes non-critical weights from a spiking neural network (SNN) to reduce network complexity and improve energy efficiency, offering a promising solution for effective network compression and for building highly energy-efficient neuromorphic systems in real-time applications.
Cyclical Pruning for Sparse Neural Networks
Experimental results on both linear models and large-scale deep neural networks show that cyclical pruning outperforms existing pruning algorithms, especially at high sparsity ratios.
When Sparsity Meets Dynamic Convolution
A binary mask derived from a learnable threshold is designed to prune static kernels, reducing parameters and computational cost while achieving higher accuracy on ImageNet-1K; a novel dynamic sparse network incorporating a dynamic routing mechanism is also proposed.
An Expectation-Maximization Perspective on Federated Learning
This work views the server-orchestrated federated learning process as a hierarchical latent variable model in which the server provides the parameters of a prior distribution over the client-specific model parameters, and proposes a variant of the hierarchical model that employs sparsity-promoting prior distributions.
Explainable Natural Language Processing
  • Anders Søgaard
  • Computer Science
    Synthesis Lectures on Human Language Technologies
  • 2021
Soft Threshold Weight Reparameterization for Learnable Sparsity
STR is a simple mechanism that learns effective sparsity budgets, in contrast with popular heuristics; it boosts accuracy over existing results by up to 10% in the ultra-sparse (99%) regime and can also be used to induce low-rank (structured) sparsity in RNNs.
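The soft-threshold reparameterization summarized above can be illustrated with a short sketch. Expressing the learnable threshold as the sigmoid of a scalar parameter matches the general idea of STR, but the exact parameterization and scale here are illustrative assumptions:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def str_reparam(w, s):
    """Soft-threshold reparameterization: shrink each weight toward zero by a
    learned per-layer threshold g(s) = sigmoid(s); any weight whose magnitude
    falls below the threshold becomes exactly zero, so sparsity emerges from
    training rather than from a hand-set budget."""
    return np.sign(w) * relu(np.abs(w) - sigmoid(s))

# With s = 0 the threshold is 0.5, so |w| < 0.5 is zeroed and larger
# weights are shrunk by 0.5.
out = str_reparam(np.array([1.0, 0.3, -0.8]), 0.0)
```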
Structured Convolutions for Efficient Neural Network Design
This work tackles model efficiency by exploiting redundancy in the building blocks of convolutional neural networks, presenting a Structural Regularization loss that encourages layers to adopt the desired structure so that, after training, they can be decomposed with negligible performance loss.


Learning Sparse Neural Networks through L0 Regularization
A practical method for L0-norm regularization of neural networks: the network is pruned during training by encouraging weights to become exactly zero, which allows straightforward and efficient learning of model structures with stochastic gradient descent and enables conditional computation in a principled way.
Structured Pruning of Deep Convolutional Neural Networks
The proposed work shows that when pruning granularities are applied in combination, a CIFAR-10 network can be pruned by more than 70% with less than a 1% loss in accuracy.
Automated Pruning for Deep Neural Network Compression
This is the first study to analyze the generalization capabilities, in transfer-learning tasks, of features extracted by a pruned network; it shows that representations learned with the proposed pruning methodology maintain the effectiveness and generality of those learned by the corresponding uncompressed network on a set of different recognition tasks.
Pruning Filters for Efficient ConvNets
This work presents an acceleration method for CNNs, where it is shown that even simple filter pruning techniques can reduce inference costs for VGG-16 and ResNet-110 by up to 38% on CIFAR10 while regaining close to the original accuracy by retraining the networks.
Learning both Weights and Connections for Efficient Neural Network
A method that reduces the storage and computation required by neural networks by an order of magnitude without affecting their accuracy, by learning only the important connections; redundant connections are pruned using a three-step method.
A Systematic DNN Weight Pruning Framework using Alternating Direction Method of Multipliers
A systematic weight pruning framework of DNNs using the alternating direction method of multipliers (ADMM) is presented, which can reduce the total computation by five times compared with the prior work and achieves a fast convergence rate.
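The ADMM-based pruning summarized above alternates between an unconstrained weight update and a projection onto a sparsity constraint. A toy sketch on a least-squares objective is below; the `project_topk` helper, step sizes, and iteration counts are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

def project_topk(v, k):
    """Euclidean projection onto {v : at most k nonzeros}: keep the k
    largest-magnitude entries, zero the rest."""
    z = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    z[idx] = v[idx]
    return z

def admm_prune(A, b, k, rho=1.0, lr=0.01, outer=50, inner=20):
    """Minimize 0.5*||A w - b||^2 subject to card(w) <= k via ADMM."""
    n = A.shape[1]
    w = np.zeros(n)
    z = np.zeros(n)  # auxiliary sparse copy of w
    u = np.zeros(n)  # scaled dual variable
    for _ in range(outer):
        # w-update: approximately minimize the augmented Lagrangian in w
        for _ in range(inner):
            grad = A.T @ (A @ w - b) + rho * (w - z + u)
            w -= lr * grad
        z = project_topk(w + u, k)  # z-update: project onto sparsity set
        u += w - z                  # dual update
    return project_topk(w, k)       # final hard projection

# With A = I the constrained optimum just keeps the k largest entries of b.
res = admm_prune(np.eye(4), np.array([3.0, 0.1, -2.0, 0.05]), k=2)
```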
Norm matters: efficient and accurate normalization schemes in deep networks
A novel view is presented on the purpose and function of normalization methods and weight-decay, as tools to decouple weights' norm from the underlying optimized objective, and a modification to weight-normalization, which improves its performance on large-scale tasks.
SYQ: Learning Symmetric Quantization for Efficient Deep Neural Networks
It is shown that symmetric quantization can substantially improve accuracy for networks with extremely low-precision weights and activations, and it is demonstrated that this representation imposes minimal or no hardware overhead compared with more coarse-grained approaches.
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
Channel Pruning for Accelerating Very Deep Neural Networks
  • Yihui He, X. Zhang, Jian Sun
  • Computer Science
    2017 IEEE International Conference on Computer Vision (ICCV)
  • 2017
This paper proposes an iterative two-step algorithm that effectively prunes each layer via LASSO-regression-based channel selection and least-squares reconstruction, and generalizes the algorithm to multi-layer and multi-branch cases.