Corpus ID: 201646137

# DeepHoyer: Learning Sparser Neural Network with Differentiable Scale-Invariant Sparsity Measures

@article{Yang2020DeepHoyerLS,
title={DeepHoyer: Learning Sparser Neural Network with Differentiable Scale-Invariant Sparsity Measures},
author={Huanrui Yang and W. Wen and H. Li},
journal={ArXiv},
year={2020},
volume={abs/1908.09979}
}
• Published 2020
• Computer Science, Mathematics
• ArXiv
In the search for sparse and efficient neural network models, many previous works have investigated enforcing L1 or L0 regularizers to encourage weight sparsity during training. The L0 regularizer measures parameter sparsity directly and is invariant to the scaling of parameter values, but it cannot provide useful gradients and therefore requires complex optimization techniques. The L1 regularizer is differentiable almost everywhere and can be easily optimized with gradient descent. Yet it is…
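The scaling behaviour described in the abstract can be checked numerically. The following is a minimal sketch, not the paper's implementation: the helper names `l0`, `l1`, and `l1_over_l2` are hypothetical, and the ℓ1/ℓ2 ratio is used only as one well-known example of a differentiable, scale-invariant sparsity measure. It is an assumption for illustration, not a statement of the exact regularizer the paper proposes.

```python
import math

def l0(w):
    """Number of nonzero entries (the L0 'norm')."""
    return sum(1 for x in w if x != 0)

def l1(w):
    """Sum of absolute values (the L1 norm)."""
    return sum(abs(x) for x in w)

def l1_over_l2(w):
    """Ratio of the L1 norm to the L2 norm.

    Illustrative scale-invariant measure (an assumption, see lead-in).
    Requires at least one nonzero entry, otherwise the L2 norm is zero.
    """
    l2 = math.sqrt(sum(x * x for x in w))
    if l2 == 0:
        raise ValueError("ratio is undefined for an all-zero vector")
    return l1(w) / l2

w = [0.0, 3.0, 0.0, -4.0]
scaled = [10 * x for x in w]  # same sparsity pattern, 10x larger values

print(l0(w), l0(scaled))                  # L0 ignores the scale
print(l1(w), l1(scaled))                  # L1 grows with the scale
print(l1_over_l2(w), l1_over_l2(scaled))  # the ratio ignores the scale
```

Multiplying every weight by a constant leaves the count of nonzeros and the ℓ1/ℓ2 ratio unchanged, while the L1 value scales with the constant.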

Learning Deep Sparse Regularizers with Applications to Multi-View Clustering and Semi-Supervised Classification.
• Medicine
• IEEE transactions on pattern analysis and machine intelligence
• 2021
A deep sparse regularizer learning model that learns data-driven sparse regularizers adaptively and applies its framework to the multi-view clustering and semi-supervised classification tasks for learning a latent compact representation.
Inducing and Exploiting Activation Sparsity for Fast Neural Network Inference
Optimizing deep neural networks for inference has recently become an extremely active area of research. One of the go-to solutions in this context is weight pruning, which aims to reduce…
Nonconvex penalization for sparse neural networks
• Computer Science, Mathematics
• ArXiv
• 2020
This work proposes a nonconvex penalization method for the outer weights that maintains the advantages of the convex approach and investigates the analytic aspects of the method in the context of neural network integral representations and proves attainability of minimizers.
Grouped sparse projection
• Computer Science, Engineering
• ArXiv
• 2019
A new sparse projection method for a set of vectors that achieves a desired average level of sparsity, measured using the ratio of the $\ell_1$ and $\ell_2$ norms; it can be used to sparsify the columns of a matrix and to learn sparse deep networks.
On obtaining sparse semantic solutions for inverse problems, control, and neural network training
• Computer Science
• J. Comput. Phys.
• 2021
A novel column space search approach that emphasizes the data over the model, as well as a novel iterative Levenberg-Marquardt algorithm that smoothly converges to a regularized SVD as opposed to the abrupt truncation inherent to PCA are proposed.
FlipOut: Uncovering Redundant Weights via Sign Flipping
• Computer Science, Mathematics
• BNAIC/BENELEARN
• 2020
A novel pruning method which uses the oscillations around $0$ (i.e. sign flips) that a weight has undergone during training in order to determine its saliency and can directly target the level of sparsity desired by the user.
Efficient and Sparse Neural Networks by Pruning Weights in a Multiobjective Learning Approach
• Computer Science, Mathematics
• ArXiv
• 2020
A multiobjective perspective on the training of neural networks is suggested by treating its prediction accuracy and the network complexity as two individual objective functions in a biobjective optimization problem.
BSQ: Exploring Bit-Level Sparsity for Mixed-Precision Neural Network Quantization
• Computer Science
• ICLR
• 2021
Mixed-precision quantization can potentially achieve the optimal tradeoff between performance and compression rate of deep neural networks, and thus has been widely investigated. However, it lacks…
ChipNet: Budget-Aware Pruning with Heaviside Continuous Approximations
• Computer Science
• ICLR
• 2021
ChipNet is presented, a deterministic pruning strategy that employs a continuous Heaviside function and a novel crispness loss to identify a highly sparse network out of an existing dense network that outperforms state-of-the-art structured pruning methods by remarkable margins.
Neural Network Training Using ℓ1-Regularization and Bi-fidelity Data
• Computer Science, Mathematics
• ArXiv
• 2021
Two variants of ℓ1-regularization informed by the parameters of an identical network trained using data from lower-fidelity models of the problem at hand are considered, which are generalizations of transfer learning of neural networks that uses the parameters learned from a large low-fidelity dataset to efficiently train networks for a small high-fidelity dataset.

#### References

SHOWING 1-10 OF 59 REFERENCES
Transformed ℓ1 Regularization for Learning Sparse Deep Neural Networks
• Medicine, Computer Science
• Neural Networks
• 2019
A new non-convex integrated transformed ℓ1 regularizer is introduced to promote sparsity for DNNs, which removes redundant connections and unnecessary neurons simultaneously, and an efficient stochastic proximal gradient algorithm is presented to solve the new model.
Learning Structured Sparsity in Deep Neural Networks
• Computer Science, Mathematics
• NIPS
• 2016
The results show that for CIFAR-10, regularization on layer depth can reduce 20 layers of a Deep Residual Network to 18 layers while improving the accuracy from 91.25% to 92.60%, which is still slightly higher than that of the original ResNet with 32 layers.
Learning Sparse Neural Networks through L0 Regularization
• Computer Science, Mathematics
• ICLR
• 2018
A practical method for L_0 norm regularization for neural networks: pruning the network during training by encouraging weights to become exactly zero, which allows for straightforward and efficient learning of model structures with stochastic gradient descent and allows for conditional computation in a principled way.
Data-Driven Sparse Structure Selection for Deep Neural Networks
• Computer Science
• ECCV
• 2018
A simple and effective framework to learn and prune deep models in an end-to-end manner by adding sparsity regularizations on factors, and solving the optimization problem by a modified stochastic Accelerated Proximal Gradient (APG) method.
Structured Bayesian Pruning via Log-Normal Multiplicative Noise
• Computer Science, Mathematics
• NIPS
• 2017
A new Bayesian model is proposed that takes into account the computational structure of neural networks and provides structured sparsity, e.g. removes neurons and/or convolutional channels in CNNs and provides significant acceleration on a number of deep neural architectures.
Toward Compact ConvNets via Structure-Sparsity Regularized Filter Pruning
• Computer Science, Medicine
• IEEE Transactions on Neural Networks and Learning Systems
• 2020
This paper proposes a novel filter pruning scheme, termed structured sparsity regularization (SSR), to simultaneously speed up the computation and reduce the memory overhead of CNNs, which can be well supported by various off-the-shelf deep learning libraries.
Sparse Convolutional Neural Networks
• Computer Science
• 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
• 2015
This work shows how to reduce the redundancy in these parameters using a sparse decomposition, and proposes an efficient sparse matrix multiplication algorithm on CPU for Sparse Convolutional Neural Networks (SCNN) models.
Faster CNNs with Direct Sparse Convolutions and Guided Pruning
An efficient general sparse-with-dense matrix multiplication implementation that is applicable to convolution of feature maps with kernels of arbitrary sparsity patterns and a performance model that predicts sweet spots of sparsity levels for different layers and on different computer architectures are developed.
Discrimination-aware Channel Pruning for Deep Neural Networks
This work investigates a simple-yet-effective method, called discrimination-aware channel pruning, to choose those channels that really contribute to discriminative power and proposes a greedy algorithm to conduct channel selection and parameter optimization in an iterative way.
Ratio and difference of $l_1$ and $l_2$ norms and sparse representation with coherent dictionaries
• Mathematics, Computer Science
• Commun. Inf. Syst.
• 2014
The mathematical theory of the sparsity promoting properties of the ratio metric in the context of basis pursuit via over-complete dictionaries is studied and sequentially convex algorithms are introduced to illustrate how the ratio and difference penalties are computed to produce both stable and sparse solutions.