• Corpus ID: 201646137

DeepHoyer: Learning Sparser Neural Network with Differentiable Scale-Invariant Sparsity Measures

  title={DeepHoyer: Learning Sparser Neural Network with Differentiable Scale-Invariant Sparsity Measures},
  author={Huanrui Yang and Wei Wen and Hai Helen Li},
In seeking for sparse and efficient neural network models, many previous works investigated on enforcing L1 or L0 regularizers to encourage weight sparsity during training. The L0 regularizer measures the parameter sparsity directly and is invariant to the scaling of parameter values, but it cannot provide useful gradients, and therefore requires complex optimization techniques. The L1 regularizer is almost everywhere differentiable and can be easily optimized with gradient descent. Yet it is… 

Figures and Tables from this paper

Learning Deep Sparse Regularizers with Applications to Multi-View Clustering and Semi-Supervised Classification.

A deep sparse regularizer learning model that learns data-driven sparse regularizers adaptively and applies this model to multi-view clustering and semi-supervised classification tasks to learn a latent compact representation.

Inducing and Exploiting Activation Sparsity for Fast Neural Network Inference

A new algorithm to enable fast convolution operations over networks with sparse activations, and it can enable significant speedups for end-to-end inference on a range of popular models on the large-scale ImageNet image classification task on modern Intel CPUs, with relatively low retraining cost.

Explicit Group Sparse Projection with Applications to Deep Learning and NMF

A new sparse projection method for a set of vectors that guarantees a desired average sparsity level measured leveraging the popular Hoyer measure, and proposes a generalization of this projection by replacing the `1 norm by its weighted version.

Nonconvex penalization for sparse neural networks

This work proposes a nonconvex penalization method for the outer weights that maintains the advantages of the convex approach and investigates the analytic aspects of the method in the context of neural network integral representations and proves attainability of minimizers.

Quantized Sparse Training: A Unified Trainable Framework for Joint Pruning and Quantization in DNNs

A novel compression framework that prunes and quantizes networks jointly in a unified training process based on the straight-through estimator is proposed, which achieves a 135KB model size in case of VGG16, without any accuracy degradation.

Grouped sparse projection

A new sparse projection method for a set of vectors in order to achieve a desired average level of sparsity which is measured using the ratio of the $\ell_1$ and $\ell-2$ norms which can be used to sparsify the columns of a matrix and to learn sparse deep networks.

Algorithms for Efficiently Learning Low-Rank Neural Networks

This work presents a provably efficient algorithm which learns an optimal low-rank approximation to a single-hidden-layer ReLU network up to additive error with probability ≥ 1 − δ, given access to noiseless samples with Gaussian marginals in polynomial time and samples.

SASL: Saliency-Adaptive Sparsity Learning for Neural Network Acceleration

A Saliency-Adaptive Sparsity Learning (SASL) approach for further optimization of CNNs, where the regularization strength is adjusted according to the saliency, so the optimized format can better preserve the prediction performance while zeroing out more computation-heavy filters.

Nonlinear Initialization Methods for Low-Rank Neural Networks

A practical algorithm is provided to solve the ReLU low-rank approximation problem for parameter rank r which is no more expensive than the existing spectral initialization approach and provably demonstrates that there is a gap between these two approaches for ReLU networks.



Transformed 𝓁1 Regularization for Learning Sparse Deep Neural Networks

Learning Structured Sparsity in Deep Neural Networks

The results show that for CIFAR-10, regularization on layer depth can reduce 20 layers of a Deep Residual Network to 18 layers while improve the accuracy from 91.25% to 92.60%, which is still slightly higher than that of original ResNet with 32 layers.

Learning Sparse Neural Networks through L0 Regularization

A practical method for L_0 norm regularization for neural networks: pruning the network during training by encouraging weights to become exactly zero, which allows for straightforward and efficient learning of model structures with stochastic gradient descent and allows for conditional computation in a principled way.

Data-Driven Sparse Structure Selection for Deep Neural Networks

A simple and effective framework to learn and prune deep models in an end-to-end manner by adding sparsity regularizations on factors, and solving the optimization problem by a modified stochastic Accelerated Proximal Gradient (APG) method.

Structured Bayesian Pruning via Log-Normal Multiplicative Noise

A new Bayesian model is proposed that takes into account the computational structure of neural networks and provides structured sparsity, e.g. removes neurons and/or convolutional channels in CNNs and provides significant acceleration on a number of deep neural architectures.

Toward Compact ConvNets via Structure-Sparsity Regularized Filter Pruning

This paper proposes a novel filter pruning scheme, termed structured sparsity regularization (SSR), to simultaneously speed up the computation and reduce the memory overhead of CNNs, which can be well supported by various off-the-shelf deep learning libraries.

Sparse Convolutional Neural Networks

This work shows how to reduce the redundancy in these parameters using a sparse decomposition, and proposes an efficient sparse matrix multiplication algorithm on CPU for Sparse Convolutional Neural Networks (SCNN) models.

Faster CNNs with Direct Sparse Convolutions and Guided Pruning

An efficient general sparse-with-dense matrix multiplication implementation that is applicable to convolution of feature maps with kernels of arbitrary sparsity patterns and a performance model that predicts sweet spots of sparsity levels for different layers and on different computer architectures are developed.

Discrimination-aware Channel Pruning for Deep Neural Networks

This work investigates a simple-yet-effective method, called discrimination-aware channel pruning, to choose those channels that really contribute to discriminative power and proposes a greedy algorithm to conduct channel selection and parameter optimization in an iterative way.

A Survey on Nonconvex Regularization-Based Sparse and Low-Rank Recovery in Signal Processing, Statistics, and Machine Learning

An overview of nonconvex regularization based sparse and low-rank recovery in various fields in signal processing, statistics, and machine learning, including compressive sensing, sparse regression and variable selection, sparse signals separation, sparse principal component analysis (PCA), large covariance and inverse covariance matrices estimation, matrix completion, and robust PCA is given.