# Inducing and Exploiting Activation Sparsity for Fast Neural Network Inference

    @inproceedings{Kurtz2020InducingAE,
      title={Inducing and Exploiting Activation Sparsity for Fast Neural Network Inference},
      author={Mark Kurtz and Justin Kopinsky and Rati Gelashvili and Alexander Matveev and John Carr and Michael Goin and William M. Leiserson and Bill Nell and Nir Shavit and Dan Alistarh},
      year={2020}
    }

Optimizing deep neural networks for inference has recently become an extremely active area of research. One of the go-to solutions in this context is weight pruning, which aims to reduce computational and memory footprint by removing large subsets of the connections in a neural network. Surprisingly, much less attention has been given to exploiting sparsity in the activation maps, which tend to be naturally sparse in many settings thanks to the structure of rectified linear (ReLU) activation…
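The natural sparsity the abstract refers to is easy to observe directly: ReLU maps every negative pre-activation to exactly zero, so a roughly zero-centered input yields about half exact zeros. A minimal sketch (the array shapes and seed are illustrative assumptions, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
pre_activations = rng.standard_normal((64, 256))  # batch x features, zero-centered

relu_out = np.maximum(pre_activations, 0.0)       # ReLU zeroes all negatives
sparsity = np.mean(relu_out == 0.0)               # fraction of exact zeros

print(f"activation sparsity: {sparsity:.2f}")     # close to 0.50 for this input
```

It is this fraction of guaranteed zeros, produced at run time for free, that activation-sparsity methods aim to skip during inference.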

## 6 Citations

### Training for temporal sparsity in deep neural networks, application in video processing

- Computer Science, ArXiv
- 2021

A new DNN layer, called the Delta Activation Layer, is introduced whose sole purpose is to promote temporal sparsity of activations during training; it is implemented as an extension of the standard TensorFlow-Keras library and applied to train deep neural networks on the Human Action Recognition dataset.
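The generic idea behind temporal sparsity can be sketched as follows (this is an illustration of delta coding between frames, not the paper's actual layer; the `delta_activations` helper and its threshold are hypothetical):

```python
import numpy as np

def delta_activations(frames, threshold=0.05):
    """Transmit only frame-to-frame changes; round tiny deltas to exact zero."""
    deltas = np.diff(frames, axis=0)           # change between consecutive frames
    deltas[np.abs(deltas) < threshold] = 0.0   # suppress near-zero changes
    return deltas

rng = np.random.default_rng(0)
# Slowly varying "video": each frame repeated 4 times -> most deltas are zero
video = rng.standard_normal((2, 8, 8)).repeat(4, axis=0)

sparsity = np.mean(delta_activations(video) == 0.0)
print(f"temporal sparsity: {sparsity:.2f}")
```

On slowly varying inputs most deltas vanish exactly, which is the sparsity such training schemes try to promote and exploit.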

### Sparse Weight Activation Training

- Computer Science, NeurIPS
- 2020

Sparse Weight Activation Training (SWAT), an algorithm that embodies these observations, is proposed; it reduces computations by 50% to 80% with better accuracy at a given level of sparsity versus the Dynamic Sparse Graph algorithm.

### Locally Sparse Neural Networks for Tabular Biomedical Data

- Computer Science, ICML
- 2022

This work designs a locally sparse neural network where the local sparsity is learned to identify the subset of most relevant features for each sample, and reduces model overfitting in low-sample-size data and obtains an interpretable model.

### Neural Decoding With Optimization of Node Activations

- Computer Science, IEEE Communications Letters
- 2022

It is shown that the neural decoder can be improved with two novel loss terms on the nodes' activations; the resulting decoder has the same run-time complexity and model size as the neural Belief Propagation decoder, while improving decoding performance by up to 1.1 dB on BCH codes.

### Improved Projection Learning for Lower Dimensional Feature Maps

- Computer Science, ArXiv
- 2022

This work explores an improved method for compressing all feature maps of pre-trained CNNs to below a specified limit by means of learned projections trained via end-to-end finetuning, which can then be folded and fused into the pre-trained network.

### Implicit Regularization of SGD via Thermophoresis

- Computer Science
- 2020

There exists an effective entropic force from SGD that pushes to reduce gradient variance; this effect is proportional to the squared learning rate and inverse batch size, and is most effective during the early phase of training, when the model's predictions are poor.

## References

Showing 1–10 of 40 references.

### DeepHoyer: Learning Sparser Neural Network with Differentiable Scale-Invariant Sparsity Measures

- Computer Science, ICLR
- 2020

DeepHoyer is presented, a set of sparsity-inducing regularizers that are both differentiable almost everywhere and scale-invariant, and can be applied to both element-wise and structural pruning.

### Exploiting the input sparsity to accelerate deep neural networks: poster

- Computer Science, PPoPP
- 2019

This paper proposes an end-to-end optimization pipeline to generate programs for inference with sparse input; the pipeline contains both domain-specific and general optimization techniques and is capable of generating efficient code without relying on off-the-shelf libraries.

### The State of Sparsity in Deep Neural Networks

- Computer Science, ArXiv
- 2019

It is shown that unstructured sparse architectures learned through pruning cannot be trained from scratch to the same test set performance as a model trained with joint sparsification and optimization, and the need for large-scale benchmarks in the field of model compression is highlighted.

### Accelerating Convolutional Neural Networks via Activation Map Compression

- Computer Science, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2019

A three-stage compression and acceleration pipeline that sparsifies, quantizes and entropy encodes activation maps of Convolutional Neural Networks is proposed, leading to both acceleration of inference and higher model accuracy.

### Faster CNNs with Direct Sparse Convolutions and Guided Pruning

- Computer Science, ICLR
- 2017

An efficient general sparse-with-dense matrix multiplication implementation that is applicable to convolution of feature maps with kernels of arbitrary sparsity patterns and a performance model that predicts sweet spots of sparsity levels for different layers and on different computer architectures are developed.
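The sparse-with-dense multiplication mentioned above can be illustrated generically: once a weight tensor is pruned, the convolution can be lowered to a sparse-weight times dense-input matrix product, so the work scales with the number of surviving non-zeros. A sketch under assumed shapes (not the paper's implementation):

```python
import numpy as np
from scipy.sparse import random as sparse_random

rng = np.random.default_rng(0)
# 90%-pruned weight matrix in CSR format (32 output channels, 128 inputs)
weights = sparse_random(32, 128, density=0.1, random_state=0, format="csr")
inputs = rng.standard_normal((128, 1024))   # dense (e.g. im2col'd) activations

outputs = weights @ inputs                  # sparse-with-dense matmul
dense_ref = weights.toarray() @ inputs      # dense reference for comparison

print(np.allclose(outputs, dense_ref))      # same result, ~10% of the FLOPs
```

The CSR format stores only the non-zero weights and their column indices, which is what lets the multiply skip the pruned connections entirely.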

### To prune, or not to prune: exploring the efficacy of pruning for model compression

- Computer Science, ICLR
- 2018

Across a broad range of neural network architectures, large-sparse models are found to consistently outperform small-dense models and achieve up to 10x reduction in number of non-zero parameters with minimal loss in accuracy.

### Pruning Filters for Efficient ConvNets

- Computer Science, ICLR
- 2017

This work presents an acceleration method for CNNs, where it is shown that even simple filter pruning techniques can reduce inference costs for VGG-16 and ResNet-110 by up to 38% on CIFAR10 while regaining close to the original accuracy by retraining the networks.

### Learning Activation Functions to Improve Deep Neural Networks

- Computer Science, ICLR
- 2015

A novel form of piecewise linear activation function that is learned independently for each neuron using gradient descent is designed, achieving state-of-the-art performance on CIFAR-10, CIFAR-100, and a benchmark from high-energy physics involving Higgs boson decay modes.

### Learning both Weights and Connections for Efficient Neural Network

- Computer Science, NIPS
- 2015

A method is presented to reduce the storage and computation required by neural networks by an order of magnitude without affecting their accuracy, by learning only the important connections and pruning redundant connections using a three-step method.

### WRPN: Wide Reduced-Precision Networks

- Computer Science, ICLR
- 2018

This work reduces the precision of activation maps (along with model parameters) and increases the number of filter maps in a layer, finding that this scheme matches or surpasses the accuracy of the baseline full-precision network.