Exploring the Granularity of Sparsity in Convolutional Neural Networks

@article{Mao2017ExploringTG,
  title={Exploring the Granularity of Sparsity in Convolutional Neural Networks},
  author={Huizi Mao and Song Han and Jeff Pool and Wenshuo Li and Xingyu Liu and Yu Wang and William J. Dally},
  journal={2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)},
  year={2017},
  pages={1927-1934}
}
  • Huizi Mao, Song Han, W. Dally
  • Published 1 July 2017
  • Computer Science
  • 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
Sparsity helps reduce the computational complexity of DNNs by skipping multiplications with zeros. […] Key Result: Our analysis, based on the framework of a recent sparse convolutional neural network (SCNN) accelerator, further demonstrates that coarse-grained sparsity saves 30%–35% of memory references compared with fine-grained sparsity.
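
To make the granularity spectrum concrete, the sketch below (not code from the paper) builds fine-grained, kernel-wise, and filter-wise pruning masks for a 4-D convolution weight tensor; the helper names and the 50% target are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch of sparsity granularities on a conv weight tensor
# of shape (out_channels, in_channels, kh, kw). Helper names are ours,
# not from the paper.

def fine_grained_mask(w, sparsity):
    """Prune individual weights with the smallest magnitudes."""
    k = int(w.size * sparsity)
    thresh = np.sort(np.abs(w), axis=None)[k]
    return (np.abs(w) >= thresh).astype(w.dtype)

def kernel_wise_mask(w, sparsity):
    """Prune whole 2-D kernels (kh x kw) by their L1 norm."""
    norms = np.abs(w).sum(axis=(2, 3))              # (out, in)
    k = int(norms.size * sparsity)
    thresh = np.sort(norms, axis=None)[k]
    keep = (norms >= thresh)[:, :, None, None]
    return np.broadcast_to(keep, w.shape).astype(w.dtype)

def filter_wise_mask(w, sparsity):
    """Prune whole output filters by their L1 norm."""
    norms = np.abs(w).sum(axis=(1, 2, 3))           # (out,)
    k = int(norms.size * sparsity)
    thresh = np.sort(norms)[k]
    keep = (norms >= thresh)[:, None, None, None]
    return np.broadcast_to(keep, w.shape).astype(w.dtype)

w = np.random.randn(64, 32, 3, 3)
for fn in (fine_grained_mask, kernel_wise_mask, filter_wise_mask):
    m = fn(w, 0.5)
    print(fn.__name__, "density:", m.mean())
```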

Citations

Regularization-Free Structural Pruning for GPU Inference Acceleration

TLDR
This paper proposes a regularization-free structural pruning scheme that takes advantage of both unstructured and structured pruning by heuristically combining vector-wise fine-grained and block-wise coarse-grained pruning masks with an AND operation.
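
A minimal sketch of that mask-intersection idea, assuming NumPy, a 2-D weight matrix, and an illustrative block size of 4; the helper names are ours, not the paper's.

```python
import numpy as np

# Sketch: combine an unstructured (magnitude) mask with a block-wise mask
# using a logical AND. Block size and thresholds are illustrative.

def magnitude_mask(w, sparsity):
    k = int(w.size * sparsity)
    thresh = np.sort(np.abs(w), axis=None)[k]
    return np.abs(w) >= thresh

def block_mask(w, block=4, sparsity=0.5):
    """Keep or drop whole blocks of `block` consecutive columns by L1 norm."""
    rows, cols = w.shape
    blocks = np.abs(w).reshape(rows, cols // block, block).sum(axis=2)
    k = int(blocks.size * sparsity)
    thresh = np.sort(blocks, axis=None)[k]
    return np.repeat(blocks >= thresh, block, axis=1)

w = np.random.randn(128, 256)
combined = magnitude_mask(w, 0.5) & block_mask(w)   # AND of the two masks
print("combined density:", combined.mean())
```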

Accelerating Sparse Deep Neural Networks

TLDR
The design and behavior of Sparse Tensor Cores are presented, which exploit a 2:4 (50%) sparsity pattern that leads to twice the math throughput of dense matrix units.
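
As a rough illustration of the 2:4 pattern (keep the two largest-magnitude weights in every group of four consecutive weights), the NumPy sketch below enforces the pattern offline; it is not NVIDIA's implementation.

```python
import numpy as np

# Sketch: enforce a 2:4 sparsity pattern by keeping the two
# largest-magnitude weights in every group of four consecutive weights
# along the last dimension.

def prune_2_of_4(w):
    rows, cols = w.shape
    assert cols % 4 == 0
    groups = w.reshape(rows, cols // 4, 4)
    # indices of the two smallest magnitudes in each group of four
    drop = np.argsort(np.abs(groups), axis=2)[:, :, :2]
    mask = np.ones_like(groups, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=2)
    return (groups * mask).reshape(rows, cols)

w = np.random.randn(8, 16)
w24 = prune_2_of_4(w)
print("density:", (w24 != 0).mean())   # ~0.5
```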

Exploiting Weight Redundancy in CNNs: Beyond Pruning and Quantization

TLDR
This paper identifies another form of redundancy in CNN weight tensors, in the form of repeated patterns of similar values, and investigates several compression schemes to take advantage of this structure in CNN weight data, including multiple forms of Huffman coding and other approaches inspired by block sparse matrix formats.

Campfire: Compressible, Regularization-Free, Structured Sparse Training for Hardware Accelerators

TLDR
The proposed training methodology, Campfire, explores pruning at granularities within a convolutional kernel and filter; the results show that with a 70% sparsity target, over 75% top-1 accuracy is achievable.

Joint Regularization on Activations and Weights for Efficient Neural Network Pruning

TLDR
The derived deep sparsification of JPnet reveals more optimization space for existing DNN accelerators dedicated to sparse matrix operations; the effectiveness of joint regularization is thoroughly evaluated on various network models with different activation functions and on different datasets.

Discriminative Layer Pruning for Convolutional Neural Networks

TLDR
It is shown that a simple subspace projection approach can be employed to estimate the importance of network layers, enabling the pruning of CNNs to a resource-efficient depth within a given network size constraint, and that cascading discriminative layer pruning with filter-oriented pruning improves the resource-efficiency of the resulting network.

Data Stream Oriented Fine-grained Sparse CNN Accelerator with Efficient Unstructured Pruning Strategy

TLDR
An unstructured fine-grained pruning strategy is proposed that achieves a 16× compression ratio with a top-1 accuracy loss of 1.4% for VGG-16, and a lightweight, high-performance sparse CNN accelerator with a modified systolic array is proposed for the pruned VGG-16.

PCNN: Pattern-based Fine-Grained Regular Pruning Towards Optimizing CNN Accelerators

TLDR
A novel index format called Sparsity Pattern Mask (SPM) is presented to encode the sparsity in PCNN, a fine-grained regular 1-D pruning method that achieves a compression rate of up to 8.4× with only 0.2% accuracy loss.

1xN Pattern for Pruning Convolutional Neural Networks.

TLDR
This paper proposes a novel 1×N pruning pattern that groups N consecutive output kernels with the same input channel index into one block, which serves as the basic pruning granularity of this pattern.
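
A hedged sketch of what a 1×N-style block might look like on a conv weight tensor, assuming blocks of N consecutive output kernels sharing an input-channel index are ranked by L1 norm; shapes and the 50% target are illustrative.

```python
import numpy as np

# Sketch of a 1xN-style pattern: group N consecutive output kernels that
# share the same input-channel index into one block, rank blocks by L1
# norm, and zero the weakest blocks. Parameter choices are illustrative.

def prune_1xN(w, n=4, sparsity=0.5):
    out_c, in_c, kh, kw = w.shape
    assert out_c % n == 0
    # block norms: (out_c // n, in_c)
    norms = np.abs(w).reshape(out_c // n, n, in_c, kh, kw).sum(axis=(1, 3, 4))
    k = int(norms.size * sparsity)
    thresh = np.sort(norms, axis=None)[k]
    keep = (norms >= thresh)[:, None, :, None, None]   # (out/n, 1, in, 1, 1)
    mask = np.broadcast_to(keep, (out_c // n, n, in_c, kh, kw))
    return w * mask.reshape(w.shape)

w = np.random.randn(64, 32, 3, 3)
print("density:", (prune_1xN(w) != 0).mean())
```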

Flexible group-level pruning of deep neural networks for fast inference on mobile CPUs: work-in-progress

TLDR
A novel group-level pruning method is proposed to accelerate deep neural networks on mobile GPUs, where several adjacent weights are pruned as a group while maintaining high accuracy.
...

References

SHOWING 1-10 OF 31 REFERENCES

Structured Pruning of Deep Convolutional Neural Networks

TLDR
The proposed work shows that when pruning granularities are applied in combination, the CIFAR-10 network can be pruned by more than 70% with less than a 1% loss in accuracy.

Learning Structured Sparsity in Deep Neural Networks

TLDR
The results show that for CIFAR-10, regularization on layer depth can reduce a 20-layer Deep Residual Network to 18 layers while improving the accuracy from 91.25% to 92.60%, which is still slightly higher than that of the original ResNet with 32 layers.
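
For intuition, the snippet below computes a group-Lasso penalty over output filters, the kind of structured regularizer this line of work uses; the weighting factor is an arbitrary illustration, not a value from the paper.

```python
import numpy as np

# Sketch: a group-Lasso penalty over output filters, the kind of
# regularizer used to learn structured sparsity during training.

def group_lasso_filters(w, lam=1e-4):
    """Sum of L2 norms of each output filter of a conv weight
    tensor shaped (out_channels, in_channels, kh, kw)."""
    filter_norms = np.sqrt((w ** 2).sum(axis=(1, 2, 3)))
    return lam * filter_norms.sum()

w = np.random.randn(64, 32, 3, 3)
print("group-lasso penalty:", group_lasso_filters(w))
```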

Pruning Filters for Efficient ConvNets

TLDR
This work presents an acceleration method for CNNs, showing that even simple filter pruning techniques can reduce inference costs for VGG-16 and ResNet-110 by up to 38% on CIFAR-10 while regaining close to the original accuracy by retraining the networks.
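
A simplified sketch of magnitude-based filter pruning: rank filters by L1 norm, drop the weakest, and remove the matching input channels of the following layer. Shapes and the pruning ratio are assumptions for illustration, not the authors' settings.

```python
import numpy as np

# Sketch: rank the filters of one conv layer by L1 norm, drop the weakest,
# and remove the matching input channels from the following layer.

def prune_filters(w_cur, w_next, ratio=0.5):
    """w_cur: (out, in, kh, kw); w_next: (out2, out, kh, kw)."""
    norms = np.abs(w_cur).sum(axis=(1, 2, 3))
    n_keep = int(len(norms) * (1 - ratio))
    keep = np.sort(np.argsort(norms)[-n_keep:])   # indices of strongest filters
    return w_cur[keep], w_next[:, keep]

w1 = np.random.randn(64, 32, 3, 3)
w2 = np.random.randn(128, 64, 3, 3)
p1, p2 = prune_filters(w1, w2)
print(p1.shape, p2.shape)    # (32, 32, 3, 3) (128, 32, 3, 3)
```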

Speeding-up Convolutional Neural Networks Using Fine-tuned CP-Decomposition

TLDR
A simple two-step approach for speeding up convolution layers within large convolutional neural networks, based on tensor decomposition and discriminative fine-tuning, is proposed; it yields higher CPU speedups with lower accuracy drops for the smaller of the two networks.

Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding

TLDR
This work introduces "deep compression", a three stage pipeline: pruning, trained quantization and Huffman coding, that work together to reduce the storage requirement of neural networks by 35x to 49x without affecting their accuracy.

EIE: Efficient Inference Engine on Compressed Deep Neural Network

  • Song Han, Xingyu Liu, W. Dally
  • Computer Science
  • 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA)
  • 2016
TLDR
An energy-efficient inference engine (EIE) is presented that performs inference on the compressed network model and accelerates the resulting sparse matrix-vector multiplication with weight sharing; it is 189× and 13× faster than CPU and GPU implementations of the same DNN without compression.
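
EIE operates on a compressed, weight-shared format; the sketch below shows only the underlying idea of a sparse matrix-vector product that skips zero weights, using a plain CSR layout rather than EIE's actual encoding.

```python
import numpy as np

# Sketch: a plain CSR sparse matrix-vector product that skips zero
# weights, much simpler than EIE's compressed, weight-shared format.

def to_csr(w):
    values, col_idx, row_ptr = [], [], [0]
    for row in w:
        nz = np.nonzero(row)[0]
        values.extend(row[nz])
        col_idx.extend(nz)
        row_ptr.append(len(values))
    return np.array(values), np.array(col_idx), np.array(row_ptr)

def spmv(values, col_idx, row_ptr, x):
    y = np.zeros(len(row_ptr) - 1)
    for i in range(len(y)):
        s, e = row_ptr[i], row_ptr[i + 1]
        y[i] = values[s:e] @ x[col_idx[s:e]]   # only nonzero weights touched
    return y

w = np.random.randn(64, 128)
w[np.abs(w) < 1.0] = 0.0                      # make it sparse
x = np.random.randn(128)
vals, cols, ptr = to_csr(w)
print(np.allclose(spmv(vals, cols, ptr, x), w @ x))   # True
```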

Pruning Convolutional Neural Networks for Resource Efficient Transfer Learning

TLDR
A new criterion based on an efficient first-order Taylor expansion to approximate the absolute change in training cost induced by pruning a network component is proposed, demonstrating superior performance compared to other criteria, such as the norm of kernel weights or average feature map activation.
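
A minimal NumPy sketch of a first-order Taylor importance score, |activation × gradient|, averaged per channel; the activations and gradients are random placeholders standing in for values captured during backpropagation.

```python
import numpy as np

# Sketch of a first-order Taylor importance score for feature maps:
# |activation * dL/d(activation)|, averaged over the batch and over
# spatial positions. Random placeholders stand in for real tensors.

batch, channels, h, w = 8, 64, 14, 14
activations = np.random.randn(batch, channels, h, w)
gradients = np.random.randn(batch, channels, h, w)      # dL/da

importance = np.abs(activations * gradients).mean(axis=(0, 2, 3))  # per channel
prune_order = np.argsort(importance)     # weakest channels first
print("least important channels:", prune_order[:5])
```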

Improving the speed of neural networks on CPUs

TLDR
This paper uses speech recognition as an example task, and shows that a real-time hybrid hidden Markov model / neural network (HMM/NN) large vocabulary system can be built with a 10× speedup over an unoptimized baseline and a 4× speedup over an aggressively optimized floating-point baseline at no cost in accuracy.

Dynamic Network Surgery for Efficient DNNs

TLDR
A novel network compression method called dynamic network surgery is proposed, which can remarkably reduce network complexity through on-the-fly connection pruning and is shown to outperform the recent pruning method by considerable margins.
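
A hedged sketch of a prune-and-splice mask update in the spirit of dynamic network surgery; the two thresholds and the fake training loop are illustrative assumptions, not the paper's schedule.

```python
import numpy as np

# Sketch of a dynamic-surgery-style mask update: connections below a low
# threshold are pruned, connections above a high threshold are spliced
# back in, and everything in between keeps its current mask.

def update_mask(w, mask, t_low=0.05, t_high=0.10):
    mag = np.abs(w)
    mask = np.where(mag < t_low, 0, mask)      # prune weak connections
    mask = np.where(mag > t_high, 1, mask)     # splice strong ones back
    return mask

w = np.random.randn(256, 256) * 0.1
mask = np.ones_like(w)
for _ in range(3):                              # stand-in for training steps
    w += 0.01 * np.random.randn(*w.shape)       # stand-in for an SGD update
    mask = update_mask(w, mask)
print("density:", mask.mean())
```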

LCNN: Lookup-Based Convolutional Neural Network

TLDR
This paper introduces LCNN, a lookup-based convolutional neural network that encodes convolutions with a few lookups into a dictionary trained to cover the space of weights in CNNs, and shows the benefits of LCNN in few-shot learning and few-iteration learning, two crucial aspects of on-device training of deep learning models.