• Corpus ID: 199528493

Efficient Inference of CNNs via Channel Pruning

  • Boyu Zhang, Azadeh Davoodi, Yu Hen Hu
The deployment of Convolutional Neural Networks (CNNs) on resource-constrained platforms such as mobile devices and embedded systems has been greatly hindered by their high implementation cost, which has motivated considerable research interest in compressing and accelerating trained CNN models. Among the various techniques proposed in the literature, structured pruning, especially channel pruning, has gained much attention due to 1) its superior performance in memory, computation, and energy reduction; and 2) it…
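Channel pruning saves memory and computation structurally: dropping output channel k of one convolution also deletes input channel k of the following convolution, so both weight tensors shrink while staying dense. A minimal pure-Python sketch of that bookkeeping, assuming weights stored as nested lists of shape [out][in][kh][kw]; `prune_channels`, `conv_macs`, and the layer sizes are hypothetical illustrations, not taken from the paper.

```python
def zeros(*dims):
    """Nested list of zeros with the given shape."""
    if len(dims) == 1:
        return [0.0] * dims[0]
    return [zeros(*dims[1:]) for _ in range(dims[0])]


def shape(t):
    """Shape of a non-empty nested list."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0]
    return dims


def prune_channels(w_cur, w_next, keep):
    """Keep only the listed output channels of w_cur and the matching
    input channels of w_next.

    w_cur:  shape [C_out][C_in][kh][kw]
    w_next: shape [C_next][C_out][kh][kw]
    """
    pruned_cur = [w_cur[k] for k in keep]
    pruned_next = [[filt[k] for k in keep] for filt in w_next]
    return pruned_cur, pruned_next


def conv_macs(c_out, c_in, kh, kw, h_out, w_out):
    """Multiply-accumulate operations of a dense convolution layer."""
    return c_out * c_in * kh * kw * h_out * w_out


w1 = zeros(4, 3, 3, 3)  # conv1: 3 -> 4 channels
w2 = zeros(8, 4, 3, 3)  # conv2: 4 -> 8 channels
p1, p2 = prune_channels(w1, w2, keep=[0, 2])
print(shape(p1), shape(p2))  # → [2, 3, 3, 3] [8, 2, 3, 3]
before = conv_macs(4, 3, 3, 3, 32, 32) + conv_macs(8, 4, 3, 3, 32, 32)
after = conv_macs(2, 3, 3, 3, 32, 32) + conv_macs(8, 2, 3, 3, 32, 32)
print(after / before)  # → 0.5
```

Halving the channels shared by the two layers halves the multiply-accumulates of both, because each appears once as an output dimension and once as an input dimension.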


Finding Storage- and Compute-Efficient Convolutional Neural Networks

This work provides a general framework that yields efficient CNN representations which solve given classification tasks to specified quality levels. It shows the advantages of EC2T over the standard in ternary quantization, Trained Ternary Quantization (TTQ), and sets new benchmarks in this research area.
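Ternary quantization maps every weight to one of three values. A minimal sketch assuming a TTQ-style rule: a threshold delta = t × max|w| separates zero from the positive and negative levels. In TTQ the two scales are trained per layer; here `wp` and `wn` are plain, hand-chosen arguments, and EC2T's own procedure is not reproduced.

```python
def ternarize(weights, t=0.05, wp=1.0, wn=1.0):
    """Map each weight to {+wp, 0, -wn} using threshold delta = t * max|w|.

    In TTQ, wp and wn are trained per layer; here they are plain arguments.
    """
    delta = t * max(abs(w) for w in weights)
    out = []
    for w in weights:
        if w > delta:
            out.append(wp)
        elif w < -delta:
            out.append(-wn)
        else:
            out.append(0.0)
    return out


q = ternarize([0.8, -0.02, 0.03, -0.5], t=0.1, wp=0.7, wn=0.6)
print(q)  # → [0.7, 0.0, 0.0, -0.6]
```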

SLID: Exploiting Spatial Locality in Input Data as a Computational Reuse Method for Efficient CNN

SLID (Spatial Locality in Input Data) is presented as the first work to exploit input spatial locality to save CNN convolution operations, with minimal accuracy loss and without memory or computational overhead.
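The summary does not spell out SLID's reuse rule, so the following is only a hypothetical illustration of spatial computational reuse in general: a 2-D convolution is evaluated at every other output column, and each result is copied to its right-hand neighbor. On spatially smooth inputs the error is small while roughly half of the windows are skipped. `conv2d_reuse` and the column-skipping pattern are assumptions, not SLID's published scheme.

```python
def conv2d_valid(x, k):
    """Exact 'valid' 2-D cross-correlation of image x with kernel k."""
    kh, kw = len(k), len(k[0])
    oh, ow = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(x[i + a][j + b] * k[a][b] for a in range(kh) for b in range(kw))
             for j in range(ow)]
            for i in range(oh)]


def conv2d_reuse(x, k):
    """Approximate conv: evaluate even output columns only and copy each
    result into the odd column to its right. Returns (output, windows computed)."""
    kh, kw = len(k), len(k[0])
    oh, ow = len(x) - kh + 1, len(x[0]) - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    computed = 0
    for i in range(oh):
        for j in range(0, ow, 2):
            v = sum(x[i + a][j + b] * k[a][b] for a in range(kh) for b in range(kw))
            computed += 1
            out[i][j] = v
            if j + 1 < ow:
                out[i][j + 1] = v  # reuse the neighboring result
    return out, computed


x = [[1.0] * 6 for _ in range(4)]  # spatially uniform input
k = [[1.0, 0.0], [0.0, 1.0]]
approx, n = conv2d_reuse(x, k)
print(approx == conv2d_valid(x, k), n)  # → True 9
```

On a perfectly uniform input the reuse is exact, and 9 windows are evaluated instead of 15; on real images the copied values are only approximations.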

S-DenseNet: A DenseNet Compression Model Based on Convolution Grouping Strategy Using Skyline Method

S-DenseNet, a compact variant of DenseNet, is proposed; it extracts features more comprehensively than DenseNet while reducing parameter redundancy, and achieves higher or similar Top-1 accuracy with lower complexity on the ImageNet dataset.
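S-DenseNet's specific grouping strategy is not reproduced here; the arithmetic below only shows why grouping a convolution reduces parameters in general: with g groups, each output channel connects to only C_in / g input channels. The layer sizes are arbitrary assumptions.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k x k convolution split into `groups` groups (no bias)."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    # each output channel only connects to c_in / groups input channels
    return (c_in // groups) * c_out * k * k


dense = conv_params(128, 128, 3)              # → 147456
grouped = conv_params(128, 128, 3, groups=4)  # → 36864
print(dense // grouped)  # → 4
```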

FedSCR: Structure-Based Communication Reduction for Federated Learning

The paper proposes FedSCR, a structure-based communication reduction algorithm that reduces the number of parameters transported through the network while maintaining model accuracy, together with an adaptive variant of FedSCR that dynamically changes the bounded threshold to improve model robustness on non-IID data.
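The summary gives neither FedSCR's scoring rule nor how the adaptive threshold is set, so this is a hypothetical sketch of structure-wise update sparsification in general: each channel of a client's update is scored by its mean absolute value, and only channels reaching `threshold` are transmitted. The function name, the scoring rule, and the threshold value are assumptions.

```python
def structured_sparsify(update, threshold):
    """Send only channels whose mean absolute update reaches the threshold.

    update: list of per-channel update vectors from one client.
    Returns (sparse_update, indices_of_channels_sent).
    """
    sparse, sent = [], []
    for c, vec in enumerate(update):
        score = sum(abs(v) for v in vec) / len(vec)
        if score >= threshold:
            sparse.append(list(vec))
            sent.append(c)
        else:
            sparse.append([0.0] * len(vec))  # whole channel suppressed
    return sparse, sent


update = [[0.1, -0.2], [0.001, 0.002], [0.05, 0.05]]
_, sent = structured_sparsify(update, threshold=0.01)
print(sent)  # → [0, 2]
```

Dropping whole channels rather than individual values means the receiver only needs channel indices, not a per-weight index list. An adaptive variant would change `threshold` per client or per round.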



Learning Efficient Convolutional Networks through Network Slimming

The approach, called network slimming, takes wide and large networks as input models; during training, insignificant channels are automatically identified and then pruned, yielding thin and compact models with comparable accuracy.
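Network slimming identifies insignificant channels through the batch-normalization scaling factors (gamma): an L1 penalty pushes them toward zero during training, and channels whose |gamma| falls under a single global threshold are pruned. A minimal sketch of both pieces; the penalty weight and pruning ratio are arbitrary, and the helper names are hypothetical.

```python
def l1_penalty_grad(gammas, lam):
    """Subgradient of lam * sum|gamma|, added to each BN scale's gradient."""
    return [lam * ((g > 0) - (g < 0)) for g in gammas]


def slimming_masks(layer_gammas, prune_ratio):
    """Global threshold over all BN scales: prune the smallest prune_ratio fraction."""
    flat = sorted(abs(g) for layer in layer_gammas for g in layer)
    n_prune = int(len(flat) * prune_ratio)
    if n_prune == 0:
        return [[True] * len(layer) for layer in layer_gammas]
    thresh = flat[n_prune - 1]
    # True = keep; ties at the threshold are pruned too
    return [[abs(g) > thresh for g in layer] for layer in layer_gammas]


gammas = [[0.9, 0.01, 0.5], [0.02, 0.7, 0.3]]
print(slimming_masks(gammas, 0.5))
# → [[True, False, True], [False, True, False]]
print(l1_penalty_grad([0.5, -0.2, 0.0], 0.5))  # → [0.5, -0.5, 0.0]
```

Because the threshold is global, layers can lose different numbers of channels, which is how the method arrives at thin layers where channels matter least.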

Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications

The paper proposes a simple and effective scheme, called one-shot whole network compression, to compress an entire CNN, and addresses the important implementation-level issue of 1×1 convolution, a key operation in the inception module of GoogLeNet as well as in CNNs compressed by the proposed scheme.
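The compression scheme of this paper is based on Tucker decomposition of convolution kernels, which turns one k×k convolution into a 1×1 convolution, a smaller k×k core convolution, and another 1×1 convolution; this is why efficient 1×1 convolution matters. The parameter arithmetic below uses arbitrarily chosen ranks (64 and 64), not the paper's rank-selection procedure.

```python
def conv_params(c_in, c_out, k):
    """Weights of a dense k x k convolution (no bias)."""
    return c_in * c_out * k * k


def tucker2_params(c_in, c_out, k, r_in, r_out):
    """Weights after a Tucker-2 split: 1x1 (c_in -> r_in),
    k x k core (r_in -> r_out), then 1x1 (r_out -> c_out)."""
    return c_in * r_in + r_in * r_out * k * k + r_out * c_out


full = conv_params(256, 256, 3)                 # → 589824
low_rank = tucker2_params(256, 256, 3, 64, 64)  # → 69632
print(round(full / low_rank, 2))                # → 8.47
```

Two of the three factors are 1×1 convolutions, so after decomposition a large share of the layer's work runs as 1×1 convolution.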

Pruning Filters for Efficient ConvNets

This work presents an acceleration method for CNNs and shows that even simple filter pruning techniques can reduce inference costs for VGG-16 and ResNet-110 by up to 38% on CIFAR-10 while regaining close to the original accuracy by retraining the networks.
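A simple filter-pruning criterion of this kind ranks the filters of a layer by the sum of their absolute weights (L1 norm) and removes the smallest ones before retraining. A minimal sketch, assuming each filter has been flattened to a list; `filters_to_keep` is a hypothetical helper.

```python
def filters_to_keep(filters, n_prune):
    """Rank filters by L1 norm and drop the n_prune smallest.

    filters: list of filters, each flattened to a list of weights.
    Returns the sorted indices of filters to keep.
    """
    norms = [sum(abs(w) for w in f) for f in filters]
    order = sorted(range(len(filters)), key=lambda i: norms[i])
    dropped = set(order[:n_prune])
    return [i for i in range(len(filters)) if i not in dropped]


filters = [[1.0, -1.0], [0.1, 0.1], [0.5, 0.5], [-2.0, 0.0]]
print(filters_to_keep(filters, 2))  # → [0, 3]
```

The kept indices also determine which input channels of the next layer survive, since each removed filter produces one feature map that no longer exists.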

ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression

ThiNet is proposed as an efficient and unified framework to simultaneously accelerate and compress CNN models in both training and inference stages. It prunes filters based on statistics computed from the next layer rather than the current layer, which differentiates ThiNet from existing methods.
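Using next-layer statistics can be sketched greedily: if an output entry of the next layer is the sum of per-channel contributions, the channels to remove are those whose combined contribution stays closest to zero. A minimal sketch, assuming the contributions `contrib[i][c]` have already been collected at sampled output locations; the follow-up least-squares rescaling of the remaining weights is omitted.

```python
def thinet_greedy(contrib, n_remove):
    """Greedily pick channels whose summed contribution stays closest to zero.

    contrib[i][c]: contribution of input channel c to sampled output entry i
    of the next layer. Returns the channels to remove, in selection order.
    """
    n_samples, n_channels = len(contrib), len(contrib[0])
    removed = []
    partial = [0.0] * n_samples  # per-sample sum over already removed channels
    for _ in range(n_remove):
        best_c, best_val = None, float("inf")
        for c in range(n_channels):
            if c in removed:
                continue
            val = sum((partial[i] + contrib[i][c]) ** 2 for i in range(n_samples))
            if val < best_val:
                best_c, best_val = c, val
        removed.append(best_c)
        for i in range(n_samples):
            partial[i] += contrib[i][best_c]
    return removed


contrib = [[1.0, 0.1, -0.12, 2.0],
           [0.5, -0.1, 0.1, 1.0]]
print(thinet_greedy(contrib, 2))  # → [1, 2]
```

Channels 1 and 2 nearly cancel each other, so removing both barely changes the next layer's output; a criterion computed on the current layer's weights alone would not see that interaction.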

Learning Structured Sparsity in Deep Neural Networks

The results show that, on CIFAR-10, regularization on layer depth can reduce a 20-layer Deep Residual Network to 18 layers while improving the accuracy from 91.25% to 92.60%, which is still slightly higher than that of the original 32-layer ResNet.
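Structured sparsity of this kind comes from a group Lasso regularizer: the sum of L2 norms of weight groups, which drives whole groups (filters, channels, or entire layers) to zero together. A minimal sketch of the filter-wise and channel-wise terms for one convolution layer, assuming weights indexed as `weight[n][c]` (filter n, input channel c, kernel flattened); the penalty weights are arbitrary.

```python
import math


def group_lasso(weight, lam_filter, lam_channel):
    """Filter-wise plus channel-wise group Lasso for one conv layer.

    weight[n][c] is the flattened kernel linking input channel c to filter n.
    """
    n_filters, n_channels = len(weight), len(weight[0])
    filter_term = sum(
        math.sqrt(sum(w * w for c in range(n_channels) for w in weight[n][c]))
        for n in range(n_filters))
    channel_term = sum(
        math.sqrt(sum(w * w for n in range(n_filters) for w in weight[n][c]))
        for c in range(n_channels))
    return lam_filter * filter_term + lam_channel * channel_term


# two filters, two input channels, 1x1 kernels; filter 1 is already all-zero
weight = [[[3.0], [4.0]], [[0.0], [0.0]]]
print(group_lasso(weight, 1.0, 1.0))  # → 12.0
```

The depth-wise case applies the same norm to a whole layer's weights, which is what lets the regularizer shrink a 20-layer network to 18 layers.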

Channel Pruning for Accelerating Very Deep Neural Networks

  • Yihui He, X. Zhang, Jian Sun
  • 2017 IEEE International Conference on Computer Vision (ICCV)
This paper proposes an iterative two-step algorithm to effectively prune each layer, using LASSO regression-based channel selection and least-squares reconstruction, and generalizes this algorithm to multi-layer and multi-branch cases.
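The channel-selection step can be written as a LASSO over per-channel coefficients β: minimize (1/2N)‖Y − Σ_c β_c Z_c‖² + λ‖β‖₁, where Z_c is channel c's contribution to the layer output and channels with β_c = 0 are pruned. A pure-Python coordinate-descent sketch; the name `Z`, the flattening of outputs into vectors, and λ = 0.01 are illustrative assumptions, and the second step (least-squares reconstruction of the remaining weights) is not shown.

```python
def soft_threshold(x, lam):
    """Scalar proximal operator of lam * |x|."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def lasso_channel_select(Z, y, lam, n_iter=50):
    """Coordinate descent for (1/2N)||y - sum_c beta_c Z[c]||^2 + lam * ||beta||_1.

    Z[c]: flattened contribution of input channel c to the layer output.
    y:    flattened original layer output (length N).
    """
    n, k = len(y), len(Z)
    beta = [1.0] * k  # start from the unpruned layer
    for _ in range(n_iter):
        for c in range(k):
            # residual of y after subtracting every channel other than c
            r = [y[i] - sum(beta[j] * Z[j][i] for j in range(k) if j != c)
                 for i in range(n)]
            rho = sum(Z[c][i] * r[i] for i in range(n)) / n
            denom = sum(z * z for z in Z[c]) / n
            beta[c] = soft_threshold(rho, lam) / denom if denom > 0 else 0.0
    return beta


Z = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.01, 0.0]]
y = [1.0, 1.0, 0.0, 0.0]  # channel 2 contributes nothing useful
beta = lasso_channel_select(Z, y, lam=0.01)
print([round(b, 3) for b in beta])  # → [0.96, 0.96, 0.0]
```

Larger λ zeroes more coefficients, so one would typically sweep λ until the desired number of channels remains; the shrunken nonzero β values are what the least-squares step later corrects.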

Dynamic Network Surgery for Efficient DNNs

The paper proposes a novel network compression method called dynamic network surgery, which substantially reduces network complexity through on-the-fly connection pruning, and shows that it outperforms recent pruning methods by considerable margins.
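On-the-fly pruning can be kept reversible with a binary mask next to the weights: a weight is masked out when its magnitude falls low and can be spliced back if it grows again, because masked weights keep receiving updates. A sketch of the mask update, assuming two thresholds a < b with a hysteresis band between them; the paper sets its thresholds per layer, while the values here are arbitrary.

```python
def update_mask(weights, mask, a, b):
    """Dynamic-network-surgery-style mask update with thresholds a < b.

    |w| < a  -> pruned (0); |w| > b -> kept or spliced back (1);
    a <= |w| <= b -> previous mask value is kept.
    """
    new_mask = []
    for w, m in zip(weights, mask):
        if abs(w) < a:
            new_mask.append(0)
        elif abs(w) > b:
            new_mask.append(1)
        else:
            new_mask.append(m)
    return new_mask


weights = [0.01, 0.5, 0.2, 0.2]
mask = [1, 0, 1, 0]
mask = update_mask(weights, mask, a=0.1, b=0.3)
print(mask)  # → [0, 1, 1, 0]
# the forward pass uses w * m, but updates are applied to every w,
# so a pruned weight that grows past b is spliced back in
print([w * m for w, m in zip(weights, mask)])  # → [0.0, 0.5, 0.2, 0.0]
```

The band between a and b keeps weights near a single threshold from flipping in and out of the network on every iteration.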

Accelerating Convolutional Networks via Global & Dynamic Filter Pruning

This paper proposes a novel global & dynamic pruning (GDP) scheme that prunes redundant filters for CNN acceleration and achieves superior performance when accelerating several cutting-edge CNNs on the ILSVRC 2012 benchmark.
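"Global" pruning means filters from all layers compete in one ranking instead of each layer losing a fixed fraction; "dynamic" means the mask is recomputed during training, so a filter pruned earlier can return. A sketch of one global mask computation, assuming a per-filter saliency score is already available; the scores here are placeholders, not GDP's own saliency function.

```python
def global_filter_masks(layer_scores, keep_ratio):
    """Rank every filter of every layer together and keep the top fraction.

    layer_scores: per-layer lists of saliency scores (higher = more important).
    Returns per-layer boolean masks (True = keep).
    """
    flat = [(s, l, i) for l, scores in enumerate(layer_scores)
            for i, s in enumerate(scores)]
    n_keep = max(1, round(len(flat) * keep_ratio))
    top = sorted(flat, key=lambda t: t[0], reverse=True)[:n_keep]
    kept = {(l, i) for _, l, i in top}
    return [[(l, i) in kept for i in range(len(scores))]
            for l, scores in enumerate(layer_scores)]


scores = [[0.9, 0.1], [0.5, 0.4, 0.05]]
print(global_filter_masks(scores, 0.6))
# → [[True, False], [True, True, False]]
```

Calling this again after further training with updated scores is what makes the mask dynamic. A practical version would also guarantee that every layer keeps at least one filter.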

Soft Filter Pruning for Accelerating Deep Convolutional Neural Networks

The proposed Soft Filter Pruning (SFP) method enables the pruned filters to be updated when training the model after pruning, which has two advantages over previous works: larger model capacity and less dependence on the pretrained model.
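Letting pruned filters be updated means they are set to zero rather than removed, so the next gradient steps can make them non-zero again; filters are removed for good only after training. A minimal sketch, assuming flattened filters, an L2-norm criterion, and a plain SGD step as a stand-in for the real training loop; the helper names are hypothetical.

```python
import math


def soft_prune(filters, prune_rate):
    """Return a copy with the lowest-L2-norm filters set to zero (not removed)."""
    norms = [math.sqrt(sum(w * w for w in f)) for f in filters]
    n_prune = int(len(filters) * prune_rate)
    zeroed = set(sorted(range(len(filters)), key=lambda i: norms[i])[:n_prune])
    return [[0.0] * len(f) if i in zeroed else list(f) for i, f in enumerate(filters)]


def sgd_step(filters, grads, lr):
    """Plain SGD update; zeroed filters are updated like any other."""
    return [[w - lr * g for w, g in zip(f, gf)] for f, gf in zip(filters, grads)]


filters = [[1.0, 1.0], [0.1, 0.0], [0.5, 0.5], [2.0, 0.0]]
filters = soft_prune(filters, 0.5)
print(filters)  # → [[1.0, 1.0], [0.0, 0.0], [0.0, 0.0], [2.0, 0.0]]
grads = [[0.0, 0.0], [-1.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
filters = sgd_step(filters, grads, lr=0.1)
print(filters[1])  # → [0.1, 0.0]
```

Because filter 1 regains a non-zero weight, it can outrank filter 2 at the next pruning round; this retained capacity is the advantage over hard pruning that the summary mentions.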

Data-Driven Sparse Structure Selection for Deep Neural Networks

The paper proposes a simple and effective framework to learn and prune deep models in an end-to-end manner by adding sparsity regularization on scaling factors and solving the resulting optimization problem with a modified stochastic Accelerated Proximal Gradient (APG) method.
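With an L1 penalty on the scaling factors, a proximal step sets factors exactly to zero, so the structures they gate (for example neurons, groups, or residual blocks) can be removed. A minimal sketch of one plain proximal-gradient step with soft-thresholding; the acceleration (momentum) terms of the modified APG method are deliberately omitted, and the gradients are placeholders.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |x|, applied elementwise."""
    return [(abs(x) - t) * (1.0 if x > 0 else -1.0) if abs(x) > t else 0.0
            for x in v]


def proximal_step(scales, grads, lr, gamma):
    """Gradient step on the data loss, then the L1 prox with threshold lr * gamma."""
    z = [s - lr * g for s, g in zip(scales, grads)]
    return soft_threshold(z, lr * gamma)


scales = proximal_step([1.0, 0.05, -0.5], [0.0, 0.0, 0.0], lr=0.1, gamma=1.0)
print([round(s, 6) for s in scales])  # → [0.9, 0.0, -0.4]
```

A plain subgradient step would only shrink the second factor toward zero; the proximal step lands on exactly 0.0, which is what makes the gated structure safe to delete.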