Quantized CNN: A Unified Approach to Accelerate and Compress Convolutional Networks

@article{Cheng2018QuantizedCA,
  title={Quantized CNN: A Unified Approach to Accelerate and Compress Convolutional Networks},
  author={Jian Cheng and Jiaxiang Wu and Cong Leng and Yuhang Wang and Qinghao Hu},
  journal={IEEE Transactions on Neural Networks and Learning Systems},
  year={2018},
  volume={29},
  pages={4730-4743}
}
We are witnessing an explosive development and widespread application of deep neural networks (DNNs) in various fields. [...] Key Method: Guided by minimizing the approximation error of each layer's response, both fully connected and convolutional layers are carefully quantized. Inference can then be carried out effectively on the quantized network, with much lower memory and storage consumption. Quantitative evaluation on two publicly available benchmarks demonstrates the promising performance…
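The core idea above, approximating each layer's weights with learned codebooks so that inference runs on codeword lookups, can be illustrated with a small sketch. The following is a minimal, hypothetical example rather than the paper's actual optimization (which fits codewords to minimize each layer's response error, not the plain weight-reconstruction error used here); the shapes, subspace count, and codebook size are made up for illustration.

```python
# Minimal sketch: product-quantize a fully connected layer's weights and run
# inference through per-subspace inner-product lookup tables.
import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(0)
d_in, d_out = 64, 32
W = rng.standard_normal((d_in, d_out)).astype(np.float32)

num_subspaces, num_codewords = 8, 16          # d_in must be divisible by num_subspaces
sub_dim = d_in // num_subspaces

codebooks = np.zeros((num_subspaces, num_codewords, sub_dim), np.float32)
assignments = np.zeros((num_subspaces, d_out), np.int64)
for m in range(num_subspaces):
    # Columns of W restricted to the m-th input subspace, one sub-vector per output unit.
    sub_cols = W[m * sub_dim:(m + 1) * sub_dim, :].T          # (d_out, sub_dim)
    centroids, labels = kmeans2(sub_cols, num_codewords, minit='points')
    codebooks[m], assignments[m] = centroids, labels

def quantized_fc(x):
    """Approximate y = x @ W using codeword lookups instead of the full weight matrix."""
    y = np.zeros(d_out, np.float32)
    for m in range(num_subspaces):
        x_sub = x[m * sub_dim:(m + 1) * sub_dim]
        table = codebooks[m] @ x_sub                           # (num_codewords,) inner products
        y += table[assignments[m]]
    return y

x = rng.standard_normal(d_in).astype(np.float32)
print(np.abs(quantized_fc(x) - x @ W).mean())                  # approximation error
```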
Learning Compression from Limited Unlabeled Data
TLDR: By employing re-estimated statistics in batch normalization, this paper significantly improves the accuracy of compressed CNNs, achieving results comparable to label-based methods.
Recent advances in efficient computation of deep convolutional neural networks
TLDR: A comprehensive survey of recent advances in network acceleration, compression, and accelerator design, from both the algorithm and hardware points of view.
Unsupervised Network Quantization via Fixed-Point Factorization
TLDR: Proposes an efficient framework, the fixed-point factorized network (FFN), that turns all weights into ternary values, i.e., {−1, 0, 1}, and shows that the FFN framework achieves negligible degradation even without any supervised retraining on labeled data.
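For context, a bare-bones sketch of a ternary weight representation is given below. Note that FFN reaches ternary values through a fixed-point factorization of the full-precision weights; the simple thresholding shown here follows the common ternary-weight-network recipe and is purely illustrative.

```python
# Illustration only: threshold-based ternarization of a weight matrix into {-1, 0, +1}
# with a per-layer scale, so W is approximated by alpha * T.
import numpy as np

def ternarize(W, delta_ratio=0.7):
    delta = delta_ratio * np.abs(W).mean()           # sparsity threshold (heuristic)
    T = np.zeros_like(W)
    T[W > delta] = 1.0
    T[W < -delta] = -1.0
    mask = T != 0
    alpha = np.abs(W[mask]).mean() if mask.any() else 0.0   # per-layer scale
    return alpha, T

rng = np.random.default_rng(0)
W = rng.standard_normal((128, 64))
alpha, T = ternarize(W)
print(alpha, np.mean(np.abs(W - alpha * T)))         # scale and approximation error
```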
Convolutional Neural Network Accelerator with Vector Quantization
TLDR: Introduces a DNN accelerator based on the model-compression technique of vector quantization (VQ), which reduces the network model size and computation cost simultaneously, achieving a 4.2-times reduction in memory access and 2.05 times the throughput per cycle for batch-one inference.
A comprehensive survey on model compression and acceleration
TLDR: A survey of techniques proposed for compressing and accelerating ML and DL models, together with a discussion of the challenges of existing techniques and future research directions in the field.
On-Device Partial Learning Technique of Convolutional Neural Network for New Classes
TLDR: Proposes an on-device partial learning technique that requires no additional neural network structures, reduces unnecessary computation overhead, and selects a subset of influential weights from a trained network to accommodate new classification classes.
Toward Compact ConvNets via Structure-Sparsity Regularized Filter Pruning
TLDR: Proposes a novel filter pruning scheme, termed structured sparsity regularization (SSR), to simultaneously speed up computation and reduce the memory overhead of CNNs, which is well supported by various off-the-shelf deep learning libraries.
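As a rough illustration of the structured-sparsity idea, a group-Lasso-style penalty over whole filters can drive entire filters toward zero so they can be removed. The shapes, regularization strength, and pruning threshold below are illustrative, not the SSR paper's formulation.

```python
# Sketch: group-Lasso-style penalty with one group per output filter of a conv layer,
# plus a simple rule for marking the weakest filters for removal after training.
import numpy as np

def filter_group_lasso(W, lam=1e-3):
    # W has shape (out_channels, in_channels, k, k); penalty = lam * sum_f ||W_f||_2
    norms = np.sqrt((W ** 2).reshape(W.shape[0], -1).sum(axis=1))
    return lam * norms.sum(), norms

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32, 3, 3))
penalty, norms = filter_group_lasso(W)
prune_mask = norms < np.percentile(norms, 30)        # e.g. drop the weakest 30% of filters
print(penalty, int(prune_mask.sum()), "filters marked for removal")
```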
Training Lightweight Deep Convolutional Neural Networks Using Bag-of-Features Pooling
  • N. Passalis, A. Tefas
  • Computer Science, Medicine
  • IEEE Transactions on Neural Networks and Learning Systems
  • 2019
TLDR: The proposed quantization-based pooling method is inspired by the bag-of-features model and can be used for learning more lightweight deep neural networks, leading to an end-to-end trainable CNN architecture.
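A minimal numpy sketch of bag-of-features-style pooling follows: each spatial feature vector from a conv feature map is softly assigned to a small set of codewords, and the assignments are averaged into a fixed-length histogram. The codewords here are random and the RBF-style assignment is one common choice; in the paper the codebook is learned end to end, so everything below is illustrative.

```python
# Sketch: bag-of-features pooling producing a fixed-length histogram
# regardless of the spatial size of the input feature map.
import numpy as np

def bof_pooling(feature_map, codewords, sigma=1.0):
    # feature_map: (H, W, C), codewords: (K, C) -> histogram of shape (K,)
    feats = feature_map.reshape(-1, feature_map.shape[-1])           # (H*W, C)
    d2 = ((feats[:, None, :] - codewords[None, :, :]) ** 2).sum(-1)  # squared distances
    sim = np.exp(-d2 / (2 * sigma ** 2))
    assign = sim / sim.sum(axis=1, keepdims=True)                    # soft assignment
    return assign.mean(axis=0)                                       # (K,) histogram

rng = np.random.default_rng(0)
fmap = rng.standard_normal((7, 7, 128))
codewords = rng.standard_normal((16, 128))
print(bof_pooling(fmap, codewords).shape)    # (16,) for any input spatial size
```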
A Low Effort Approach to Structured CNN Design Using PCA
TLDR: Proposes a method to analyze a trained network and deduce an optimized, compressed architecture that preserves accuracy while keeping computational costs tractable, using principal component analysis (PCA).
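A minimal sketch of the underlying PCA step, under assumptions of my own (activation shapes, an illustrative 99.9% energy threshold, a hypothetical suggested_width helper): count how many principal components of a layer's activations are needed to retain most of the variance, and treat that count as the layer's compressed width.

```python
# Sketch: use the variance spectrum of a layer's activations to suggest a smaller width.
import numpy as np

def suggested_width(activations, energy=0.999):
    # activations: (num_samples, num_channels), e.g. spatially pooled conv outputs
    X = activations - activations.mean(axis=0)
    _, s, _ = np.linalg.svd(X, full_matrices=False)
    var = s ** 2
    cum = np.cumsum(var) / var.sum()
    return int(np.searchsorted(cum, energy) + 1)

rng = np.random.default_rng(0)
low_rank = rng.standard_normal((2000, 20)) @ rng.standard_normal((20, 256))
acts = low_rank + 0.01 * rng.standard_normal((2000, 256))
print(suggested_width(acts))   # close to 20: most of the 256 channels are redundant
```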
Model Compression and Hardware Acceleration for Neural Networks: A Comprehensive Survey
TLDR: Reviews the mainstream compression approaches, such as compact models, tensor decomposition, data quantization, and network sparsification, answers the question of how to leverage these methods in the design of neural network accelerators, and presents the state-of-the-art hardware architectures.

References

Showing 1-10 of 58 references.
Quantized Convolutional Neural Networks for Mobile Devices
TLDR: Proposes an efficient framework, namely Quantized CNN, to simultaneously speed up the computation and reduce the storage and memory overhead of CNN models.
Accelerating Convolutional Neural Networks for Mobile Applications
TLDR: Proposes an efficient and effective approach to accelerate the test-phase computation of CNNs based on low-rank and group-sparse tensor decomposition, achieving a significant reduction in computational complexity at the cost of negligible loss in accuracy.
Compressing Deep Convolutional Networks using Vector Quantization
TLDR: Achieves 16-24 times compression of the network with only 1% loss of classification accuracy on a state-of-the-art CNN, and finds that, for compressing the most storage-demanding densely connected layers, vector quantization methods have a clear gain over existing matrix factorization methods.
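A rough sketch of this style of vector quantization and of how a compression ratio in that range comes about: block the weight matrix into short sub-vectors, cluster them with k-means, and store only the codebook plus 8-bit indices. The matrix size, block length, and codebook size below are illustrative, not the paper's settings.

```python
# Sketch: k-means vector quantization of a dense layer's weights and a back-of-the-envelope
# compression ratio (codebook + per-block indices versus raw 32-bit floats).
import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(0)
W = rng.standard_normal((1024, 256)).astype(np.float32)

block, k = 4, 256                                  # 4-dim sub-vectors, 8-bit indices
blocks = W.reshape(-1, block)                      # (num_blocks, block)
codebook, labels = kmeans2(blocks, k, minit='points')
W_hat = codebook[labels].reshape(W.shape)          # reconstructed weights

raw_bits = W.size * 32
compressed_bits = codebook.size * 32 + labels.size * np.log2(k)
print("compression ratio ~", raw_bits / compressed_bits)
print("mean abs error    ~", np.abs(W - W_hat).mean())
```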
Accelerating Very Deep Convolutional Networks for Classification and Detection
TLDR: Aims to accelerate the test-time computation of convolutional neural networks, especially very deep CNNs, and develops an effective solution to the resulting nonlinear optimization problem without the need for stochastic gradient descent (SGD).
Compressing Convolutional Neural Networks
TLDR: Presents a novel network architecture, Frequency-Sensitive Hashed Nets (FreshNets), which exploits the inherent redundancy in both convolutional and fully connected layers of a deep learning model, leading to dramatic savings in memory and storage consumption.
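A minimal sketch of the hashing-based weight sharing that HashedNets-style methods build on: each position of a virtual weight matrix is hashed into a small parameter vector, so many connections share one trainable value. FreshNets applies this idea in the frequency domain of the convolution filters; the sketch below stays in the spatial domain, and the random-index "hash" is only a stand-in for a real deterministic hash function.

```python
# Sketch: a virtual weight matrix backed by a much smaller shared parameter vector.
import numpy as np

def hashed_weight(params, shape, seed=0):
    rng = np.random.default_rng(seed)                 # stand-in for a fixed hash function
    idx = rng.integers(0, params.size, size=shape)    # hash each position to a bucket
    sign = rng.choice([-1.0, 1.0], size=shape)        # sign hash reduces collision bias
    return sign * params[idx]

params = np.random.default_rng(1).standard_normal(1_000)   # only 1K real parameters
W = hashed_weight(params, (512, 256))                       # behaves like a 512x256 matrix
print(W.shape, params.size / W.size)                        # under 1% of the dense storage
```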
Speeding up Convolutional Neural Networks with Low Rank Expansions
TLDR: Presents two simple schemes for drastically speeding up convolutional neural networks by exploiting cross-channel or filter redundancy to construct a low-rank basis of filters that are rank-1 in the spatial domain.
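The rank-1 spatial idea can be seen on a single 2-D kernel: its SVD yields a few separable components, so one 2-D convolution becomes a short sum of cheaper row/column convolutions. The sketch below is for one kernel only; the actual schemes factor jointly across channels and filters, and all sizes here are illustrative.

```python
# Sketch: approximate a 5x5 convolution by a rank-2 sum of separable (rank-1) filters.
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
kernel = rng.standard_normal((5, 5))
image = rng.standard_normal((64, 64))

U, s, Vt = np.linalg.svd(kernel)
rank = 2                                            # keep the top components
full = convolve2d(image, kernel, mode='valid')
approx = np.zeros_like(full)
for r in range(rank):
    col = (U[:, r] * s[r]).reshape(-1, 1)           # vertical 1-D filter
    row = Vt[r].reshape(1, -1)                      # horizontal 1-D filter
    approx += convolve2d(convolve2d(image, col, mode='valid'), row, mode='valid')

print(np.abs(full - approx).mean() / np.abs(full).mean())   # relative approximation error
```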
Convolutional neural networks with low-rank regularization
TLDR: A new algorithm for computing a low-rank tensor decomposition that removes redundancy in the convolution kernels and is more effective than iterative methods at speeding up large CNNs.
Convolutional Neural Networks using Logarithmic Data Representation
TLDR: Proposes a new data representation that enables state-of-the-art networks to be encoded to 3 bits with negligible loss in classification performance, and an end-to-end training procedure that uses the log representation at 5 bits, achieving higher final test accuracy than a linear representation at 5 bits.
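A minimal sketch of power-of-two (logarithmic) quantization, with a simplified treatment of the exponent range and without the paper's end-to-end training: each nonzero value is snapped to the nearest power of two that a few bits can index.

```python
# Sketch: log-domain quantization of weights to sign * 2^e with a small exponent range.
import numpy as np

def log_quantize(x, num_bits=3, max_exp=0):
    levels = 2 ** num_bits                              # number of representable exponents
    sign = np.sign(x)
    exp = np.round(np.log2(np.maximum(np.abs(x), 1e-12)))
    exp = np.clip(exp, max_exp - levels + 1, max_exp)   # e.g. exponents in [-7, 0]
    return sign * (2.0 ** exp)                          # note: tiny values are not zeroed here

rng = np.random.default_rng(0)
w = rng.standard_normal(10_000) * 0.1
w_q = log_quantize(w)
print(np.mean(np.abs(w - w_q)), len(np.unique(np.abs(w_q))))   # error and level count
```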
Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding
TLDR: Introduces "deep compression", a three-stage pipeline of pruning, trained quantization, and Huffman coding that together reduce the storage requirements of neural networks by 35x to 49x without affecting their accuracy.
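A rough sketch of the three-stage pipeline on a bag of weights: magnitude pruning, k-means weight sharing, and an entropy estimate standing in for the final Huffman coding step. The pruning ratio, codebook size, and the storage estimate (which ignores sparse-index overhead) are illustrative, not the paper's numbers.

```python
# Sketch: prune -> cluster surviving weights into a codebook -> estimate coded index size.
import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(0)
W = rng.standard_normal(100_000).astype(np.float32)

# 1) prune small-magnitude weights
keep = np.abs(W) > np.quantile(np.abs(W), 0.9)          # keep the largest 10%
survivors = W[keep]

# 2) weight sharing: cluster surviving weights into a small codebook
codebook, labels = kmeans2(survivors.reshape(-1, 1), 32, minit='points')

# 3) Huffman coding approximated by the entropy of the cluster indices
counts = np.bincount(labels, minlength=32).astype(np.float64)
p = counts[counts > 0] / counts.sum()
bits_per_index = -(p * np.log2(p)).sum()
est_bits = survivors.size * bits_per_index + codebook.size * 32
print("estimated compression ~", W.size * 32 / est_bits, "x (ignoring sparse indexing overhead)")
```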
Return of the Devil in the Details: Delving Deep into Convolutional Nets
TLDR: Shows that the data augmentation techniques commonly applied to CNN-based methods can also be applied to shallow methods and yield an analogous performance boost, and identifies that the dimensionality of the CNN output layer can be reduced significantly without adversely affecting performance.