Quantized Convolutional Neural Networks for Mobile Devices

@article{Wu2016QuantizedCN,
  title={Quantized Convolutional Neural Networks for Mobile Devices},
  author={Jiaxiang Wu and Cong Leng and Yuhang Wang and Qinghao Hu and Jian Cheng},
  journal={2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2016},
  pages={4820-4828}
}
  • Jiaxiang Wu, Cong Leng, Yuhang Wang, Qinghao Hu, Jian Cheng
  • Published 21 December 2015
  • Computer Science
  • 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Recently, convolutional neural networks (CNN) have demonstrated impressive performance in various computer vision tasks. However, high performance hardware is typically indispensable for the application of CNN models due to the high computation complexity, which prohibits their further extensions. In this paper, we propose an efficient framework, namely Quantized CNN, to simultaneously speed-up the computation and reduce the storage and memory overhead of CNN models. Both filter kernels in… 
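The core idea, as far as the (truncated) abstract describes it, is to quantize both convolutional filter kernels and fully-connected weight matrices with learned codebooks so that inner products can be approximated by table look-ups. The sketch below illustrates that product-quantization idea in plain numpy; the layer size, sub-vector length, and codebook size are illustrative choices, not the paper's settings.

import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Tiny k-means for illustration (no library dependency)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        assign = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = X[assign == j].mean(axis=0)
    return centers, assign

# Hypothetical fully-connected weight matrix: 256 inputs -> 128 outputs.
W = np.random.randn(128, 256).astype(np.float32)

# Product quantization: split each weight row into sub-vectors of length 8
# and learn a small codebook per sub-space.
sub_len, n_codes = 8, 16
n_sub = W.shape[1] // sub_len
codebooks, codes = [], []
for s in range(n_sub):
    block = W[:, s * sub_len:(s + 1) * sub_len]
    centers, assign = kmeans(block, n_codes)
    codebooks.append(centers)
    codes.append(assign)

# At inference, the input's dot products with all codewords are precomputed
# once per sub-space, then each output is just a sum of table look-ups.
x = np.random.randn(256).astype(np.float32)
luts = [codebooks[s] @ x[s * sub_len:(s + 1) * sub_len] for s in range(n_sub)]
y_approx = np.array([sum(luts[s][codes[s][o]] for s in range(n_sub))
                     for o in range(W.shape[0])])
print("mean approximation error:", np.abs(y_approx - W @ x).mean())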
Hybrid Approach for Efficient Quantization of Weights in Convolutional Neural Networks
  • Sang-Il Seo, Juntae Kim
  • Computer Science
  • 2018 IEEE International Conference on Big Data and Smart Computing (BigComp)
  • 2018
TLDR
This paper quantizes the weights of AlexNet without a large drop in accuracy by using a hybrid quantizer that combines a uniform quantizer with k-means clustering.
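A rough sketch of what such a hybrid quantizer might look like: part of the weight distribution is handled by a uniform quantizer and the rest by 1-D k-means. The magnitude-based split used here is an assumption for illustration only, not necessarily the authors' rule.

import numpy as np

def uniform_quantize(w, bits=4):
    """Uniform quantizer over the weight range (illustrative)."""
    lo, hi = w.min(), w.max()
    step = (hi - lo) / (2 ** bits - 1)
    return np.round((w - lo) / step) * step + lo

def kmeans_quantize(w, k=16, iters=20, seed=0):
    """1-D k-means quantizer: weights are replaced by their cluster centers."""
    rng = np.random.default_rng(seed)
    centers = rng.choice(w, k, replace=False)
    for _ in range(iters):
        assign = np.argmin(np.abs(w[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = w[assign == j].mean()
    return centers[np.argmin(np.abs(w[:, None] - centers[None, :]), axis=1)]

# Hypothetical layer weights; the split criterion (small vs. large magnitude)
# is an assumption made here for illustration.
w = np.random.randn(10000).astype(np.float32)
small = np.abs(w) < 1.0
w_q = w.copy()
w_q[small] = uniform_quantize(w[small])      # dense, near-zero weights
w_q[~small] = kmeans_quantize(w[~small])     # sparse, large-magnitude tail
print("mean abs quantization error:", np.abs(w - w_q).mean())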
Accelerating Convolutional Neural Networks for Mobile Applications
TLDR
An efficient and effective approach is proposed to accelerate the test-phase computation of CNNs based on low-rank and group sparse tensor decomposition, which achieves significant reduction in computational complexity, at the cost of negligible loss in accuracy.
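The low-rank part of this idea can be sketched by factoring the unfolded convolution kernel with a truncated SVD, which splits one large matrix multiply into two thin ones; the group-sparse component of the paper's decomposition is not covered here, and the dimensions and rank below are illustrative.

import numpy as np

# Hypothetical conv kernel (256 output channels, 128 input channels, 3x3),
# unfolded into a 2-D matrix as it would be for an im2col-style convolution.
c_out, c_in, k = 256, 128, 3
W = np.random.randn(c_out, c_in * k * k).astype(np.float32)

# Truncated SVD with rank r << min(c_out, c_in*k*k).
r = 32
U, S, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :r] * S[:r]          # (c_out, r)
B = Vt[:r, :]                 # (r, c_in*k*k)

full_macs = c_out * c_in * k * k              # per output position, original
lowrank_macs = r * (c_out + c_in * k * k)     # per output position, factored
print("MACs per position:", full_macs, "->", lowrank_macs)
# Note: a random kernel is not low-rank, so this error overstates what happens
# with trained weights.
print("relative weight error:", np.linalg.norm(W - A @ B) / np.linalg.norm(W))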
Quantized CNN: A Unified Approach to Accelerate and Compress Convolutional Networks
TLDR
A quantized CNN is presented, a unified approach to accelerate and compress convolutional networks; guided by minimizing the approximation error of each layer's response, both fully connected and convolutional layers are carefully quantized.
Accelerator Design for Vector Quantized Convolutional Neural Network
TLDR
This paper proposes an accelerator architecture for quantized CNN, based on algorithm-architecture-co-exploration, and designs a high-throughput processing element architecture to accelerate quantized layers.
Modeling the Resource Requirements of Convolutional Neural Networks on Mobile Devices
TLDR
The resource requirements (time, memory) of CNNs on mobile devices are understood and the modeling tool, Augur, is built, which takes a CNN configuration (descriptor) as the input and estimates the compute time and resource usage of the CNN, to give insights about whether and how efficiently a CNN can be run on a given mobile platform.
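In the spirit of such a modeling tool, compute and memory can be estimated directly from a layer descriptor by counting multiply-accumulates and parameters; the descriptor format, the toy network, and the estimates below are illustrative assumptions, not Augur's actual model.

# A minimal sketch of estimating compute and memory from a CNN descriptor.
def estimate(layers, h, w):
    total_macs, total_params, peak_act = 0, 0, 0
    c = 3  # input channels (RGB)
    for kind, spec in layers:
        if kind == "conv":
            c_out, k, stride = spec
            h, w = h // stride, w // stride
            total_macs += h * w * c_out * c * k * k
            total_params += c_out * c * k * k
            c = c_out
        elif kind == "fc":
            n_out = spec
            total_macs += c * h * w * n_out
            total_params += c * h * w * n_out
            c, h, w = n_out, 1, 1
        peak_act = max(peak_act, c * h * w)
    return total_macs, total_params, peak_act

# Toy AlexNet-like descriptor (not the real configuration).
layers = [("conv", (64, 11, 4)), ("conv", (192, 5, 1)),
          ("conv", (384, 3, 1)), ("fc", 4096), ("fc", 1000)]
macs, params, acts = estimate(layers, 224, 224)
print(f"~{macs / 1e9:.2f} GMACs, {params * 4 / 1e6:.1f} MB of fp32 weights, "
      f"peak activation {acts * 4 / 1e6:.1f} MB")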
Methodologies of Compressing a Stable Performance Convolutional Neural Networks in Image Classification
TLDR
The main goals are memory compression and complexity reduction (reducing both operations and cycles) of CNNs, using methods (including quantization and pruning) that do not require retraining, which allows deployment on mobile systems or robots.
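Magnitude pruning is one such retraining-free compression step: weights below a threshold are simply zeroed. A minimal sketch (the layer size and sparsity level are arbitrary choices for illustration):

import numpy as np

def prune_by_magnitude(w, sparsity=0.5):
    """Zero out the smallest-magnitude fraction of weights (no retraining)."""
    thresh = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) < thresh, 0.0, w)

w = np.random.randn(1024, 1024).astype(np.float32)   # hypothetical FC layer
w_pruned = prune_by_magnitude(w, sparsity=0.9)
kept = np.count_nonzero(w_pruned)
print(f"kept {kept} of {w.size} weights ({kept / w.size:.1%})")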
Accelerating Convolutional Neural Networks for Continuous Mobile Vision via Cache Reuse
TLDR
This paper proposes a transparent caching mechanism, named CNNCache, that can substantially accelerate CNN-driven continuous mobile vision tasks without any effort from app developers; a prototype of CNNCache is implemented to run on commodity Android devices and evaluated with typical CNN models.
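A toy illustration of the reuse idea: per-block results from the previous frame are served from a cache when the block content has not changed, so only changed regions are recomputed. Exact block equality and a block mean as the "feature" are simplifications; CNNCache itself matches similar (not just identical) regions and reuses intermediate CNN layers.

import numpy as np

BLOCK = 16

def block_feature(block):
    return block.mean()          # stand-in for an expensive conv over the block

def process(frame, prev_frame=None, cache=None):
    h, w = frame.shape
    feats = np.zeros((h // BLOCK, w // BLOCK), dtype=np.float32)
    new_cache, recomputed = {}, 0
    for i in range(0, h, BLOCK):
        for j in range(0, w, BLOCK):
            blk = frame[i:i + BLOCK, j:j + BLOCK]
            if (prev_frame is not None and cache is not None
                    and np.array_equal(blk, prev_frame[i:i + BLOCK, j:j + BLOCK])):
                f = cache[(i, j)]            # cache hit: reuse previous result
            else:
                f = block_feature(blk)       # cache miss: recompute
                recomputed += 1
            feats[i // BLOCK, j // BLOCK] = f
            new_cache[(i, j)] = f
    return feats, new_cache, recomputed

frame1 = np.random.rand(128, 128).astype(np.float32)
frame2 = frame1.copy()
frame2[:16, :16] += 0.1                       # only one block changes
_, cache, _ = process(frame1)
_, _, n = process(frame2, frame1, cache)
print("blocks recomputed on frame 2:", n, "of", (128 // BLOCK) ** 2)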
Space Efficient Quantization for Deep Convolutional Neural Networks
TLDR
This article proposes a space-efficient quantization scheme which uses eight or fewer bits to represent the original 32-bit weights, and adopts the singular value decomposition (SVD) method to decrease the parameter size of fully-connected layers for further compression.
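A minimal sketch combining the two ingredients mentioned in the TLDR: SVD truncation of a fully-connected weight matrix, followed by 8-bit storage of the two factors. The matrix size, rank, and symmetric quantizer below are assumptions for illustration.

import numpy as np

def quantize_8bit(w):
    """Symmetric 8-bit quantization: int8 values plus one fp32 scale."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

# Hypothetical fully-connected weight matrix.
W = np.random.randn(1024, 1024).astype(np.float32)
r = 64
U, S, Vt = np.linalg.svd(W, full_matrices=False)
A, B = U[:, :r] * S[:r], Vt[:r, :]
qa, sa = quantize_8bit(A)
qb, sb = quantize_8bit(B)

orig_bytes = W.size * 4                     # fp32 storage
comp_bytes = qa.size + qb.size              # int8 storage (scales negligible)
print(f"{orig_bytes / comp_bytes:.0f}x smaller")
# A random matrix is not low-rank, so this error overstates what happens with
# trained weights.
W_hat = (qa.astype(np.float32) * sa) @ (qb.astype(np.float32) * sb)
print("relative error:", np.linalg.norm(W - W_hat) / np.linalg.norm(W))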
ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
TLDR
An extremely computation-efficient CNN architecture named ShuffleNet is introduced, which is designed specially for mobile devices with very limited computing power (e.g., 10-150 MFLOPs), to greatly reduce computation cost while maintaining accuracy.
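ShuffleNet's channel shuffle, which lets information flow across the groups of a grouped convolution, is a simple reshape-transpose-reshape; a numpy sketch for NCHW tensors (the tensor shape below is illustrative):

import numpy as np

def channel_shuffle(x, groups):
    """ShuffleNet-style channel shuffle for an NCHW activation tensor."""
    n, c, h, w = x.shape
    assert c % groups == 0
    # Reshape to (n, groups, c//groups, h, w), swap the two channel axes,
    # then flatten back so channels from different groups are interleaved.
    return (x.reshape(n, groups, c // groups, h, w)
             .transpose(0, 2, 1, 3, 4)
             .reshape(n, c, h, w))

x = np.arange(1 * 6 * 1 * 1).reshape(1, 6, 1, 1)
print(x.ravel())                      # [0 1 2 3 4 5]
print(channel_shuffle(x, 3).ravel())  # [0 2 4 1 3 5]: interleaved across groups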
Augur: Modeling the Resource Requirements of ConvNets on Mobile Devices
TLDR
This paper measures and analyzes the performance and resource usage for the CNNs on different mobile CPUs and GPUs, and builds and evaluates the modeling tool, Augur, which takes a CNN configuration as the input and estimates the compute time, memory, and power requirements of the CNN to give insights about whether and how efficiently a CNN can be run on a given mobile platform.
...

References

Showing 1-10 of 59 references
Compressing Deep Convolutional Networks using Vector Quantization
TLDR
This paper is able to achieve 16-24 times compression of the network with only 1% loss of classification accuracy using a state-of-the-art CNN, and finds that, for compressing the most storage-demanding densely connected layers, vector quantization methods have a clear gain over existing matrix factorization methods.
Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation
TLDR
Using large state-of-the-art models, this work demonstrates speedups of convolutional layers on both CPU and GPU by a factor of 2×, while keeping the accuracy within 1% of the original model.
Speeding up Convolutional Neural Networks with Low Rank Expansions
TLDR
Two simple schemes for drastically speeding up convolutional neural networks are presented, achieved by exploiting cross-channel or filter redundancy to construct a low rank basis of filters that are rank-1 in the spatial domain.
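The rank-1 idea can be illustrated by factoring a single k x k spatial filter into a column and a row filter via SVD, so a 2-D convolution becomes two cheap 1-D convolutions; the random filter below is only for demonstration (trained filters tend to be much closer to low rank than random ones).

import numpy as np

# Hypothetical 7x7 spatial filter.
k = 7
f = np.random.randn(k, k).astype(np.float32)

U, S, Vt = np.linalg.svd(f)
col = U[:, 0] * S[0]          # vertical 1-D filter (k x 1)
row = Vt[0, :]                # horizontal 1-D filter (1 x k)
f_rank1 = np.outer(col, row)

print("multiplies per pixel:", k * k, "->", 2 * k)
print("relative filter error:", np.linalg.norm(f - f_rank1) / np.linalg.norm(f))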
XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks
TLDR
The Binary-Weight-Network version of AlexNet is compared with recent network binarization methods, BinaryConnect and BinaryNets, and outperforms these methods by large margins on ImageNet, by more than 16% in top-1 accuracy.
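The Binary-Weight-Network idea replaces each real-valued filter with its signs plus a single scaling factor; a minimal per-filter sketch (the filter shape is illustrative, and XNOR-Net's full scheme also binarizes activations):

import numpy as np

def binarize(w):
    """Binary-weight approximation: sign(w) scaled by alpha = mean(|w|)."""
    alpha = np.abs(w).mean()
    return alpha * np.sign(w), alpha

w = np.random.randn(3, 3, 64).astype(np.float32)   # one hypothetical filter
w_bin, alpha = binarize(w)
x = np.random.randn(3, 3, 64).astype(np.float32)   # matching input patch
print("full-precision response:", float((w * x).sum()))
print("binary-weight response: ", float((w_bin * x).sum()))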
Efficient and accurate approximations of nonlinear convolutional networks
TLDR
This paper aims to accelerate the test-time computation of deep convolutional neural networks (CNNs), and takes the nonlinear units into account, subject to a low-rank constraint which helps to reduce the complexity of filters.
ImageNet classification with deep convolutional neural networks
TLDR
A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.
Accelerating Very Deep Convolutional Networks for Classification and Detection
TLDR
This paper aims to accelerate the test-time computation of convolutional neural networks, especially very deep CNNs, and develops an effective solution to the resulting nonlinear optimization problem without the need of stochastic gradient descent (SGD).
High-Performance Neural Networks for Visual Object Classification
We present a fast, fully parameterizable GPU implementation of Convolutional Neural Network variants. Our feature extractors are neither carefully designed nor pre-wired, but rather learned in a supervised way.
Very Deep Convolutional Networks for Large-Scale Image Recognition
TLDR
This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
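The parameter argument behind very small filters can be checked with a few lines of arithmetic: a stack of three 3x3 convolutions has the same 7x7 receptive field as one 7x7 convolution but uses 27C^2 instead of 49C^2 weights (the channel count C below is a representative choice, not a specific layer from the paper).

# One 7x7 conv vs. a stack of three 3x3 convs with C channels throughout.
C = 512
params_7x7 = 7 * 7 * C * C                # ~12.8M weights
params_3x3_stack = 3 * (3 * 3 * C * C)    # ~7.1M weights
print(params_7x7, "vs", params_3x3_stack)
print(f"stack uses {params_3x3_stack / params_7x7:.0%} of the parameters")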
Caffe: Convolutional Architecture for Fast Feature Embedding
TLDR
Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.
...