Two-Step Quantization for Low-bit Neural Networks

@inproceedings{Wang2018TwoStepQF,
  title={Two-Step Quantization for Low-bit Neural Networks},
  author={Peisong Wang and Qinghao Hu and Yifan Zhang and Chunjie Zhang and Yang Liu and Jian Cheng},
  booktitle={2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2018},
  pages={4376-4384}
}
Every bit matters in the hardware design of quantized neural networks. [...] In this paper, we propose a simple yet effective Two-Step Quantization (TSQ) framework, by decomposing the network quantization problem into two steps: code learning and transformation function learning based on the learned codes. For the first step, we propose the sparse quantization method for code learning.
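To make the sparse-quantization idea concrete, here is a minimal NumPy sketch, assuming a percentile-based sparsity threshold and a plain uniform quantizer above it; the paper's exact code-learning objective is not reproduced.

```python
# Sketch of sparse quantization: zero out small activations, quantize the rest.
# The percentile threshold and the uniform quantizer are illustrative assumptions.
import numpy as np

def sparse_quantize(x, n_levels=3, sparsity=0.5):
    x = np.maximum(x, 0.0)                    # activations after ReLU
    thr = np.quantile(x, sparsity)            # sparsity threshold (assumption)
    mask = x > thr
    if not mask.any():
        return np.zeros_like(x)
    step = (x[mask].max() - thr) / n_levels   # uniform step above the threshold
    codes = np.clip(np.ceil((x - thr) / step), 0, n_levels)
    return np.where(mask, codes * step + thr, 0.0)   # sparse, low-bit codes

acts = np.abs(np.random.randn(8, 8))
print(np.unique(sparse_quantize(acts)))
```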
Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks
TLDR
Differentiable Soft Quantization (DSQ) is proposed to bridge the gap between full-precision and low-bit networks; it helps pursue accurate gradients in backward propagation and reduces the quantization loss in the forward process with an appropriate clipping range.
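A toy sketch of the soft-quantization shape, assuming a fixed clipping range [lo, hi] and temperature k (both of which DSQ actually learns or anneals):

```python
# Soft quantization: the hard rounding step is replaced by a scaled tanh so the
# mapping stays differentiable; lo, hi and k are fixed here (assumptions).
import numpy as np

def dsq(x, lo=-1.0, hi=1.0, bits=2, k=10.0):
    n = 2 ** bits - 1                                   # number of intervals
    delta = (hi - lo) / n
    xc = np.clip(x, lo, hi)
    i = np.clip(np.floor((xc - lo) / delta), 0, n - 1)  # interval index
    m = lo + (i + 0.5) * delta                          # interval midpoint
    s = 1.0 / np.tanh(0.5 * k * delta)                  # soft step spans the interval
    phi = s * np.tanh(k * (xc - m))                     # soft step in [-1, 1]
    return lo + delta * (i + 0.5 * (phi + 1.0))

x = np.linspace(-1.5, 1.5, 7)
print(dsq(x))           # approaches hard uniform quantization as k grows
```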
Low-bit Quantization Needs Good Distribution
TLDR
The Scale-Clip technique, a Distribution Reshaping method that can reshape weights or activations into a uniform-like distribution in a dynamic manner, is proposed and achieves much better performance than state-of-the-art quantization methods.
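A heavily hedged illustration of the reshaping idea: clipping bell-shaped weights at a small multiple of their mean absolute value before uniform quantization. The clipping rule below is an assumption for illustration, not the paper's Scale-Clip formula.

```python
# Illustrative reshaping: clip bell-shaped weights at a multiple of mean(|w|),
# then quantize uniformly. The clip rule is an assumption, not the paper's.
import numpy as np

def reshape_and_quantize(w, clip_factor=2.0, bits=2):
    c = clip_factor * np.mean(np.abs(w))      # clip threshold (assumption)
    wc = np.clip(w, -c, c)
    n = max(2 ** (bits - 1) - 1, 1)           # symmetric integer range
    scale = c / n
    return np.round(wc / scale) * scale

w = np.random.randn(1024) * 0.05              # roughly Gaussian weights
print(np.unique(reshape_and_quantize(w)))
```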
Unsupervised Network Quantization via Fixed-Point Factorization
TLDR
This article proposes an efficient framework, namely, fixed-point factorized network (FFN), to turn all weights into ternary values, i.e., {−1, 0, 1}, and highlights that the proposed FFN framework can achieve negligible degradation even without any supervised retraining on the labeled data.
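A rough sketch of the fixed-point factorization flavor of FFN, approximating W ≈ X·diag(d)·Yᵀ with ternary X and Y via a greedy rank-1 loop; the thresholding heuristic and the SVD-based initialization are simplifying assumptions, not the paper's semidiscrete decomposition algorithm.

```python
# W ~= X diag(d) Y^T with ternary X, Y: greedy rank-1 peeling (illustrative only).
import numpy as np

def ternarize(v, ratio=0.7):
    thr = ratio * np.mean(np.abs(v))          # threshold heuristic (assumption)
    return np.sign(v) * (np.abs(v) > thr)

def ffn_factorize(W, rank=8):
    R = W.copy()
    X, Y, d = [], [], []
    for _ in range(rank):
        u, _, vt = np.linalg.svd(R, full_matrices=False)
        x, y = ternarize(u[:, 0]), ternarize(vt[0])
        denom = (x @ x) * (y @ y)
        if denom == 0:
            break
        di = x @ R @ y / denom                # least-squares scale for this term
        X.append(x); Y.append(y); d.append(di)
        R = R - di * np.outer(x, y)           # peel off the explained part
    return np.array(X).T, np.array(d), np.array(Y).T

W = np.random.randn(32, 16) * 0.1
X, d, Y = ffn_factorize(W)
print(np.linalg.norm(W - X @ np.diag(d) @ Y.T) / np.linalg.norm(W))
```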
Towards Unified INT8 Training for Convolutional Neural Network
TLDR
An attempt to build a unified 8-bit (INT8) training framework for common convolutional neural networks from the aspects of both accuracy and speed is given, and two universal techniques are proposed that reduce the direction deviation of gradients and avoid illegal gradient updates along the wrong direction.
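For reference, a generic symmetric INT8 quantizer with a clipping range, the basic primitive such a framework applies to weights, activations, and gradients; the paper's specific gradient-clipping and learning-rate-scaling techniques are not reproduced here.

```python
# Generic symmetric INT8 quantization with an optional clipping range.
import numpy as np

def quantize_int8(x, clip=None):
    c = np.max(np.abs(x)) if clip is None else clip
    scale = c / 127.0                         # symmetric signed 8-bit range
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale                           # integer tensor + dequantization scale

grads = np.random.randn(1000) * 1e-3
clip = np.percentile(np.abs(grads), 99.5)     # clip outliers first (assumption)
q, s = quantize_int8(grads, clip)
print(np.max(np.abs(q.astype(np.float32) * s - np.clip(grads, -clip, clip))))
```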
FAT: Learning Low-Bitwidth Parametric Representation via Frequency-Aware Transformation
TLDR
This work presents a novel quantization pipeline, Frequency-Aware Transformation (FAT), which learns to transform network weights in the frequency domain before quantization, making them more amenable to low-bitwidth training and improving both uniform and non-uniform quantizers.
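A hedged sketch of the frequency-domain idea, using a fixed low-pass mask purely as a stand-in for the transform that FAT learns:

```python
# Transform weights in the frequency domain, then quantize. The fixed low-pass
# mask is only an illustrative stand-in for the learned transform.
import numpy as np

def fat_style_quantize(w, keep_ratio=0.5, bits=2):
    spec = np.fft.rfft(w)                         # frequency-domain representation
    mask = np.zeros_like(spec)
    mask[: int(len(spec) * keep_ratio)] = 1.0     # fixed low-pass mask (assumption)
    wt = np.fft.irfft(spec * mask, n=len(w))      # transformed weights
    n = max(2 ** (bits - 1) - 1, 1)
    scale = np.max(np.abs(wt)) / n
    return np.clip(np.round(wt / scale), -n, n) * scale

w = np.random.randn(64) * 0.1
print(np.unique(fat_style_quantize(w)))
```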
Network Quantization with Element-wise Gradient Scaling
TLDR
Element-wise gradient scaling (EWGS) is proposed as a simple yet effective alternative to the straight-through estimator (STE), training quantized networks with better stability and accuracy than the STE.
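The EWGS backward rule is easy to state in a few lines; the sketch below assumes a plain rounding quantizer and a hand-picked scaling factor delta:

```python
# EWGS backward rule: scale each gradient element by 1 + delta*sign(g)*(x - Q(x));
# delta = 0 recovers the plain straight-through estimator.
import numpy as np

def ewgs_backward(grad_out, x, delta=0.1):
    err = x - np.round(x)                     # discretization error per element
    return grad_out * (1.0 + delta * np.sign(grad_out) * err)

x = np.random.rand(5) * 3.0                   # latent values before rounding
g = np.random.randn(5)                        # gradient w.r.t. the rounded output
print(ewgs_backward(g, x, delta=0.0))         # identical to STE
print(ewgs_backward(g, x, delta=0.2))         # element-wise scaled version
```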
Learning to Quantize Deep Networks by Optimizing Quantization Intervals With Task Loss
TLDR
This work proposes a trainable quantizer that can be trained on a heterogeneous dataset, and thus can be used to quantize pretrained networks without access to their training data, outperforming existing methods and achieving state-of-the-art accuracy.
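A hedged sketch of the interval-parameterized transformer, with the interval center, half-width, and shaping exponent fixed by hand rather than learned from the task loss as in the paper:

```python
# Interval transformer: prune |w| below c - d, clip above c + d, map the rest to
# [-1, 1], then quantize. c, d, gamma are fixed here instead of learned.
import numpy as np

def interval_transform(w, c=0.1, d=0.08, gamma=1.0):
    t = np.clip((np.abs(w) - (c - d)) / (2.0 * d), 0.0, 1.0) ** gamma
    return np.sign(w) * t

def uniform_quantize(x, bits=2):
    n = max(2 ** (bits - 1) - 1, 1)
    return np.round(x * n) / n

w = np.random.randn(1000) * 0.1
print(np.unique(uniform_quantize(interval_transform(w))))
```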
COMPUTATION-EFFICIENT QUANTIZATION METHOD
  • 2018
Deep neural networks, being memory and computation intensive, are challenging to deploy on smaller devices. Numerous quantization techniques have been proposed to reduce the inference latency/memory [...]
Elastic Significant Bit Quantization and Acceleration for Deep Neural Networks
  • Cheng Gong, Ye Lu, Kunpeng Xie, Zongming Jin, Tao Li, Yanzhi Wang
  • Computer Science, Engineering
  • ArXiv
  • 2021
Quantization has been proven to be a vital method for improving the inference efficiency of deep neural networks (DNNs). However, it is still challenging to strike a good balance between accuracy and [...]
Kernel Quantization for Efficient Network Compression
TLDR
This paper proposes to quantize at both the kernel and weight levels; the weight tensor of a convolution layer can then be represented with low-bit indexes and a kernel codebook of limited size, which enables KQ to achieve a significant compression ratio.
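A rough sketch of kernel-level quantization with a k-means codebook; the plain Lloyd iteration and the codebook size are illustrative choices, and the paper additionally quantizes the codebook entries themselves.

```python
# Cluster 3x3 kernels with k-means; store a small codebook plus a low-bit index
# per kernel. Plain Lloyd iteration, illustrative only.
import numpy as np

def kernel_codebook(weights, n_codes=16, iters=20, seed=0):
    kernels = weights.reshape(-1, 9)               # one row per 3x3 kernel
    rng = np.random.default_rng(seed)
    codebook = kernels[rng.choice(len(kernels), n_codes, replace=False)].copy()
    for _ in range(iters):
        dist = ((kernels[:, None, :] - codebook[None]) ** 2).sum(-1)
        idx = dist.argmin(1)                       # low-bit index per kernel
        for c in range(n_codes):
            if (idx == c).any():
                codebook[c] = kernels[idx == c].mean(0)
    return codebook, idx.reshape(weights.shape[:2])

W = np.random.randn(64, 32, 3, 3) * 0.05
codebook, indexes = kernel_codebook(W)
print(codebook.shape, indexes.shape)               # (16, 9) codebook, 4-bit indexes
```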

References

Showing 1-10 of 40 references
Deep Learning with Low Precision by Half-Wave Gaussian Quantization
TLDR
A half-wave Gaussian quantizer (HWGQ) is proposed for forward approximation and shown to have an efficient implementation, by exploiting the statistics of network activations and batch normalization operations, and to achieve much closer performance to full-precision networks than previously available low-precision networks.
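A hedged sketch of the half-wave Gaussian quantizer idea, fitting the levels numerically on samples of ReLU(N(0,1)) instead of using the paper's analytically derived values and clipped backward approximation:

```python
# Fit an MSE-oriented quantizer for ReLU(N(0,1)) once (Lloyd iteration on samples)
# and reuse it as the forward activation quantizer after batch normalization.
import numpy as np

def fit_hwgq_levels(n_levels=4, n_samples=200_000, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    x = np.maximum(rng.standard_normal(n_samples), 0.0)   # half-wave Gaussian samples
    levels = np.quantile(x, np.linspace(0.1, 0.99, n_levels))
    for _ in range(iters):
        idx = np.abs(x[:, None] - levels).argmin(1)
        for k in range(n_levels):
            if (idx == k).any():
                levels[k] = x[idx == k].mean()
    return np.sort(levels)

def hwgq(pre_act, levels):
    x = np.maximum(pre_act, 0.0)
    return levels[np.abs(x[..., None] - levels).argmin(-1)]

levels = fit_hwgq_levels()                     # ~2-bit forward quantizer
print(levels, hwgq(np.random.randn(5), levels))
```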
Trained Ternary Quantization
TLDR
This work proposes Trained Ternary Quantization (TTQ), a method that can reduce the precision of weights in neural networks to ternary values while improving the accuracy of some models (32-, 44-, and 56-layer ResNet) on CIFAR-10 and of AlexNet on ImageNet.
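The forward ternarization is easy to sketch; the per-layer scales below are initialized from weight statistics as stand-ins for the values TTQ learns by backpropagation, and the threshold factor t is a hyperparameter:

```python
# Forward ternarization to {-Wn, 0, +Wp}; Wp/Wn here are statistics-based stand-ins
# for the scales that TTQ actually learns.
import numpy as np

def ttq_forward(w, t=0.05):
    delta = t * np.max(np.abs(w))              # ternarization threshold
    pos, neg = w > delta, w < -delta
    Wp = np.mean(w[pos]) if pos.any() else 0.0
    Wn = np.mean(-w[neg]) if neg.any() else 0.0
    wq = np.zeros_like(w)
    wq[pos], wq[neg] = Wp, -Wn
    return wq, Wp, Wn

w = np.random.randn(256, 256) * 0.02
wq, Wp, Wn = ttq_forward(w)
print(np.unique(wq), Wp, Wn)
```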
Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights
TLDR
Extensive experiments on the ImageNet classification task using almost all known deep CNN architectures, including AlexNet, VGG-16, GoogLeNet, and ResNets, testify to the efficacy of the proposed INQ, showing that at 5-bit quantization the models achieve better accuracy than their 32-bit floating-point references.
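A simplified sketch of one incremental step, with an assumed power-of-two grid and a fixed 50% partition; the paper derives the exponent range from the weight magnitudes and repeats the partition/quantize/retrain cycle until all weights are quantized.

```python
# One incremental step: quantize the largest 50% of weights to {0} U {+/- 2^e}
# and freeze them; the rest stay full precision for retraining.
import numpy as np

def pow2_quantize(w, n_exp=4):
    e_max = np.floor(np.log2(np.max(np.abs(w)) + 1e-12))
    grid = np.concatenate([[0.0], 2.0 ** (e_max - np.arange(n_exp))])
    mag = grid[np.abs(np.abs(w)[..., None] - grid).argmin(-1)]
    return np.sign(w) * mag

def inq_step(w, portion=0.5):
    thr = np.quantile(np.abs(w), 1.0 - portion)   # split by magnitude
    frozen = np.abs(w) >= thr
    return np.where(frozen, pow2_quantize(w), w), frozen

w = np.random.randn(1000) * 0.1
wq, frozen = inq_step(w)
print(np.unique(np.round(np.abs(wq[frozen]), 6)))   # a few power-of-two magnitudes
```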
DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients
TLDR
DoReFa-Net, a method to train convolutional neural networks that have low-bitwidth weights and activations using low-bitwidth parameter gradients, is proposed and achieves prediction accuracy comparable to that of 32-bit counterparts.
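DoReFa's k-bit weight quantizer is compact enough to show directly (forward mapping only; training backs the rounding with a straight-through estimator):

```python
# DoReFa k-bit weight quantization: tanh squash -> [0, 1] -> round to 2^k - 1
# levels -> map back to [-1, 1].
import numpy as np

def quantize_k(x, k):
    n = 2 ** k - 1
    return np.round(x * n) / n                   # k-bit uniform quantizer on [0, 1]

def dorefa_weights(w, k=2):
    t = np.tanh(w)
    x = t / (2.0 * np.max(np.abs(t))) + 0.5      # map to [0, 1]
    return 2.0 * quantize_k(x, k) - 1.0          # grid of 2^k levels in [-1, 1]

w = np.random.randn(512) * 0.1
print(np.unique(dorefa_weights(w, k=2)))
```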
How to Train a Compact Binary Neural Network with High Accuracy?
TLDR
The findings first reveal that a low learning rate is highly preferred to avoid frequent sign changes of the weights, which often make the learning of BinaryNets unstable; a regularization term is then introduced that encourages the weights to be bipolar.
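A short sketch of a bipolar regularizer in this spirit; the quadratic-well form below is an illustrative assumption, not necessarily the exact term used in the paper.

```python
# Quadratic-well penalty pulling weights toward +/-1 (illustrative form).
import numpy as np

def bipolar_penalty(w, lam=1e-4):
    return lam * np.sum((1.0 - w ** 2) ** 2)     # zero exactly at w = +/-1

def bipolar_penalty_grad(w, lam=1e-4):
    return -4.0 * lam * w * (1.0 - w ** 2)       # add to the weight gradient

w = np.random.uniform(-1.5, 1.5, size=1000)
print(bipolar_penalty(w), np.abs(bipolar_penalty_grad(w)).max())
```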
From Hashing to CNNs: Training BinaryWeight Networks via Hashing
TLDR
A strong connection between inner-product preserving hashing and binary weight networks is revealed, so that training such networks can be intrinsically regarded as a hashing problem, and an alternating optimization method is proposed to learn the hash codes instead of directly learning the binary weights.
BinaryConnect: Training Deep Neural Networks with binary weights during propagations
TLDR
BinaryConnect is introduced, a method that consists of training a DNN with binary weights during the forward and backward propagations while retaining the precision of the stored weights in which gradients are accumulated; near state-of-the-art results are obtained with BinaryConnect on permutation-invariant MNIST, CIFAR-10, and SVHN.
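A minimal sketch of the BinaryConnect update on a toy least-squares problem: binarized weights in the forward/backward pass, real-valued weights for gradient accumulation. The task, learning rate, and clipping are illustrative details.

```python
# Binary weights in forward/backward, gradient accumulated in real-valued weights.
import numpy as np

rng = np.random.default_rng(0)
w_real = rng.uniform(-0.1, 0.1, size=8)          # full-precision master weights
x = rng.standard_normal((256, 8))
y = x @ np.sign(rng.standard_normal(8))          # target realizable with binary weights

for step in range(200):
    w_bin = np.sign(w_real)                      # binarize for forward/backward
    grad_pred = 2.0 * (x @ w_bin - y) / len(y)   # d(MSE)/d(prediction)
    grad_w = x.T @ grad_pred                     # gradient w.r.t. the binary weights...
    w_real = np.clip(w_real - 0.01 * grad_w, -1.0, 1.0)   # ...applied to the real ones

print(np.mean((x @ np.sign(w_real) - y) ** 2))   # close to 0 once signs are recovered
```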
Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications
TLDR
A simple and effective scheme to compress the entire CNN, called one-shot whole-network compression, is proposed; it addresses an important implementation-level issue concerning 1×1 convolution, a key operation in the inception modules of GoogLeNet as well as in CNNs compressed by the proposed scheme.
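A hedged sketch of the Tucker-2 step at the heart of such a scheme, using truncated SVDs for the channel factors instead of the paper's VBMF-based rank selection and subsequent fine-tuning:

```python
# Tucker-2 on a conv kernel (out, in, d, d): channel factor matrices from truncated
# SVDs plus a smaller core, i.e. a 1x1 -> dxd -> 1x1 convolution pipeline.
import numpy as np

def tucker2(K, r_out, r_in):
    T, S, d, _ = K.shape
    U_out = np.linalg.svd(K.reshape(T, -1), full_matrices=False)[0][:, :r_out]
    U_in = np.linalg.svd(K.transpose(1, 0, 2, 3).reshape(S, -1),
                         full_matrices=False)[0][:, :r_in]
    core = np.einsum('tsij,tr,sq->rqij', K, U_out, U_in)   # (r_out, r_in, d, d)
    return U_out, core, U_in

def reconstruct(U_out, core, U_in):
    return np.einsum('rqij,tr,sq->tsij', core, U_out, U_in)

K = np.random.randn(64, 32, 3, 3) * 0.05
U_out, core, U_in = tucker2(K, r_out=16, r_in=8)
print(np.linalg.norm(K - reconstruct(U_out, core, U_in)) / np.linalg.norm(K))
```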
Speeding-up Convolutional Neural Networks Using Fine-tuned CP-Decomposition
TLDR
A simple two-step approach for speeding up convolution layers within large convolutional neural networks, based on tensor decomposition and discriminative fine-tuning, is proposed, leading to higher CPU speedups with lower accuracy drops for the smaller of the two networks.
Deep Learning with Limited Numerical Precision
TLDR
The results show that deep networks can be trained using only a 16-bit wide fixed-point number representation when stochastic rounding is used, incurring little to no degradation in classification accuracy.
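A small sketch of stochastic rounding to a fixed-point grid, the ingredient that makes the 16-bit result work; the 8 fractional bits are an arbitrary example.

```python
# Round to a fixed-point grid, up or down with probability given by the fractional
# part, so the rounding is unbiased in expectation.
import numpy as np

def stochastic_round_fixed_point(x, frac_bits=8, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    scale = 2.0 ** frac_bits
    scaled = x * scale
    floor = np.floor(scaled)
    return (floor + (rng.random(x.shape) < scaled - floor)) / scale

x = np.full(100_000, 0.00123)                     # below the grid step of 2^-8
print(stochastic_round_fixed_point(x).mean())     # ~0.00123 on average (unbiased)
```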