• Corpus ID: 221802416

Searching for Low-Bit Weights in Quantized Neural Networks

  title={Searching for Low-Bit Weights in Quantized Neural Networks},
  author={Zhaohui Yang and Yunhe Wang and Kai Han and Chunjing Xu and Chao Xu and Dacheng Tao and Chang Xu},
Quantized neural networks with low-bit weights and activations are attractive for developing AI accelerators. However, the quantization functions used in most conventional quantization methods are non-differentiable, which increases the optimization difficulty of quantized networks. Compared with full-precision parameters (i.e., 32-bit floating-point numbers), low-bit values are selected from a much smaller set. For example, there are only 16 possibilities in 4-bit space. Thus, we propose to regard… 
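The abstract's point that a 4-bit weight can take only 16 values suggests treating each weight as a choice among candidates. The sketch below enumerates a signed 4-bit grid and relaxes that choice with a softmax over per-candidate logits, hardening to the highest-scoring candidate afterwards. The grid (integers from -8 to 7 times a step size), the temperature, and every function name here are illustrative assumptions. Because the abstract is truncated, this is not presented as the authors' exact method.

```python
import math

# Hypothetical helpers for illustration; the names and the softmax relaxation
# are assumptions, not the paper's published code.

def candidate_values(bits, step=1.0):
    """Signed uniform grid with 2**bits levels: integers -2**(bits-1) .. 2**(bits-1)-1, times step."""
    half = 2 ** (bits - 1)
    return [step * q for q in range(-half, half)]

def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def soft_weight(logits, candidates, temperature=1.0):
    """Differentiable surrogate: expected weight under softmax(logits / temperature)."""
    probs = softmax(logits, temperature)
    return sum(p * c for p, c in zip(probs, candidates))

def hard_weight(logits, candidates):
    """Discrete weight kept after the search: the candidate with the largest logit."""
    best = max(range(len(candidates)), key=lambda i: logits[i])
    return candidates[best]

cands = candidate_values(4)   # 16 candidates: -8.0, -7.0, ..., 7.0
logits = [0.0] * len(cands)
logits[10] = 5.0              # favour index 10, i.e. the value 2.0
print(len(cands), hard_weight(logits, cands))  # 16 2.0
print(soft_weight(logits, cands))              # below 2.0: other candidates still carry weight
```

Lowering the temperature pushes the soft weight toward the hard choice. This is one common way to narrow the gap between a relaxed weight and the discrete value that is finally deployed.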

Tables from this paper

A Comprehensive Survey on Model Quantization for Deep Neural Networks
This paper describes quantization concepts, categorizes the methods from different perspectives, and compares the accuracy of previous methods with various bit-widths for weights and activations on CIFAR-10 and the large-scale ImageNet dataset.
A Survey of Quantization Methods for Efficient Neural Network Inference
This article surveys approaches to quantizing the numerical values in deep neural network computations, covering the advantages and disadvantages of current methods.
A comprehensive review of Binary Neural Network
This paper focuses exclusively on networks with 1-bit weights and activations, as opposed to previous surveys in which other low-bit works are mixed in, and discusses potential directions and future research opportunities for the latest BNN algorithms and techniques.
F8Net: Fixed-Point 8-bit Only Multiplication for Network Quantization
This work presents F8Net, a novel quantization framework that uses only fixed-point 8-bit multiplication and achieves performance comparable to or better than both existing quantization techniques that rely on INT32 multiplication or floating-point arithmetic and the full-precision counterparts, reaching state-of-the-art results.
S3: Sign-Sparse-Shift Reparametrization for Effective Training of Low-bit Shift Networks
This work proposes S3 reparameterization, a novel technique for training low-bit shift networks that decomposes a discrete parameter in a three-fold sign-sparse-shift manner, and shows that 3-bit shift networks compete with their full-precision counterparts in top-1 accuracy on ImageNet.
Learning Frequency Domain Approximation for Binary Neural Networks
This work proposes to estimate the gradient of the sign function in the Fourier frequency domain using a combination of sine functions for training BNNs, namely frequency domain approximation (FDA), which achieves state-of-the-art accuracy.
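The core idea behind FDA can be illustrated with the Fourier series of a square wave. On (-π/ω, π/ω) that series converges to sign(x), and differentiating a truncated version gives a smooth surrogate gradient. This sketch covers only the truncated series and its derivative. The number of terms, the frequency ω, and the function names are our assumptions, and the full FDA method may include components not shown here.

```python
import math

def sign_fourier(x, n_terms=10, omega=1.0):
    """Truncated Fourier series of the square wave equal to sign(x) on (-pi/omega, pi/omega)."""
    return (4.0 / math.pi) * sum(
        math.sin((2 * i + 1) * omega * x) / (2 * i + 1) for i in range(n_terms)
    )

def sign_fourier_grad(x, n_terms=10, omega=1.0):
    """Derivative of the truncated series, used as a smooth surrogate for d sign(x)/dx."""
    return (4.0 * omega / math.pi) * sum(
        math.cos((2 * i + 1) * omega * x) for i in range(n_terms)
    )

for x in (-1.0, 0.0, 1.0):
    print(x, round(sign_fourier(x), 3), round(sign_fourier_grad(x), 3))
```

Adding terms brings the forward approximation closer to sign(x), but it also concentrates the derivative into a taller, narrower peak around zero. The term count therefore trades gradient fidelity against stability.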
TRQ: Ternary Neural Networks With Residual Quantization
A stem-residual framework that provides new insight into ternary quantization, termed Ternary Residual Quantization (TRQ), is proposed to achieve more powerful TNNs; it yields high recognition accuracy while retaining the speed-up of ternary networks.
Layer-wise Searching for 1-bit Detectors
A layer-wise searching (LWS) strategy is proposed to generate 1-bit detectors whose performance stays very close to that of the original real-valued model, and angular and amplitude loss functions are introduced to increase detector capacity.
Equal Bits: Enforcing Equally Distributed Binary Network Weights
It is shown that quantizing using optimal transport can guarantee any bit ratio, including equal ratios, and that the quantization method is effective when compared to state-of-the-art binarization methods, even when using binary weight pruning.
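For a binary target with a prescribed fraction of +1 values, one-dimensional optimal transport under a convex cost such as squared distance reduces to a sorted (monotone) matching: the largest weights go to +1 and the rest go to -1. The sketch below uses that reduction to hit an exact bit ratio. It is our simplification for illustration, and the function name and the rounding of the +1 count are assumptions rather than the paper's procedure.

```python
def binarize_with_ratio(weights, pos_ratio=0.5):
    """Hypothetical helper: assign +1 to the largest round(pos_ratio * n) weights, -1 to the rest.

    In 1-D, sorted matching is an optimal transport plan for convex costs, so this
    realizes the requested ratio at minimal squared displacement.
    """
    n = len(weights)
    n_pos = round(pos_ratio * n)
    order = sorted(range(n), key=lambda i: weights[i], reverse=True)
    out = [-1.0] * n
    for i in order[:n_pos]:
        out[i] = 1.0
    return out

w = [0.3, 0.1, 0.2, 0.4, -0.5, 0.05]
print(binarize_with_ratio(w))  # [1.0, -1.0, 1.0, 1.0, -1.0, -1.0]
```

With pos_ratio=0.5 the output has equal numbers of +1 and -1 for even n. A plain sign() would instead follow whatever imbalance the real-valued weights happen to have; here it would give five +1 values out of six.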
Whether the Support Region of Three-Bit Uniform Quantizer Has a Strong Impact on Post-Training Quantization for MNIST Dataset?
The results show that the choice of the support region threshold of the three-bit uniform quantizer does not have a strong impact on the accuracy of the QNNs, unlike two-bit uniform post-training quantization applied to an MLP for the same classification task.
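The support region threshold is the clipping bound x_max of the quantizer. The sketch below shows a symmetric three-bit uniform quantizer: values are clipped to [-x_max, x_max], the region is split into 2^3 = 8 equal cells, and each value maps to its cell's midpoint. The mid-rise layout and the function name are our assumptions, and the cited paper may define the quantizer's levels differently.

```python
import math

def uniform_quantize(x, x_max, bits=3):
    """Symmetric mid-rise uniform quantizer with 2**bits levels on [-x_max, x_max]."""
    levels = 2 ** bits
    step = 2.0 * x_max / levels
    clipped = max(-x_max, min(x, x_max))   # values outside the support region are clipped
    # Index of the cell containing the clipped value; x == x_max falls into the last cell.
    idx = min(int(math.floor((clipped + x_max) / step)), levels - 1)
    return -x_max + (idx + 0.5) * step     # midpoint of that cell

for x in (-10.0, -0.2, 0.2, 10.0):
    print(x, uniform_quantize(x, x_max=4.0))
```

A small x_max gives fine steps but clips more of the distribution's tails. A large x_max avoids clipping at the cost of coarser steps. The support region threshold controls exactly this trade-off.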


SYQ: Learning Symmetric Quantization for Efficient Deep Neural Networks
It is shown that symmetric quantization can substantially improve accuracy for networks with extremely low-precision weights and activations, and it is demonstrated that this representation imposes minimal or no hardware overhead relative to more coarse-grained approaches.
PACT: Parameterized Clipping Activation for Quantized Neural Networks
It is shown, for the first time, that both weights and activations can be quantized to 4-bits of precision while still achieving accuracy comparable to full precision networks across a range of popular models and datasets.
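PACT replaces ReLU with a clipping function whose upper bound α is learned, then quantizes the clipped activation uniformly to k bits. Below is a minimal sketch of the forward computation and of the straight-through gradient with respect to α. It assumes α > 0 and round-half-up rounding, and the function names are ours.

```python
import math

def pact(x, alpha):
    """PACT clipping: 0.5 * (|x| - |x - alpha| + alpha), i.e. clip(x, 0, alpha) for alpha > 0."""
    return 0.5 * (abs(x) - abs(x - alpha) + alpha)

def pact_quantize(x, alpha, bits=4):
    """Clip with PACT, then round to one of 2**bits evenly spaced levels in [0, alpha]."""
    y = pact(x, alpha)
    scale = (2 ** bits - 1) / alpha
    return math.floor(y * scale + 0.5) / scale   # round half up

def pact_grad_alpha(x, alpha):
    """d y_q / d alpha under the straight-through estimator: 1 where x is clipped at alpha, else 0."""
    return 1.0 if x >= alpha else 0.0

for x in (-1.0, 0.5, 1.0, 3.0):
    print(x, pact_quantize(x, alpha=2.0), pact_grad_alpha(x, alpha=2.0))
```

Because the gradient with respect to α is nonzero only for clipped inputs, training moves α toward a bound that balances clipping error against quantization step size.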
Trained Ternary Quantization
This work proposes Trained Ternary Quantization (TTQ), a method that reduces the precision of weights in neural networks to ternary values and can even improve the accuracy of some models (32-, 44-, and 56-layer ResNets) on CIFAR-10 and of AlexNet on ImageNet.
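TTQ maps each full-precision weight to one of three values {+W_p, 0, -W_n}. The two scales are trained per layer, and a threshold proportional to the largest weight magnitude decides which weights become zero. The forward-pass sketch below assumes a threshold factor t = 0.05, treated here as a tunable hyperparameter. Training of the scales and of the latent weights is omitted, and the function name is ours.

```python
def ttq_ternarize(weights, w_p, w_n, t=0.05):
    """Map weights to {+w_p, 0, -w_n}; the zero band is |w| <= t * max|w|."""
    delta = t * max(abs(w) for w in weights)
    ternary = []
    for w in weights:
        if w > delta:
            ternary.append(w_p)      # positive scale (learned per layer in TTQ)
        elif w < -delta:
            ternary.append(-w_n)     # negative scale (learned separately)
        else:
            ternary.append(0.0)
    return ternary

print(ttq_ternarize([0.9, -0.5, 0.01, -0.02, 0.3], w_p=1.2, w_n=0.8))
# [1.2, -0.8, 0.0, 0.0, 1.2]
```

Keeping separate positive and negative scales lets the ternary layer represent asymmetric weight distributions, which a single shared scale cannot.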
Binarized Neural Networks for Resource-Efficient Hashing with Minimizing Quantization Loss
This paper proposes a novel binary neural network learning framework to achieve resource-efficient deep hashing, and provides two theories demonstrating the necessity and effectiveness of minimizing the quantization losses for both weights and activations.
BinaryConnect: Training Deep Neural Networks with binary weights during propagations
BinaryConnect is introduced, a method which consists in training a DNN with binary weights during the forward and backward propagations, while retaining precision of the stored weights in which gradients are accumulated, and near state-of-the-art results with BinaryConnect are obtained on the permutation-invariant MNIST, CIFAR-10 and SVHN.
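The key detail is that the forward pass uses binarized weights, while gradient updates accumulate in the stored real-valued weights, which BinaryConnect clips to [-1, 1]. Below is a minimal sketch of one SGD step with deterministic binarization. The toy linear model, the squared-error loss, the learning rate, and the function names are illustrative assumptions.

```python
def binarize(w):
    """Deterministic binarization: +1 if w >= 0, else -1."""
    return 1.0 if w >= 0 else -1.0

def binaryconnect_step(w_real, x, target, lr=0.1):
    """One SGD step on L = 0.5 * (y - target)**2 for a toy linear model y = sum(binarize(w) * x)."""
    w_bin = [binarize(w) for w in w_real]
    y = sum(wb * xi for wb, xi in zip(w_bin, x))   # forward pass uses binary weights
    err = y - target
    grads = [err * xi for xi in x]                 # dL/dw_b, applied to the real weights
    # Update the stored real-valued weights, then clip them to [-1, 1].
    return [max(-1.0, min(1.0, w - lr * g)) for w, g in zip(w_real, grads)]

w = binaryconnect_step([0.05, -0.05], [1.0, 1.0], target=-2.0)
print(w, [binarize(v) for v in w])
```

Small real-valued updates accumulate until a weight crosses zero and its binary value flips. Clipping keeps the stored weights from drifting so far that later gradients can no longer flip them.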
Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks
Differentiable Soft Quantization (DSQ) is proposed to bridge the gap between full-precision and low-bit networks; it helps obtain accurate gradients in backward propagation and reduces the quantization loss in the forward pass with an appropriate clipping range.
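DSQ replaces each rounding step of a uniform quantizer with a scaled tanh inside that step's cell. The result is differentiable and approaches the hard quantizer as the sharpness k grows. The sketch below follows our reading of that formulation for a fixed clipping range [lower, upper] and 2^bits levels. The original work learns the clipping range and controls sharpness through its own parameter, and the names here are ours.

```python
import math

def dsq(x, lower, upper, bits=2, k=10.0):
    """Soft quantizer: inside each cell, a scaled tanh replaces the rounding step."""
    if x <= lower:
        return lower
    if x >= upper:
        return upper
    delta = (upper - lower) / (2 ** bits - 1)              # gap between adjacent levels
    i = min(int((x - lower) // delta), 2 ** bits - 2)      # cell index
    m_i = lower + (i + 0.5) * delta                        # cell midpoint
    s = 1.0 / math.tanh(0.5 * k * delta)                   # makes phi = +/-1 at the cell edges
    phi = s * math.tanh(k * (x - m_i))
    return lower + delta * (i + (phi + 1.0) / 2.0)

for k in (2.0, 50.0):
    print(k, [round(dsq(x, 0.0, 3.0, k=k), 3) for x in (0.3, 1.0, 1.5, 1.7)])
```

With small k the output varies smoothly inside each cell. With large k it is nearly the staircase of ordinary rounding, which lets training start smooth and then sharpen.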
LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks
This work proposes to jointly train a quantized, bit-operation-compatible DNN and its associated quantizers, as opposed to using fixed, handcrafted quantization schemes such as uniform or logarithmic quantization, to address the gap in prediction accuracy between the quantized model and the full-precision model.
Proximal Mean-Field for Neural Network Quantization
This work designs an efficient iterative optimization procedure that involves stochastic gradient descent followed by a projection and proves that this simple projected gradient descent approach is, in fact, equivalent to a proximal version of the well-known mean-field method.
DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients
DoReFa-Net, a method to train convolutional neural networks that have low-bitwidth weights and activations using low-bitwidth parameter gradients, is proposed and can achieve prediction accuracy comparable to its 32-bit counterparts.
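DoReFa-Net's k-bit weight quantizer (for k > 1) first squashes the weights with tanh and rescales them to [0, 1]. It then rounds each value to one of 2^k evenly spaced levels and maps the result back to [-1, 1]. Below is a minimal forward-pass sketch. During training the rounding would be bypassed with a straight-through estimator, and the function names are ours.

```python
import math

def quantize_k(r, k):
    """Round r in [0, 1] to the nearest of 2**k evenly spaced levels in [0, 1] (half up)."""
    n = 2 ** k - 1
    return math.floor(r * n + 0.5) / n

def dorefa_weights(weights, k):
    """k-bit weight quantization (k > 1); outputs lie in [-1, 1]."""
    t = [math.tanh(w) for w in weights]
    m = max(abs(v) for v in t)                    # largest magnitude after tanh
    return [2.0 * quantize_k(v / (2.0 * m) + 0.5, k) - 1.0 for v in t]

print(dorefa_weights([-3.0, -0.4, 0.0, 0.4, 3.0], k=2))
```

The tanh step limits the influence of large outliers before rescaling, so a few extreme weights do not stretch the grid and leave most weights on a single level.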
Binary Ensemble Neural Network: More Bits per Network or More Networks per Bit?
  • Shilin Zhu, Xin Dong, Hao Su
    2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2019
The Binary Ensemble Neural Network (BENN) is proposed, which leverages ensemble methods to improve the performance of BNNs at limited efficiency cost and can even surpass the accuracy of the full-precision floating-point network with the same architecture.