Corpus ID: 203593581

Additive Powers-of-Two Quantization: A Non-uniform Discretization for Neural Networks

@article{Li2019AdditivePQ,
  title={Additive Powers-of-Two Quantization: A Non-uniform Discretization for Neural Networks},
  author={Yuhang Li and Xin Dong and Wei Wang},
  journal={ArXiv},
  year={2019},
  volume={abs/1909.13144}
}
We propose Additive Powers-of-Two (APoT) quantization, an efficient non-uniform quantization scheme designed for the bell-shaped and long-tailed distribution of weights in neural networks. By constraining all quantization levels to be a sum of several powers-of-two terms, APoT quantization achieves high computational efficiency and a good match with the weight distribution. A simple reparameterization of the clipping function is applied to generate a better-defined gradient for updating …
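To make the level construction concrete, here is a minimal sketch in Python, assuming that a b-bit code is split into n = b/k additive terms, that each term is either zero or a power of two drawn from its own group of exponents, and that the whole set is rescaled by the clipping threshold; the paper's exact exponent grouping and its handling of signed levels may differ.

```python
import itertools
import numpy as np

def apot_levels(b=4, k=2, alpha=1.0):
    """Enumerate Additive Powers-of-Two (APoT) quantization levels.

    b     : total bit-width of the (unsigned) code
    k     : bits per additive term, so there are n = b // k terms
    alpha : clipping threshold; levels are rescaled so the largest equals alpha

    Illustrative construction only; the exponent grouping is an assumption.
    """
    n = b // k
    term_candidates = []
    for i in range(n):
        # each term is either 0 or a power of two taken from its own group
        cands = [0.0] + [2.0 ** -(i + j * n) for j in range(2 ** k - 1)]
        term_candidates.append(cands)

    # all 2**b sums of one candidate per term form the level set
    levels = sorted({sum(combo) for combo in itertools.product(*term_candidates)})
    gamma = alpha / max(levels)  # scale so the top level hits the clipping threshold
    return np.array(levels) * gamma

def apot_quantize(w, levels):
    """Project each weight onto its nearest APoT level (nearest-neighbour lookup)."""
    w = np.clip(np.asarray(w, dtype=float), levels.min(), levels.max())
    idx = np.abs(w[..., None] - levels).argmin(axis=-1)
    return levels[idx]

if __name__ == "__main__":
    lv = apot_levels(b=4, k=2, alpha=1.0)
    print(lv)                                   # 16 non-uniform levels, denser near zero
    print(apot_quantize([0.03, 0.4, 0.9], lv))
```

Because the levels cluster near zero, they follow a bell-shaped, long-tailed weight distribution more closely than a uniform grid, while each additive term still maps to a single bit-shift in hardware.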
Distribution-Aware Adaptive Multi-Bit Quantization
A distribution-aware multi-bit quantization (DMBQ) method that incorporates the distribution prior into the optimization of quantization is proposed; it not only outperforms state-of-the-art quantized networks in terms of accuracy but is also more efficient in terms of training time.
Training with Quantization Noise for Extreme Fixed-Point Compression
This paper proposes to quantize only a different random subset of weights during each forward pass, allowing unbiased gradients to flow through the other weights and establishing new state-of-the-art compromises between accuracy and model size in both natural language processing and image classification.
EXTREME MODEL COMPRESSION
We tackle the problem of producing compact models, maximizing their accuracy for a given model size. A standard solution is to train networks with Quantization Aware Training (Jacob et al., 2018) …
Layer Importance Estimation with Imprinting for Neural Network Quantization
This work proposes an accuracy-aware criterion to quantify each layer's importance and applies imprinting per layer as an efficient proxy for accuracy estimation, giving better interpretability to the selected bit-width configuration.
FAT: Learning Low-Bitwidth Parametric Representation via Frequency-Aware Transformation
This work presents a novel quantization pipeline, Frequency-Aware Transformation (FAT), which learns to transform network weights in the frequency domain before quantization, making them more amenable to low-bitwidth training and improving both uniform and non-uniform quantizers.
IMPROVING THE ACCURACY OF NEURAL NETWORKS IN ANALOG COMPUTING-IN-MEMORY SYSTEMS BY A GENERALIZED QUANTIZATION METHOD (2020)
Crossbar-enabled analog computing-in-memory (CACIM) systems can significantly improve the computation speed and energy efficiency of deep neural networks (DNNs). However, the transition of DNNs from …
Once Quantized for All: Progressively Searching for Quantized Efficient Models
Once Quantized for All (OQA) is presented, a novel framework that searches for quantized efficient models and deploys their quantized weights at the same time, without additional post-processing and without degrading the quality of the search-retrain schema.
BatchQuant: Quantized-for-all Architecture Search with Robust Quantizer
The approach is the first to seamlessly extend a one-shot weight-sharing NAS supernet to support subnets with arbitrary ultra-low-bitwidth mixed-precision quantization policies without retraining, opening up new possibilities in joint hardware-aware neural architecture search and quantization.
Generalizable Mixed-Precision Quantization via Attribution Rank Preservation
This paper observes that correctly locating network attribution is a general ability for accurate visual analysis across different data distributions, and proposes a generalizable mixed-precision quantization method for efficient inference that obtains a competitive accuracy-complexity trade-off compared with state-of-the-art mixed-precision networks at significantly reduced search cost.
BinaryBERT: Pushing the Limit of BERT Quantization
BinaryBERT is proposed, which pushes BERT quantization to the limit with weight binarization and initializes the binary model by equivalent splitting from a half-sized ternary network, achieving state-of-the-art results on the GLUE and SQuAD benchmarks.

References

Showing 1-10 of 25 references
Bridging the Accuracy Gap for 2-bit Quantized Neural Networks (QNN)
This paper proposes novel techniques that target weight and activation quantization separately, resulting in an overall quantized neural network (QNN) that achieves state-of-the-art classification accuracy across a range of popular models and datasets.
Deep Learning with Low Precision by Half-Wave Gaussian Quantization
A half-wave Gaussian quantizer (HWGQ) is proposed for forward approximation and shown to have an efficient implementation, by exploiting the statistics of network activations and batch-normalization operations, and to achieve performance much closer to full-precision networks than previously available low-precision networks.
Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks
Differentiable Soft Quantization (DSQ) is proposed to bridge the gap between full-precision and low-bit networks; it helps recover accurate gradients in backward propagation and reduces the quantization loss in the forward process with an appropriate clipping range.
PACT: Parameterized Clipping Activation for Quantized Neural Networks
It is shown, for the first time, that both weights and activations can be quantized to 4 bits of precision while still achieving accuracy comparable to full-precision networks across a range of popular models and datasets.
LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks
This work proposes to jointly train a quantized, bit-operation-compatible DNN and its associated quantizers, as opposed to using fixed, handcrafted quantization schemes such as uniform or logarithmic quantization, to address the gap in prediction accuracy between the quantized model and the full-precision model.
Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights
Extensive experiments on the ImageNet classification task using almost all well-known deep CNN architectures, including AlexNet, VGG-16, GoogleNet and ResNets, testify to the efficacy of the proposed INQ, showing that at 5-bit quantization the models achieve higher accuracy than their 32-bit floating-point references.
Learned Step Size Quantization
This work introduces a novel means to estimate and scale the task-loss gradient at each weight and activation layer's quantizer step size, such that the step size can be learned in conjunction with other network parameters.
Learning to Quantize Deep Networks by Optimizing Quantization Intervals With Task Loss
This work proposes a trainable quantizer that can be trained on a heterogeneous dataset, and thus can be used to quantize pretrained networks without access to their training data; it outperforms existing methods, achieving state-of-the-art accuracy.
Weight Normalization based Quantization for Deep Neural Network Compression
Weight-normalization-based quantization (WNQ), a novel quantization method for model compression, is proposed; it adopts weight normalization to avoid the long-tailed distribution of network weights and thereby reduces the quantization error.
Convolutional Neural Networks using Logarithmic Data Representation
This paper proposes a new data representation that enables state-of-the-art networks to be encoded to 3 bits with negligible loss in classification performance, and proposes an end-to-end training procedure that uses a log representation at 5 bits, achieving higher final test accuracy than a linear representation at 5 bits.
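For contrast with APoT's additive levels, a plain log-domain (single power-of-two) quantizer can be sketched as below; this is a generic illustration, and the exact encoding in the paper above (full-scale range, rounding in the log domain, handling of zero) may differ.

```python
import numpy as np

def log2_quantize(w, bits=3, fsr=0):
    """Quantize values to a signed power-of-two (log-domain) representation.

    bits : total bits per value (1 sign bit + (bits - 1) exponent bits)
    fsr  : full-scale-range exponent; the largest magnitude is 2 ** fsr

    Generic sketch; zero handling is simplified (real encodings often
    reserve a dedicated code for zero).
    """
    w = np.asarray(w, dtype=float)
    sign = np.sign(w)
    # round the log2 magnitude to the nearest integer exponent
    exp = np.round(np.log2(np.abs(w) + 1e-12))
    # clip exponents to the representable range of 2 ** (bits - 1) codes
    n_codes = 2 ** (bits - 1)
    exp = np.clip(exp, fsr - n_codes + 1, fsr)
    return sign * 2.0 ** exp

# example: 3-bit log quantization of a few weights
print(log2_quantize([0.9, -0.3, 0.04, 0.0]))   # -> [ 1.    -0.25   0.125  0.   ]
```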