• Corpus ID: 235352529

Differentiable Dynamic Quantization with Mixed Precision and Adaptive Resolution

Zhaoyang Zhang, Wenqi Shao, Jinwei Gu, Xiaogang Wang, Luo Ping
Model quantization is challenging due to many tedious hyper-parameters such as precision (bitwidth), dynamic range (minimum and maximum discrete values) and stepsize (interval between discrete values). Unlike prior art that carefully tunes these values by hand, we present a fully differentiable approach to learn all of them, named Differentiable Dynamic Quantization (DDQ), which has several benefits. (1) DDQ is able to quantize challenging lightweight architectures like MobileNets, where different… 
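
To make the three hyper-parameters concrete, here is a minimal sketch of a plain uniform quantizer that takes them as explicit arguments. It is not DDQ itself (DDQ learns these quantities); the function name `uniform_quantize`, the derivation of the step size from bitwidth and range, and round-to-nearest with clipping are all assumptions made for illustration.

```python
import numpy as np

def uniform_quantize(x, bits, qmin, qmax):
    """Quantize x onto 2**bits evenly spaced levels covering [qmin, qmax].

    bits       -> precision (bitwidth)
    qmin, qmax -> dynamic range (smallest and largest discrete value)
    step       -> stepsize, derived here from the other two (an assumption)
    """
    levels = 2 ** bits
    step = (qmax - qmin) / (levels - 1)
    x = np.clip(np.asarray(x, dtype=np.float64), qmin, qmax)
    # Index of the nearest level, then map back to the real line.
    idx = np.round((x - qmin) / step)
    return qmin + idx * step, step

q, step = uniform_quantize([-1.2, -0.4, 0.05, 0.9], bits=2, qmin=-1.0, qmax=1.0)
```

With 2 bits on [-1, 1] the levels are -1, -1/3, 1/3 and 1, so the four inputs map to those four values in order. Because the three quantities are coupled, changing any one of them (for example widening the range at a fixed bitwidth) changes the step size, which is why tuning them by hand is tedious.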


SDQ: Stochastic Differentiable Quantization with Mixed Precision

A novel Stochastic Differentiable Quantization (SDQ) method that can automatically learn the mixed-precision quantization (MPQ) strategy in a more accessible and globally optimized space with smoother gradient approximation, and that outperforms all state-of-the-art mixed- or single-precision quantization methods at a lower bitwidth.

AdaBin: Improving Binary Neural Networks with Adaptive Binary Sets

This paper presents a simple yet effective approach called AdaBin to adaptively obtain the optimal binary sets of weights and activations for each layer instead of a fixed set, which can better fit different distributions and increase the representation ability of binarized features.
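
The idea of a per-layer binary set, rather than the fixed set {-1, +1}, can be sketched as a two-valued quantizer with a center `beta` and half-width `alpha`. This is a hedged illustration, not AdaBin's implementation: the helper `adaptive_binarize`, the tie-breaking rule, and estimating `beta` and `alpha` from the mean and standard deviation of the weights are assumptions.

```python
import numpy as np

def adaptive_binarize(x, alpha, beta):
    """Map each element of x to one of two values {beta - alpha, beta + alpha}."""
    x = np.asarray(x, dtype=np.float64)
    # Elements at or above the center go to the upper value (assumption: ties go up).
    return np.where(x >= beta, beta + alpha, beta - alpha)

w = np.array([0.1, 0.5, 0.9, 1.3])
beta = w.mean()    # center of the binary set (assumption: mean of the layer's values)
alpha = w.std()    # half-distance between the two values (assumption: std)
wb = adaptive_binarize(w, alpha, beta)
```

With a fixed {-1, +1} set, `np.sign` would send all four of these positive weights to +1 and erase their differences; the shifted set above still separates the two smaller weights from the two larger ones, which is the kind of distribution fit the summary describes.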

FBM: Fast-Bit Allocation for Mixed-Precision Quantization

A comprehensive evaluation of the proposed Fast-Bit Allocation for Mixed-Precision Quantization (FBM) demonstrates the method’s superiority over current state-of-the-art schemes in terms of the trade-off between neural network accuracy and hardware efficiency.

moTuner: a compiler-based auto-tuning approach for mixed-precision operators

Arithmetic operators are now used in a wide spectrum of domains, including artificial intelligence, data analytics and scientific computing. Meanwhile, specialized hardware components to enable

Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks

Differentiable Soft Quantization (DSQ) is proposed to bridge the gap between full-precision and low-bit networks: it helps obtain more accurate gradients in backward propagation and reduces the quantization loss in the forward pass by using an appropriate clipping range.
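
A soft quantizer replaces each hard rounding step with a smooth curve whose sharpness is controlled by a parameter, so gradients are informative yet the forward pass stays close to the hard quantizer. The sketch below uses a tanh-shaped step inside each quantization interval in the spirit of DSQ; the function name `soft_quantize`, the parameter `k`, and the exact normalization are assumptions and may differ from the paper's formulation.

```python
import numpy as np

def soft_quantize(x, lower, upper, bits, k):
    """Tanh-based soft staircase on the clipping range [lower, upper].

    Larger k makes each step sharper; as k grows the output approaches
    ordinary round-to-nearest uniform quantization.
    """
    n_intervals = 2 ** bits - 1
    delta = (upper - lower) / n_intervals
    x = np.clip(np.asarray(x, dtype=np.float64), lower, upper)
    # Interval index i for each x, kept inside [0, n_intervals - 1].
    i = np.minimum(np.floor((x - lower) / delta), n_intervals - 1)
    m = lower + (i + 0.5) * delta             # interval midpoints
    s = 1.0 / np.tanh(0.5 * k * delta)        # makes phi reach -1/+1 at interval edges
    phi = s * np.tanh(k * (x - m))            # smooth step in [-1, 1]
    return lower + delta * (i + (phi + 1.0) / 2.0)
```

Because `phi` equals -1 and +1 exactly at the interval edges, the curve is continuous across intervals and monotone, so it can be differentiated everywhere; a small `k` gives a nearly linear (easy to train) map, and a large `k` recovers the hard staircase.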

Mixed Precision DNNs: All you need is a good parametrization

This work proposes to parametrize the quantizer with the step size and dynamic range, so that the bitwidth can be inferred from them, and thereby obtains mixed-precision DNNs with learned quantization parameters that achieve state-of-the-art performance.
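
Why the bitwidth can be inferred from the step size and range comes down to counting levels: a grid with step d that must reach a maximum value q_max needs a fixed number of levels, and that count fixes the bits. The sketch below assumes a symmetric grid of integer multiples of the step; the helper `inferred_bitwidth` is hypothetical, and the paper's own formula may count levels slightly differently.

```python
import math

def inferred_bitwidth(step, q_max, signed=True):
    """Smallest bitwidth whose grid of multiples of `step` reaches q_max.

    Assumption: levels are k * step for integers k with |k * step| <= q_max,
    i.e. 2 * n + 1 levels when signed and n + 1 when unsigned.
    """
    n_pos = math.floor(q_max / step + 1e-9)   # positive levels (small tolerance for float error)
    levels = 2 * n_pos + 1 if signed else n_pos + 1
    return math.ceil(math.log2(levels))
```

For example, a step of 0.25 with a maximum of 1.0 gives 9 signed levels and therefore 4 bits, while halving the step to 0.125 gives 17 levels and 5 bits. Learning step and range is thus enough to learn a per-layer bitwidth.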

Learning to Quantize Deep Networks by Optimizing Quantization Intervals With Task Loss

This work proposes a trainable quantizer that can be trained on a heterogeneous dataset, and thus can be used to quantize pretrained networks without access to their training data, and outperforms existing methods to achieve the state-of-the-art accuracy.

Learned Step Size Quantization

This work introduces a novel means to estimate and scale the task loss gradient at each weight and activation layer's quantizer step size, such that it can be learned in conjunction with other network parameters.
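
The step-size gradient can be written out explicitly. The sketch below follows the LSQ formulation as we understand it: the forward pass is round(clip(v / s, -Q_N, Q_P)) * s, the straight-through estimate of d v_hat / d s is -v/s + round(v/s) inside the clipping range and -Q_N or Q_P outside it, and the result is scaled by 1 / sqrt(N * Q_P). The helper names are hypothetical, and `np.round` rounds halves to even, which may differ from other implementations.

```python
import numpy as np

def lsq_quantize(v, s, q_n, q_p):
    """Forward pass: v_hat = round(clip(v / s, -q_n, q_p)) * s."""
    v = np.asarray(v, dtype=np.float64)
    return np.round(np.clip(v / s, -q_n, q_p)) * s

def lsq_step_grad(v, s, q_n, q_p, upstream):
    """Loss gradient w.r.t. step size s, given upstream = dL/dv_hat."""
    v = np.asarray(v, dtype=np.float64)
    r = v / s
    # Straight-through estimate of d v_hat / d s for each element.
    dvhat_ds = np.where(r <= -q_n, -q_n,
                        np.where(r >= q_p, q_p, -r + np.round(r)))
    g = 1.0 / np.sqrt(v.size * q_p)           # gradient scale
    return g * np.sum(upstream * dvhat_ds)

v = np.array([-2.0, -0.3, 0.26, 5.0])
grad_s = lsq_step_grad(v, 0.5, 4, 3, np.ones(4))   # 3-bit signed: Q_N = 4, Q_P = 3
```

Note that the in-range term -v/s + round(v/s) is sensitive to where a value sits relative to the rounding transition, which is what lets the step size respond to the actual distribution of weights or activations.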

LSQ+: Improving low-bit quantization through learnable offsets and better initialization

LSQ+ is the first work to quantize architectures with Swish activations, such as EfficientNet and MixNet, to extremely low bit-widths, and shows state-of-the-art results for them; it also significantly outperforms LSQ for low-bit quantization of neural nets with Swish activations.
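
A learnable offset matters because Swish outputs have a small negative tail, which an unsigned grid starting at zero cannot represent. Here is a minimal sketch of an asymmetric quantizer with scale `s` and offset `beta`; in LSQ+ both are trained, whereas here they are fixed constants, and the helper `affine_quantize` is hypothetical.

```python
import numpy as np

def affine_quantize(x, s, beta, bits):
    """Asymmetric quantizer: integers in [0, 2**bits - 1], shifted by offset beta."""
    x = np.asarray(x, dtype=np.float64)
    q = np.round(np.clip((x - beta) / s, 0, 2 ** bits - 1))
    return q * s + beta

x = np.array([-0.25, 0.0, 0.5, 2.0])
with_offset = affine_quantize(x, 0.25, -0.25, 4)    # grid shifted to start at -0.25
without_offset = affine_quantize(x, 0.25, 0.0, 4)   # grid starts at 0
```

With the offset the negative value -0.25 is represented exactly, while without it that value is clamped to 0.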

HMQ: Hardware Friendly Mixed Precision Quantization Block for CNNs

The Hardware Friendly Mixed Precision Quantization Block (HMQ) is a mixed precision quantization block that repurposes the Gumbel-Softmax estimator into a smooth estimator of a pair of quantization parameters, namely, bit-width and threshold.
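
The general mechanism of a Gumbel-Softmax estimator over quantization parameters can be sketched as a differentiable blend of candidate quantizers, one per (bit-width, threshold) pair, weighted by a relaxed one-hot sample. This is a generic Gumbel-Softmax relaxation, not HMQ's exact estimator; the candidate list, logits, temperature `tau`, and all helper names are assumptions.

```python
import numpy as np

def quantize(x, bits, threshold):
    """Symmetric uniform quantizer clipped at +/- threshold."""
    step = threshold / (2 ** (bits - 1) - 1)
    return np.round(np.clip(x, -threshold, threshold) / step) * step

def gumbel_softmax(logits, tau, rng):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / tau)."""
    u = rng.uniform(low=1e-12, high=1.0, size=logits.shape)
    y = (logits - np.log(-np.log(u))) / tau
    y = np.exp(y - y.max())                   # numerically stable softmax
    return y / y.sum()

def relaxed_mixed_quantize(x, candidates, logits, tau, rng):
    """Blend candidate (bits, threshold) quantizers with Gumbel-Softmax weights."""
    w = gumbel_softmax(logits, tau, rng)
    outs = np.stack([quantize(x, b, t) for b, t in candidates])
    return np.tensordot(w, outs, axes=1), w
```

During search the logits would be trained by gradient descent through the weights `w`; as the temperature decreases, the blend concentrates on a single candidate, which then gives the selected bit-width and threshold.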

HAQ: Hardware-Aware Automated Quantization With Mixed Precision

The Hardware-Aware Automated Quantization (HAQ) framework is introduced, which leverages reinforcement learning (RL) to automatically determine the quantization policy and takes the hardware accelerator's feedback into the design loop to generate direct feedback signals for the RL agent.

Data-Free Quantization Through Weight Equalization and Bias Correction

We introduce a data-free quantization method for deep neural networks that does not require fine-tuning or hyperparameter selection. It achieves near-original model performance on common computer
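
The weight equalization named in the title can be illustrated for two layers joined by a ReLU: scale output channel i of the first layer down by s_i and input channel i of the second layer up by s_i, choosing s_i so both channel ranges become sqrt(r1_i * r2_i). This sketch assumes max-absolute-value ranges and nonzero channels, and the helper `equalize` is hypothetical; it omits bias correction.

```python
import numpy as np

def equalize(w1, b1, w2):
    """Cross-layer range equalization for y = w2 @ relu(w1 @ x + b1).

    w1: (C, D_in), b1: (C,), w2: (D_out, C). Valid because
    relu(z / s) == relu(z) / s for every s > 0.
    """
    r1 = np.abs(w1).max(axis=1)     # range of each output channel of layer 1
    r2 = np.abs(w2).max(axis=0)     # range of each input channel of layer 2
    s = np.sqrt(r1 * r2) / r2       # afterwards both ranges equal sqrt(r1 * r2)
    return w1 / s[:, None], b1 / s, w2 * s[None, :]
```

The network output is unchanged, but channels whose weights were unusually large or small no longer dominate a shared per-tensor quantization range, which is why no data or fine-tuning is needed for this step.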

PACT: Parameterized Clipping Activation for Quantized Neural Networks

It is shown, for the first time, that both weights and activations can be quantized to 4 bits of precision while still achieving accuracy comparable to full-precision networks across a range of popular models and datasets.
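
The parameterized clipping activation itself is short: clip activations to [0, alpha], where alpha is learned, then quantize the clipped range uniformly to k bits. The sketch below uses the form 0.5 * (|x| - |x - alpha| + alpha), which equals clip(x, 0, alpha), and a straight-through gradient for alpha; the helper name `pact` and the return of the alpha gradient alongside the output are assumptions for illustration.

```python
import numpy as np

def pact(x, alpha, bits):
    """Clip to [0, alpha], then k-bit uniform quantization of the clipped range."""
    x = np.asarray(x, dtype=np.float64)
    y = 0.5 * (np.abs(x) - np.abs(x - alpha) + alpha)   # equals clip(x, 0, alpha)
    scale = (2 ** bits - 1) / alpha
    y_q = np.round(y * scale) / scale
    # Straight-through gradient of y_q w.r.t. alpha: 1 where x >= alpha, else 0.
    d_alpha = (x >= alpha).astype(np.float64)
    return y_q, d_alpha

y_q, d_alpha = pact([-1.0, 0.3, 0.9, 4.0], alpha=2.0, bits=2)
```

Only activations that are clipped contribute to the gradient of alpha, so training can shrink alpha (finer steps) or grow it (less clipping) depending on which error hurts the loss more.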

Towards Efficient Training for Neural Network Quantization

Through SAT, quantized models obtain comparable or even better performance than their full-precision counterparts, achieving state-of-the-art accuracy with consistent improvement over previous quantization methods on a wide spectrum of models.