HAWQ: Hessian AWare Quantization of Neural Networks With Mixed-Precision

@article{Dong2019HAWQHA,
  title={HAWQ: Hessian AWare Quantization of Neural Networks With Mixed-Precision},
  author={Zhen Dong and Zhewei Yao and Amir Gholami and Michael W. Mahoney and Kurt Keutzer},
  journal={2019 IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2019},
  pages={293-302}
}
Model size and inference speed/power have become a major challenge in the deployment of Neural Networks for many applications. [...]
Key Method: HAWQ allows for the automatic selection of the relative quantization precision of each layer, based on the layer's Hessian spectrum. Moreover, HAWQ provides a deterministic fine-tuning order for quantizing layers, based on second-order information. We show the results of our method on Cifar-10 using ResNet20, and on ImageNet using Inception-V3, ResNet50 and SqueezeNext…
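As a rough, hypothetical illustration of the second-order sensitivity signal described above (not the authors' released code), the following PyTorch sketch estimates the top Hessian eigenvalue of a layer's parameters by power iteration over Hessian-vector products; `loss` (a scalar computed on a mini-batch) and `params` (the parameter tensors of the layer under consideration) are assumed to be supplied by the caller.

```python
import torch

def top_hessian_eigenvalue(loss, params, iters=20):
    """Estimate the largest Hessian eigenvalue of `loss` w.r.t. `params`
    by power iteration using Hessian-vector products."""
    # Random unit starting vector with the same shapes as the parameters.
    v = [torch.randn_like(p) for p in params]
    norm = torch.sqrt(sum((x * x).sum() for x in v))
    v = [x / norm for x in v]

    grads = torch.autograd.grad(loss, params, create_graph=True)
    eigenvalue = None
    for _ in range(iters):
        # Hessian-vector product: differentiate (grad . v) w.r.t. the parameters.
        gv = sum((g * x).sum() for g, x in zip(grads, v))
        hv = torch.autograd.grad(gv, params, retain_graph=True)
        # Rayleigh quotient v^T H v with the current unit vector v.
        eigenvalue = sum((h * x).sum() for h, x in zip(hv, v)).item()
        norm = torch.sqrt(sum((h * h).sum() for h in hv))
        v = [h / norm for h in hv]
    return eigenvalue
```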
HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks
TLDR
A theoretical analysis shows that a better sensitivity metric is the average of all Hessian eigenvalues (i.e., the Hessian trace divided by the parameter dimension), and a Pareto-frontier-based method is developed for selecting the exact bit precision of different layers without any manual selection.
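The trace-based metric can be estimated without ever forming the Hessian. Below is a minimal sketch, not the HAWQ-V2 code, of Hutchinson's estimator E[z^T H z] = tr(H) for Rademacher probes z, following the same `loss`/`params` convention as the sketch above; dividing the result by the number of parameters gives the average eigenvalue.

```python
import torch

def hessian_trace(loss, params, num_samples=50):
    """Hutchinson estimate of tr(H) for the Hessian of `loss` w.r.t. `params`."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    estimates = []
    for _ in range(num_samples):
        # Rademacher probe vector with entries in {-1, +1}.
        z = [torch.randint_like(p, high=2) * 2.0 - 1.0 for p in params]
        gz = sum((g * zi).sum() for g, zi in zip(grads, z))
        hz = torch.autograd.grad(gz, params, retain_graph=True)
        # E[z^T H z] equals the trace of H.
        estimates.append(sum((h * zi).sum() for h, zi in zip(hz, z)).item())
    return sum(estimates) / len(estimates)
```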
Towards Mixed-Precision Quantization of Neural Networks via Constrained Optimization
  • Weihan Chen, Peisong Wang, Jian Cheng
  • Computer Science
  • 2021
TLDR
This paper formulates mixed-precision quantization as a discrete constrained optimization problem, approximates the objective with a second-order Taylor expansion and proposes an efficient approach to compute its Hessian, then shows that the problem can be reformulated as a Multiple-Choice Knapsack Problem (MCKP) and solved efficiently with a greedy search algorithm.
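A hypothetical sketch of the kind of greedy search such an MCKP formulation admits (the actual algorithm and cost model in the paper may differ): each layer picks one bit-width from a candidate set, and precision is lowered wherever the sensitivity increase per saved bit is smallest until a model-size budget is met. `sensitivity[layer][bits]` and `num_params[layer]` are assumed inputs, e.g. second-order perturbation estimates.

```python
def allocate_bits(sensitivity, num_params, candidate_bits, size_budget_bits):
    """Greedily assign one bit-width per layer under a total model-size budget."""
    # Start every layer at the highest candidate precision.
    bits = {layer: max(candidate_bits) for layer in sensitivity}

    def model_size(assignment):
        return sum(num_params[l] * assignment[l] for l in assignment)

    while model_size(bits) > size_budget_bits:
        best = None
        for layer, b in bits.items():
            lower = [c for c in candidate_bits if c < b]
            if not lower:
                continue                     # layer already at minimum precision
            nb = max(lower)                  # next lower candidate for this layer
            cost = sensitivity[layer][nb] - sensitivity[layer][b]
            saved = num_params[layer] * (b - nb)
            score = cost / saved             # sensitivity increase per bit saved
            if best is None or score < best[0]:
                best = (score, layer, nb)
        if best is None:                     # cannot shrink any further
            break
        _, layer, nb = best
        bits[layer] = nb
    return bits
```

With, for example, candidate_bits = [2, 4, 8], this returns a per-layer bit-width dictionary once the budget is satisfied or no further reduction is possible.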
BSQ: Exploring Bit-Level Sparsity for Mixed-Precision Neural Network Quantization
Mixed-precision quantization can potentially achieve the optimal tradeoff between performance and compression rate of deep neural networks, and has thus been widely investigated. However, it lacks …
Layer Importance Estimation with Imprinting for Neural Network Quantization
TLDR
This work proposes an accuracy-aware criterion to rank layer importance and applies per-layer imprinting as an efficient proxy for accuracy estimation, lending better interpretability to the selected bit-width configuration.
A White Paper on Neural Network Quantization
TLDR
This paper introduces state-of-the-art algorithms for mitigating the impact of quantization noise on the network's performance while maintaining low-bit weights and activations, and considers two main classes of algorithms: Post-Training Quantization and Quantization-Aware Training.
ZeroQ: A Novel Zero Shot Quantization Framework
TLDR
ZeroQ enables mixed-precision quantization without any access to the training or validation data, and it can finish the entire quantization process in less than 30 seconds, incurring very low computational overhead.
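A simplified sketch of the "distilled data" idea underlying this approach, assuming a PyTorch model with BatchNorm2d layers (this is not the released ZeroQ implementation): optimize a random input batch so that its intermediate statistics match the running statistics stored in the network's BatchNorm layers, which can then serve as calibration data for choosing quantization ranges without real data.

```python
import torch
import torch.nn as nn

def distill_batch(model, shape=(32, 3, 224, 224), steps=500, lr=0.1):
    """Optimize a random batch so BN input statistics match the stored running stats."""
    model.eval()
    x = torch.randn(shape, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)

    bn_layers = [m for m in model.modules() if isinstance(m, nn.BatchNorm2d)]
    acts = {}

    def make_hook(m):
        def hook(module, inputs, output):
            acts[m] = inputs[0]          # activation entering this BN layer
        return hook

    hooks = [m.register_forward_hook(make_hook(m)) for m in bn_layers]
    for _ in range(steps):
        opt.zero_grad()
        model(x)
        loss = x.new_zeros(())
        for m in bn_layers:
            a = acts[m]
            mean = a.mean(dim=(0, 2, 3))
            var = a.var(dim=(0, 2, 3), unbiased=False)
            # Match the batch statistics to the stored running statistics.
            loss = loss + ((mean - m.running_mean) ** 2).mean() \
                        + ((var - m.running_var) ** 2).mean()
        loss.backward()
        opt.step()
    for h in hooks:
        h.remove()
    return x.detach()
```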
Channel-wise Hessian Aware trace-Weighted Quantization of Neural Networks
TLDR
Channel-wise Hessian Aware trace-Weighted Quantization (CW-HAWQ) uses the Hessian trace to determine the relative sensitivity order of different channels of activations and weights, and proposes a deep reinforcement learning (DRL) agent based on Deep Deterministic Policy Gradient (DDPG) to find the optimal ratios of different quantization bit-widths and assign bits to channels according to the Hessian-trace order.
Cluster-Promoting Quantization with Bit-Drop for Minimizing Network Quantization Loss
  • Jung Hyun Lee, Jihun Yun, Sung Ju Hwang, Eunho Yang
  • Computer Science
  • ArXiv
  • 2021
TLDR
This work proposes a novel quantization method for neural networks, Cluster-Promoting Quantization (CPQ), and introduces a new bit-drop technique, DropBits, that revises the standard dropout regularization to randomly drop bits instead of neurons.
Uniform-Precision Neural Network Quantization …
  • 2020
Uniform-precision neural network quantization has gained popularity thanks to its simple arithmetic unit densely packed for high computing capability. However, it ignores heterogeneous sensitivity to …
Exploring Neural Networks Quantization via Layer-Wise Quantization Analysis
TLDR
A simple analytic framework that breaks down overall degradation into its per-layer contributions and allows for a more nuanced examination of how quantization affects the network, enabling the design of better-performing schemes.

References

Showing 1-10 of 52 references
PACT: Parameterized Clipping Activation for Quantized Neural Networks
TLDR
It is shown, for the first time, that both weights and activations can be quantized to 4 bits of precision while still achieving accuracy comparable to full-precision networks across a range of popular models and datasets.
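A minimal sketch of the clipping idea, assuming PyTorch (not the paper's implementation): activations are clipped to a learnable upper bound alpha and uniformly quantized to a given bit-width, with a straight-through estimator so the rounding does not block gradients.

```python
import torch
import torch.nn as nn

class PACTActivation(nn.Module):
    """Clip activations to a learnable bound alpha, then quantize to `bits` bits."""
    def __init__(self, bits=4, alpha_init=6.0):
        super().__init__()
        self.bits = bits
        self.alpha = nn.Parameter(torch.tensor(alpha_init))

    def forward(self, x):
        # Clip to [0, alpha]; gradient w.r.t. alpha flows through the clipped region.
        y = torch.clamp(x, min=0.0)
        y = torch.where(y < self.alpha, y, self.alpha)
        # Uniform quantization of [0, alpha]; straight-through estimator for round().
        scale = (2 ** self.bits - 1) / self.alpha
        y_q = torch.round(y * scale) / scale
        return y + (y_q - y).detach()
```

The weight path would be quantized separately in a full pipeline; this only illustrates the learnable activation clipping.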
Value-aware Quantization for Training and Inference of Neural Networks
We propose a novel value-aware quantization which applies aggressively reduced precision to the majority of data while separately handling a small amount of large data in high precision, which …
Mixed Precision Quantization of ConvNets via Differentiable Neural Architecture Search
TLDR
A novel differentiable neural architecture search (DNAS) framework is proposed to efficiently explore the exponential search space of per-layer precision assignments with gradient-based optimization, surpassing state-of-the-art compression of ResNet on CIFAR-10 and ImageNet.
LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks
TLDR
This work proposes to jointly train a quantized, bit-operation-compatible DNN and its associated quantizers, as opposed to using fixed, handcrafted quantization schemes such as uniform or logarithmic quantization, to address the gap in prediction accuracy between the quantized model and the full-precision model.
Quantizing deep convolutional networks for efficient inference: A whitepaper
TLDR
An overview of techniques for quantizing convolutional neural networks for inference with integer weights and activations is presented, and it is recommended that per-channel quantization of weights and per-layer quantization of activations be the preferred quantization scheme for hardware acceleration and kernel optimization.
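A minimal sketch contrasting the two recommended granularities, assuming PyTorch and symmetric signed quantization (illustrative only, not the whitepaper's reference code): per-tensor quantization uses a single scale for the whole weight tensor, while per-channel quantization uses one scale per output channel.

```python
import torch

def quantize_per_tensor(w, bits=8):
    """Symmetric fake-quantization with one scale for the whole tensor."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    return torch.clamp(torch.round(w / scale), -qmax, qmax) * scale

def quantize_per_channel(w, bits=8):
    """Symmetric fake-quantization with one scale per output channel (dim 0)."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().amax(dim=tuple(range(1, w.dim())), keepdim=True) / qmax
    return torch.clamp(torch.round(w / scale), -qmax, qmax) * scale
```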
Adaptive Quantization for Deep Neural Network
TLDR
This is the first work to theoretically analyze the relationship between the parameter quantization errors of individual layers and model accuracy; it achieves a 20-40% higher compression rate compared to equal bit-width quantization at the same model prediction accuracy.
HAQ: Hardware-Aware Automated Quantization
TLDR
This paper introduces the Hardware-Aware Automated Quantization (HAQ) framework, which leverages reinforcement learning to automatically determine the quantization policy and takes the hardware accelerator's feedback into the design loop to reduce latency and energy consumption.
Learning Accurate Low-Bit Deep Neural Networks with Stochastic Quantization
TLDR
The stochastic quantization (SQ) algorithm for learning accurate low-bit DNNs quantizes a portion of elements/filters to low bit-width with a stochastic probability inversely proportional to the quantization error, while keeping the other portion unchanged in full precision.
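A minimal sketch of the selection rule described above (illustrative, not the paper's code): given any low-bit quantizer `quantize_fn`, a fraction `ratio` of the filters is chosen for quantization with probability inversely proportional to each filter's relative quantization error, and the remaining filters keep their full-precision values for that iteration.

```python
import torch

def stochastic_quantize(weight, quantize_fn, ratio=0.5, eps=1e-8):
    """Quantize `ratio` of the filters (dim 0 of a conv/linear weight), chosen stochastically."""
    w_q = quantize_fn(weight)
    # Relative quantization error per filter.
    err = (weight - w_q).flatten(1).norm(dim=1) / (weight.flatten(1).norm(dim=1) + eps)
    prob = 1.0 / (err + eps)
    prob = prob / prob.sum()
    num = max(1, int(ratio * weight.shape[0]))
    idx = torch.multinomial(prob, num, replacement=False)
    mask = torch.zeros(weight.shape[0], dtype=torch.bool, device=weight.device)
    mask[idx] = True
    out = weight.clone()
    out[mask] = w_q[mask]      # quantized filters
    return out                 # the rest keep full precision
```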
Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights
TLDR
Extensive experiments on the ImageNet classification task using almost all known deep CNN architectures, including AlexNet, VGG-16, GoogLeNet and ResNets, testify to the efficacy of the proposed INQ, showing that at 5-bit quantization the models achieve higher accuracy than their 32-bit floating-point references.
Trained Ternary Quantization
TLDR
This work proposes Trained Ternary Quantization (TTQ), a method that reduces the precision of weights in neural networks to ternary values while even improving the accuracy of some models (32-, 44- and 56-layer ResNets) on CIFAR-10 and AlexNet on ImageNet.
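A minimal sketch of the ternary idea, assuming PyTorch and simplified gradient handling (the paper's exact update rules for the scaling factors differ): latent full-precision weights are thresholded into {-1, 0, +1} using a fixed fraction t of the largest weight magnitude, rescaled by two learned scalars Wp and Wn, and trained with a straight-through estimator.

```python
import torch
import torch.nn as nn

class TernaryWeight(nn.Module):
    """Ternarize a latent full-precision weight with learned scales Wp and Wn."""
    def __init__(self, weight, t=0.05):
        super().__init__()
        self.weight = nn.Parameter(weight.clone())
        self.wp = nn.Parameter(torch.tensor(1.0))   # scale for positive weights
        self.wn = nn.Parameter(torch.tensor(1.0))   # scale for negative weights
        self.t = t

    def forward(self):
        delta = self.t * self.weight.abs().max()
        pos = (self.weight > delta).float()
        neg = (self.weight < -delta).float()
        w_t = self.wp * pos - self.wn * neg
        # Straight-through estimator: ternary values in the forward pass,
        # identity gradient to the latent weights in the backward pass.
        return w_t + self.weight - self.weight.detach()
```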