Generalizable Mixed-Precision Quantization via Attribution Rank Preservation

@article{Wang2021GeneralizableMQ,
  title={Generalizable Mixed-Precision Quantization via Attribution Rank Preservation},
  author={Ziwei Wang and Han Xiao and Jiwen Lu and Jie Zhou},
  journal={2021 IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2021},
  pages={5271-5280}
}
  • Ziwei Wang, Han Xiao, Jiwen Lu, Jie Zhou
  • Published 5 August 2021
  • Computer Science
  • 2021 IEEE/CVF International Conference on Computer Vision (ICCV)
In this paper, we propose a generalizable mixed-precision quantization (GMPQ) method for efficient inference. Conventional methods require the dataset used for bitwidth search to be the same as the one used for model deployment in order to guarantee policy optimality, which leads to heavy search cost on challenging large-scale datasets in realistic applications. In contrast, our GMPQ searches for a mixed-precision quantization policy that generalizes to large-scale datasets from only a small amount of data, so that the search…
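
As a rough illustration of the idea, the sketch below uses a pixel-gradient saliency map pooled to a coarse grid as the attribution and a pairwise ranking hinge as the rank-preservation term; the function names, the pooling grid, and the loss form are illustrative assumptions rather than GMPQ's exact formulation.

import torch
import torch.nn.functional as F

def saliency_attribution(model, images, labels, grid=7):
    """Pixel-gradient saliency pooled to a grid x grid map (a stand-in attribution)."""
    images = images.clone().requires_grad_(True)
    logits = model(images)
    score = logits.gather(1, labels[:, None]).sum()              # sum of target-class scores
    grads, = torch.autograd.grad(score, images, create_graph=True)
    sal = grads.abs().sum(dim=1, keepdim=True)                   # [B, 1, H, W]
    return F.adaptive_avg_pool2d(sal, grid).flatten(1)           # [B, grid*grid]

def rank_preservation_loss(att_q, att_fp, margin=0.0):
    """Hinge penalty on region pairs whose importance ordering flips between the
    quantized model (att_q) and the full-precision reference (att_fp)."""
    diff_q = att_q[:, :, None] - att_q[:, None, :]               # pairwise gaps, quantized
    sign_fp = torch.sign(att_fp[:, :, None] - att_fp[:, None, :]).detach()
    return F.relu(margin - diff_q * sign_fp).mean()

During bitwidth search, such a term would be added to the task objective so that a candidate quantization policy is rewarded for keeping the same ordering of salient regions as its full-precision counterpart.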

Citations

Guided Hybrid Quantization for Object detection in Multimodal Remote Sensing Imagery via One-to-one Self-teaching

This work designs a structure called guided quantization self-distillation (GQSD), which realizes lightweight models through the synergy of quantization and distillation, and proposes a one-to-one self-teaching module that gives the student network the ability of self-judgment.

Sharpness-aware Quantization for Deep Neural Networks

Extensive experiments on both convolutional neural networks and Transformers across various datasets show that SAQ improves the generalization performance of quantized models, yielding state-of-the-art results for uniform quantization.

References

SHOWING 1-10 OF 61 REFERENCES

HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks

A theoretical analysis shows that a better sensitivity metric is the average of all Hessian eigenvalues, and a Pareto-frontier-based method is developed for selecting the exact bit precision of different layers without any manual selection.
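
The average of the Hessian eigenvalues equals the Hessian trace divided by the parameter count, and the trace can be estimated without ever forming the Hessian via Hutchinson's method. The sketch below assumes a scalar loss already computed on a calibration batch and a list of one layer's parameters; it is a generic estimator, not HAWQ-V2's code.

import torch

def hessian_trace(loss, params, n_samples=50):
    """Hutchinson estimator: E[v^T H v] = tr(H) for Rademacher vectors v."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    estimate = 0.0
    for _ in range(n_samples):
        vs = [torch.randint_like(p, 2) * 2.0 - 1.0 for p in params]       # +-1 entries
        grad_dot_v = sum((g * v).sum() for g, v in zip(grads, vs))
        hvs = torch.autograd.grad(grad_dot_v, params, retain_graph=True)  # Hessian-vector products
        estimate += sum((hv * v).sum() for hv, v in zip(hvs, vs)).item()
    return estimate / n_samples

The per-layer sensitivity is then the trace divided by the number of parameters (the average eigenvalue), which is used to rank layers and assign bitwidths along a Pareto frontier.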

HAWQ: Hessian AWare Quantization of Neural Networks With Mixed-Precision

Hessian AWare Quantization (HAWQ), a novel second-order quantization method that allows for the automatic selection of the relative quantization precision of each layer, based on the layer's Hessian spectrum, is introduced.
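
For the Hessian spectrum, the dominant eigenvalue of a layer can likewise be estimated with power iteration on Hessian-vector products; the generic sketch below assumes a scalar loss and that layer's parameter list, and is not HAWQ's implementation.

import torch

def top_hessian_eigenvalue(loss, params, iters=20):
    """Power iteration with Hessian-vector products (the Hessian is never formed)."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    vs = [torch.randn_like(p) for p in params]
    eigenvalue = 0.0
    for _ in range(iters):
        norm = torch.sqrt(sum((v * v).sum() for v in vs))
        vs = [v / norm for v in vs]
        grad_dot_v = sum((g * v).sum() for g, v in zip(grads, vs))
        hvs = torch.autograd.grad(grad_dot_v, params, retain_graph=True)
        eigenvalue = sum((hv * v).sum() for hv, v in zip(hvs, vs)).item()   # Rayleigh quotient
        vs = [hv.detach() for hv in hvs]
    return eigenvalue

Layers with a large top eigenvalue are more sensitive to perturbation and are assigned higher precision.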

Rethinking Differentiable Search for Mixed-Precision Neural Networks

A new differentiable search architecture is proposed, with several novel contributions that improve search efficiency by leveraging the unique properties of the mixed-precision search (MPS) problem; the resulting Efficient differentiable MIxed-Precision network Search (EdMIPS) method is effective at finding the optimal bit allocation for multiple popular networks.
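
A minimal sketch of the differentiable bit-allocation idea: each layer keeps architecture logits over candidate bitwidths and forwards a softmax-weighted mixture of quantized weights. The uniform quantizer, the straight-through estimator, and the candidate set below are assumptions, not EdMIPS's exact parametrization.

import torch
import torch.nn as nn
import torch.nn.functional as F

def uniform_quantize(w, bits):
    """Symmetric uniform quantization with a straight-through estimator."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    wq = torch.clamp(torch.round(w / scale), -qmax, qmax) * scale
    return w + (wq - w).detach()          # forward uses wq, backward passes through w

class MixedPrecisionConv(nn.Module):
    """Convolution whose effective weight is a softmax mixture over candidate bitwidths."""
    def __init__(self, in_ch, out_ch, kernel_size, candidate_bits=(2, 4, 8)):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, padding=kernel_size // 2, bias=False)
        self.candidate_bits = candidate_bits
        self.alpha = nn.Parameter(torch.zeros(len(candidate_bits)))   # architecture logits

    def forward(self, x):
        probs = F.softmax(self.alpha, dim=0)
        weight = sum(p * uniform_quantize(self.conv.weight, b)
                     for p, b in zip(probs, self.candidate_bits))
        return F.conv2d(x, weight, padding=self.conv.padding[0])

After the search converges, each layer keeps the candidate bitwidth with the largest architecture weight.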

Adaptive Loss-Aware Quantization for Multi-Bit Networks

Adaptive Loss-aware Quantization (ALQ), a new multi-bit network (MBN) quantization pipeline that achieves an average bitwidth below one bit without notable loss in inference accuracy, is proposed.
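
Multi-bit networks approximate weights as a weighted sum of binary bases. The greedy residual binarization below is a common construction shown only for orientation; it is not ALQ's adaptive, loss-aware pipeline.

import numpy as np

def multibit_decompose(w, n_bases=2):
    """Greedy residual binarization: w ~ sum_i alpha_i * b_i with b_i in {-1, +1}."""
    w = np.asarray(w, dtype=float)
    residual = w.copy()
    approx = np.zeros_like(w)
    for _ in range(n_bases):
        b = np.where(residual >= 0, 1.0, -1.0)       # binary basis
        alpha = np.abs(residual).mean()              # least-squares scale for a sign basis
        approx += alpha * b
        residual -= alpha * b
    return approx

ALQ instead adapts the number of binary bases per weight group to the loss, which is how the average bitwidth can drop below one bit.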

Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks

Differentiable Soft Quantization (DSQ) is proposed to bridge the gap between full-precision and low-bit networks; it helps obtain accurate gradients in backward propagation and reduces the quantization loss in the forward pass with an appropriate clipping range.
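
A simplified sketch of the soft-quantization function: inside the clipping range, each quantization interval is approximated by a scaled tanh so that the staircase stays differentiable. The parameterization follows the general DSQ recipe but is simplified here (fixed clipping bounds, one global sharpness).

import math
import torch

def dsq_soft_quantize(x, lo, hi, bits=4, alpha=0.2):
    """Differentiable soft staircase over the clipping range [lo, hi].
    alpha in (0, 0.5) controls sharpness: smaller alpha is closer to hard rounding."""
    levels = 2 ** bits - 1
    delta = (hi - lo) / levels                                        # interval width
    x = torch.clamp(x, lo, hi)
    idx = torch.clamp(torch.floor((x - lo) / delta), 0, levels - 1)   # interval index
    mid = lo + (idx + 0.5) * delta                                    # interval midpoint
    k = math.log(2.0 / alpha - 1.0) / delta                           # tanh sharpness
    s = 1.0 / math.tanh(0.5 * k * delta)                              # phi reaches +-1 at interval edges
    phi = s * torch.tanh(k * (x - mid))                               # soft position inside the interval
    return lo + (idx + 0.5 * (phi + 1.0)) * delta                     # soft quantized value

In DSQ the clipping range and sharpness are learned, and as the function sharpens it approaches standard quantization, which keeps backward gradients accurate while shrinking the forward quantization loss.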

APQ: Joint Search for Network Architecture, Pruning and Quantization Policy

APQ is presented, a novel design methodology for efficient deep learning deployment that optimizes the neural network architecture, pruning policy, and quantization policy in a joint manner and uses a predictor-transfer technique to obtain a quantization-aware accuracy predictor.
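
The accuracy predictor can be pictured as a small regression network over a joint architecture-and-quantization-policy encoding; predictor transfer means pretraining it on plentiful full-precision accuracy data and fine-tuning it on a small set of quantization-aware measurements. The encoding size and layer widths below are arbitrary placeholders, not APQ's design.

import torch
import torch.nn as nn

class AccuracyPredictor(nn.Module):
    """Tiny MLP mapping an architecture + quantization-policy encoding to accuracy in [0, 1]."""
    def __init__(self, encoding_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(encoding_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1), nn.Sigmoid(),
        )

    def forward(self, encoding):
        return self.net(encoding).squeeze(-1)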

Search What You Want: Barrier Panelty NAS for Mixed Precision Quantization

A novel soft Barrier Penalty based NAS (BP-NAS) is proposed for mixed-precision quantization; it ensures that all searched models lie inside the valid domain defined by the complexity constraint, and can therefore return an optimal model under the given constraint by conducting the search only once.
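
The effect of a soft barrier can be illustrated with a penalty that stays near zero well inside the complexity budget and grows steeply as the budget is approached or exceeded; the softplus form below is a generic stand-in, not BP-NAS's exact penalty.

import math

def soft_barrier_penalty(bitops, budget, sharpness=10.0):
    """Near zero well under the budget; grows steeply as bitops approaches or exceeds it."""
    z = sharpness * (bitops / budget - 1.0)
    softplus = max(z, 0.0) + math.log1p(math.exp(-abs(z)))   # numerically stable softplus(z)
    return softplus / sharpness

Added to the search objective, such a term steers the differentiable search away from bit assignments that violate the complexity constraint.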

Differentiable Joint Pruning and Quantization for Hardware Efficiency

DJPQ incorporates variational information bottleneck based structured pruning and mixed-bit precision quantization into a single differentiable loss function, and significantly reduces the number of bit-operations for several networks while maintaining the top-1 accuracy of the original floating-point models.

Additive Powers-of-Two Quantization: A Non-uniform Discretization for Neural Networks

Experimental results show that the proposed Additive Powers-of-Two (APoT) quantization method outperforms state-of-the-art methods and is even competitive with full-precision models, demonstrating the effectiveness of APoT quantization.
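
The non-uniform grid can be illustrated by enumerating additive-powers-of-two levels: each level is a sum of one power-of-two (or zero) term per group, so multiplications remain shift-and-add friendly while resolution concentrates near zero. The grouping and normalization below are an illustrative reading of the scheme, not the paper's exact level sets.

import itertools
import numpy as np

def apot_levels(bits=4, k=2):
    """Enumerate additive-powers-of-two levels: `bits` total bits split into bits // k terms,
    each term being zero or a power of two from its own set; normalized so the top level is 1."""
    n = bits // k
    term_sets = []
    for i in range(n):
        term_sets.append([0.0] + [2.0 ** -(i + j * n) for j in range(2 ** k - 1)])
    sums = sorted({sum(combo) for combo in itertools.product(*term_sets)})
    levels = np.array(sums)
    return levels / levels.max()

def apot_quantize(w, levels):
    """Project each weight magnitude onto the nearest APoT level, keeping the sign."""
    w = np.asarray(w, dtype=float)
    idx = np.argmin(np.abs(np.abs(w)[..., None] - levels), axis=-1)
    return np.sign(w) * levels[idx]

For 4-bit weights split into 2-bit terms this yields 16 non-negative levels whose spacing is finer near zero, matching the bell-shaped weight distributions that uniform grids fit poorly.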

HMQ: Hardware Friendly Mixed Precision Quantization Block for CNNs

The Hardware Friendly Mixed Precision Quantization Block (HMQ) is a mixed precision quantization block that repurposes the Gumbel-Softmax estimator into a smooth estimator of a pair of quantization parameters, namely, bit-width and threshold.
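
The searchable quantizer can be sketched as Gumbel-Softmax sampling over candidate bitwidths, which keeps the discrete choice differentiable; HMQ also searches thresholds, which this simplified stand-in omits.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GumbelBitwidthSelector(nn.Module):
    """Differentiable (soft) one-hot choice over candidate bitwidths via Gumbel-Softmax."""
    def __init__(self, candidate_bits=(2, 4, 8)):
        super().__init__()
        self.register_buffer("bits", torch.tensor(candidate_bits, dtype=torch.float))
        self.logits = nn.Parameter(torch.zeros(len(candidate_bits)))

    def forward(self, temperature=1.0):
        one_hot = F.gumbel_softmax(self.logits, tau=temperature, hard=False)
        expected_bits = (one_hot * self.bits).sum()    # differentiable effective bitwidth
        return one_hot, expected_bits

Annealing the temperature pushes the soft choice toward a hard one while gradients keep flowing to the logits.
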
...