Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation
@article{Wu2020IntegerQF,
  title   = {Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation},
  author  = {Hao Wu and P. Judd and Xiaojie Zhang and M. Isaev and P. Micikevicius},
  journal = {ArXiv},
  volume  = {abs/2004.09602},
  year    = {2020}
}
Quantization techniques can reduce the size of Deep Neural Networks and improve inference latency and throughput by taking advantage of high throughput integer instructions. In this paper we review the mathematical aspects of quantization parameters and evaluate their choices on a wide range of neural network models for different application domains, including vision, speech, and language. We focus on quantization techniques that are amenable to acceleration by processors with high-throughput…
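The core operation the abstract describes, mapping floating-point tensors onto a low-bit integer range via a scale factor, can be sketched as follows. This is a minimal per-tensor symmetric int8 example for illustration only; the function names and the max-calibration choice are assumptions, not details taken from the paper.

```python
import numpy as np

def quantize_symmetric(x, num_bits=8):
    """Symmetric per-tensor quantization: map max |x| onto the integer range."""
    qmax = 2 ** (num_bits - 1) - 1            # 127 for int8
    scale = np.max(np.abs(x)) / qmax          # one scale for the whole tensor
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximation of the original floats."""
    return q.astype(np.float32) * scale

# Round trip: per-element error is bounded by half the scale step.
x = np.array([0.5, -1.0, 0.25, 0.9], dtype=np.float32)
q, s = quantize_symmetric(x)
x_hat = dequantize(q, s)
```

High-throughput integer instructions then operate on `q` directly, with `scale` folded back in only where results must return to floating point.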
Citations (13)
- GPQ: Greedy Partial Quantization of Convolutional Neural Networks Inspired by Submodular Optimization. 2020 7th International Conference on Soft Computing & Machine Intelligence (ISCMI), 2020.
- VS-Quant: Per-vector Scaled Quantization for Accurate Low-Precision Neural Network Inference. ArXiv, 2021.
- Exploring Neural Networks Quantization via Layer-Wise Quantization Analysis. ArXiv, 2020.
- Empirical Evaluation of Deep Learning Model Compression Techniques on the WaveNet Vocoder. ArXiv, 2020.
- Differentiable Joint Pruning and Quantization for Hardware Efficiency. ECCV, 2020. Cited by 4.
- Degree-Quant: Quantization-Aware Training for Graph Neural Networks. ArXiv, 2020. Cited by 1.
- Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXIX. ECCV, 2020.
References
Showing 1–10 of 64 references
- Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018. Cited by 671.
- Quantizing deep convolutional networks for efficient inference: A whitepaper. ArXiv, 2018. Cited by 246.
- PACT: Parameterized Clipping Activation for Quantized Neural Networks. ArXiv, 2018. Cited by 238.
- Trained Quantization Thresholds for Accurate and Efficient Fixed-Point Inference of Deep Neural Networks. MLSys, 2020. Cited by 28.
- Data-Free Quantization Through Weight Equalization and Bias Correction. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019. Cited by 83.
- Memory-Driven Mixed Low Precision Quantization For Enabling Deep Network Inference On Microcontrollers. MLSys, 2020. Cited by 21.
- Neural Network Compression Framework for fast model inference. ArXiv, 2020. Cited by 8.
- Retraining-Based Iterative Weight Quantization for Deep Neural Networks. ArXiv, 2018. Cited by 11.