Corpus ID: 216035831

Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation

@article{Wu2020IntegerQF,
  title={Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation},
  author={Hao Wu and P. Judd and Xiaojie Zhang and M. Isaev and P. Micikevicius},
  journal={ArXiv},
  year={2020},
  volume={abs/2004.09602}
}
  • Hao Wu, P. Judd, Xiaojie Zhang, M. Isaev, P. Micikevicius
  • Published 2020
  • Computer Science, Mathematics
  • ArXiv
  • Quantization techniques can reduce the size of Deep Neural Networks and improve inference latency and throughput by taking advantage of high-throughput integer instructions. In this paper we review the mathematical aspects of quantization parameters and evaluate their choices on a wide range of neural network models for different application domains, including vision, speech, and language. We focus on quantization techniques that are amenable to acceleration by processors with high-throughput integer math pipelines. (A minimal sketch of the core quantization step follows below.)
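
To make the quantization arithmetic concrete, here is a minimal NumPy sketch of symmetric, max-calibrated uniform quantization, in the spirit of the int8 scheme the paper evaluates. The function names, the max-calibration choice, and the example tensor are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

def quantize(x, num_bits=8, amax=None):
    """Symmetric uniform quantization of a real tensor to signed integers.

    Maps the range [-amax, amax] onto [-(2^(b-1)-1), 2^(b-1)-1].
    """
    if amax is None:
        amax = np.abs(x).max()          # max calibration: use the absolute maximum
    qmax = 2 ** (num_bits - 1) - 1      # e.g. 127 for int8
    scale = qmax / amax                 # real-to-integer scale factor
    xq = np.clip(np.round(x * scale), -qmax, qmax).astype(np.int8)
    return xq, scale

def dequantize(xq, scale):
    """Recover an approximate real-valued tensor from the integers."""
    return xq.astype(np.float32) / scale

# Example: quantize a random weight tensor and measure the round-trip error.
w = np.random.randn(64, 64).astype(np.float32)
wq, s = quantize(w)
err = np.abs(w - dequantize(wq, s)).max()
print(f"scale={s:.4f}, max abs error={err:.5f}")
```

Symmetric scaling (zero-point fixed at 0) is what keeps the integer matrix multiply cheap: dequantization reduces to a single per-tensor (or per-channel) multiply by 1/scale.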
    13 Citations

    • GPQ: Greedy Partial Quantization of Convolutional Neural Networks Inspired by Submodular Optimization
    • VS-Quant: Per-vector Scaled Quantization for Accurate Low-Precision Neural Network Inference
    • Exploring Neural Networks Quantization via Layer-Wise Quantization Analysis
    • Empirical Evaluation of Deep Learning Model Compression Techniques on the WaveNet Vocoder
    • Differentiable Joint Pruning and Quantization for Hardware Efficiency
    • Degree-Quant: Quantization-Aware Training for Graph Neural Networks
    • Layer-Wise Data-Free CNN Compression
    • Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXIX
