HAQ: Hardware-Aware Automated Quantization With Mixed Precision

@article{Wang2018HAQHA,
  title={HAQ: Hardware-Aware Automated Quantization With Mixed Precision},
  author={Kuan Wang and Zhijian Liu and Yujun Lin and Ji Lin and Song Han},
  journal={2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2019},
  pages={8604-8612}
}
  • Kuan Wang, Zhijian Liu, Yujun Lin, Ji Lin, Song Han
  • Published in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • Computer Science
  • Highlight Information
    Model quantization is a widely used technique to compress and accelerate deep neural network (DNN) inference. [...] Key Method: Rather than relying on proxy signals such as FLOPs and model size, we employ a hardware simulator to generate direct feedback signals (latency and energy) to the RL agent. Compared with conventional methods, our framework is fully automated and can specialize the quantization policy for different neural network architectures and hardware architectures.
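
    The highlighted method boils down to a constrained search: pick a bitwidth per layer, score the resulting policy with direct hardware feedback (latency, energy) rather than proxies such as FLOPs, and keep improving the policy. Below is a minimal Python sketch of that loop; `simulate_latency` and `evaluate_accuracy` are hypothetical callbacks standing in for HAQ's hardware simulator and its quantize/fine-tune/validate step, and plain random search replaces the paper's DDPG agent to keep the sketch short.

    ```python
    import numpy as np

    def linear_quantize(w, bits):
        """Uniform symmetric quantization of a weight tensor to `bits` bits.

        Simplified stand-in: HAQ also searches a per-layer clipping range
        rather than using the raw absolute maximum.
        """
        qmax = 2 ** (bits - 1) - 1                        # e.g. 127 for 8 bits
        scale = max(float(np.abs(w).max()), 1e-12) / qmax  # guard all-zero tensors
        return np.clip(np.round(w / scale), -qmax, qmax) * scale

    def search_bitwidths(n_layers, latency_budget, simulate_latency,
                         evaluate_accuracy, n_trials=200, seed=0):
        """Find a per-layer bitwidth policy that fits a latency budget.

        `simulate_latency(policy)` and `evaluate_accuracy(policy)` are assumed
        callbacks: the former plays the role of HAQ's hardware simulator
        (direct latency feedback), the latter of quantizing the model,
        briefly fine-tuning it, and measuring validation accuracy.
        """
        rng = np.random.default_rng(seed)
        best_policy, best_acc = None, float("-inf")
        for _ in range(n_trials):
            policy = rng.integers(2, 9, size=n_layers)    # 2..8 bits per layer
            if simulate_latency(policy) > latency_budget:
                continue                                  # over budget: discard
            acc = evaluate_accuracy(policy)
            if acc > best_acc:
                best_policy, best_acc = policy, acc
        return best_policy, best_acc
    ```

    One design note: the sketch simply discards over-budget policies, whereas HAQ sequentially decreases layer bitwidths until the constraint is satisfied, so every rollout yields usable feedback.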

    Citations

    Publications citing this paper.
    Showing 1-10 of 50 citations (estimated 80% coverage)

    Compressing Deep Neural Networks With Learnable Regularization (cites methods & background; highly influenced)

    Post-training Quantization with Multiple Points: Mixed Precision without Mixed Precision (cites background & methods; highly influenced)

    AutoQ: Automated Kernel-Wise Neural Network Quantization (2019; cites methods & background; highly influenced)

    Design Automation for Efficient Deep Learning Computing (cites methods)

    Mixed Precision Neural Architecture Search for Energy Efficient Deep Learning (cites methods & background; highly influenced)

    Efficient Bitwidth Search for Practical Mixed Precision Neural Network (cites background & methods; highly influenced)

    Prune or quantize? Strategy for Pareto-optimally low-cost and accurate CNN (cites methods & background; highly influenced)

    QKD: Quantization-aware Knowledge Distillation (cites background & methods; highly influenced)

    Rethinking Neural Network Quantization (cites methods & background; highly influenced)

    Citation Statistics

    • 10 highly influenced citations

    • Averaged 14 citations per year from 2018 through 2019

    References

    Publications referenced by this paper.
    Showing 1-10 of 32 references

    Efficient methods and hardware for deep learning (highly influential)

    BISMO: A Scalable Bit-Serial Matrix Multiplication Overlay for Reconfigurable Computing (highly influential)

    MobileNetV2: Inverted Residuals and Linear Bottlenecks (highly influential)

    Adam: A Method for Stochastic Optimization (highly influential)

    ImageNet: A large-scale hierarchical image database (highly influential)