# Differentiable Model Compression via Pseudo Quantization Noise

@article{Defossez2021DifferentiableMC,
  title={Differentiable Model Compression via Pseudo Quantization Noise},
  author={Alexandre Défossez and Yossi Adi and Gabriel Synnaeve},
  journal={ArXiv},
  year={2021},
  volume={abs/2104.09987}
}

We propose to add independent pseudo quantization noise to model parameters during training to approximate the effect of a quantization operator. This method, DIFFQ, is differentiable both with respect to the unquantized parameters, and the number of bits used. Given a single hyper-parameter expressing the desired balance between the quantized model size and accuracy, DIFFQ can optimize the number of bits used per individual weight or groups of weights, in a single training. We experimentally…
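The core idea in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a uniform quantizer over the tensor's range and replaces the non-differentiable rounding with additive uniform noise of matching scale (function and argument names are ours).

```python
import numpy as np

def pseudo_quant_noise(w, bits, rng=None):
    """Hypothetical sketch of the pseudo quantization noise idea.

    Instead of rounding (non-differentiable), add noise
    u ~ U(-delta/2, delta/2), where delta is the step of a uniform
    b-bit quantizer over the tensor's range. In a framework with
    autograd, this expression is differentiable in both `w` and `bits`.
    """
    rng = np.random.default_rng() if rng is None else rng
    scale = w.max() - w.min()
    delta = scale / (2.0 ** bits - 1.0)  # quantization step size
    noise = rng.uniform(-0.5, 0.5, size=w.shape)
    return w + delta * noise
```

Because `bits` only enters through the smooth expression for `delta`, a gradient-based optimizer can trade off bit-width against accuracy, which is what lets DIFFQ learn the number of bits per weight group.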


#### 2 Citations

ResMLP: Feedforward networks for image classification with data-efficient training

- Computer Science
- ArXiv
- 2021

ResMLP is a simple residual network that alternates a linear layer in which image patches interact, independently and identically across channels, and a two-layer feed-forward network in which channels interact independently per patch.

Music Source Separation in the Waveform Domain

- Computer Science, Engineering
- ArXiv
- 2019

Demucs is proposed, a new waveform-to-waveform model, which has an architecture closer to models for audio generation with more capacity on the decoder, and human evaluations show that Demucs has significantly higher quality than Conv-Tasnet, but slightly more contamination from other sources, which explains the difference in SDR.

#### References

Showing 1–10 of 58 references

And the Bit Goes Down: Revisiting the Quantization of Neural Networks

- Computer Science
- ICLR
- 2020

This paper introduces a vector quantization method that aims at preserving the quality of the reconstruction of the network outputs rather than its weights, and minimizes the reconstruction error for in-domain inputs.

LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks

- Computer Science
- ECCV
- 2018

To address the gap in prediction accuracy between quantized and full-precision models, this work proposes to jointly train a quantized, bit-operation-compatible DNN and its associated quantizers, rather than using fixed, handcrafted quantization schemes such as uniform or logarithmic quantization.

Learning to Quantize Deep Networks by Optimizing Quantization Intervals With Task Loss

- Computer Science
- 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2019

This work proposes a trainable quantizer that can be trained on a heterogeneous dataset, and thus can be used to quantize pretrained networks without access to their training data, and outperforms existing methods, achieving state-of-the-art accuracy.

Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks

- Computer Science, Engineering
- 2019 IEEE/CVF International Conference on Computer Vision (ICCV)
- 2019

Differentiable Soft Quantization (DSQ) is proposed to bridge the gap between the full-precision and low-bit networks; it can help pursue accurate gradients in backward propagation and reduce the quantization loss in the forward pass with an appropriate clipping range.
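DSQ's exact parameterization is given in the paper; as a generic illustration of the underlying "soft step" idea, a differentiable tanh-based rounding function can be written as follows (this construction is ours, shown only to make the mechanism concrete):

```python
import numpy as np

def soft_round(x, k=10.0):
    # Generic tanh-based soft rounding (an illustration of the soft-step
    # idea, not DSQ's exact function): differentiable everywhere, and
    # approaching hard rounding as the temperature k grows.
    f = x - np.floor(x)  # fractional part in [0, 1)
    step = 0.5 + 0.5 * np.tanh(k * (f - 0.5)) / np.tanh(0.5 * k)
    return np.floor(x) + step
```

For small `k` the function is smooth and passes useful gradients; for large `k` it converges to the hard rounding used at inference, which is the trade-off such soft-quantization schemes tune.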

Quantizing deep convolutional networks for efficient inference: A whitepaper

- Computer Science, Mathematics
- ArXiv
- 2018

An overview of techniques for quantizing convolutional neural networks for inference with integer weights and activations is presented, and it is recommended that per-channel quantization of weights and per-layer quantization of activations be the preferred quantization scheme for hardware acceleration and kernel optimization.
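The per-channel scheme the whitepaper recommends keeps one scale and zero-point per output channel instead of one per tensor. A minimal sketch of per-channel affine quantize/dequantize (names and the simple min/max calibration are our assumptions):

```python
import numpy as np

def quantize_per_channel(w, bits=8, axis=0):
    # Sketch of per-channel affine quantization: one scale and
    # zero-point per slice along `axis` (e.g. per output channel),
    # computed from that slice's min/max range.
    reduce_axes = tuple(i for i in range(w.ndim) if i != axis)
    w_min = w.min(axis=reduce_axes, keepdims=True)
    w_max = w.max(axis=reduce_axes, keepdims=True)
    scale = (w_max - w_min) / (2 ** bits - 1)
    scale = np.where(scale == 0, 1.0, scale)  # guard flat channels
    q = np.round((w - w_min) / scale)         # integer codes
    return q * scale + w_min                  # dequantized weights
```

Per-channel scales matter for weights because channel ranges in a conv layer can differ by orders of magnitude; a single per-tensor scale would waste most of the integer grid on the widest channel.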

Learned Step Size Quantization

- Computer Science, Mathematics
- ICLR
- 2020

This work introduces a novel means to estimate and scale the task loss gradient at each weight and activation layer's quantizer step size, such that it can be learned in conjunction with other network parameters.
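The mechanism can be sketched as a uniform quantizer whose step size gets its own gradient. The forward pass and the step-size gradient below follow the learned-step-size estimator as we understand it from the paper; treat the exact formulas as a hedged reconstruction, not the authors' code:

```python
import numpy as np

def lsq_quantize(v, s, qn, qp):
    # Forward pass of a learned-step-size quantizer (sketch):
    # scale by 1/s, clip to the integer range [-qn, qp], round,
    # rescale. For signed b-bit weights one would typically use
    # qn = 2**(b-1), qp = 2**(b-1) - 1.
    vbar = np.clip(v / s, -qn, qp)
    return np.round(vbar) * s

def lsq_step_grad(v, s, qn, qp):
    # Straight-through gradient of the output w.r.t. the step size s:
    # -v/s + round(v/s) inside the clipping range, saturating to
    # -qn or qp outside it.
    vs = v / s
    return np.where(vs <= -qn, -qn,
                    np.where(vs >= qp, qp, -vs + np.round(vs)))
```

The key point is that `s` receives a well-scaled gradient from the task loss, so the quantizer's resolution is learned jointly with the weights rather than fixed by a calibration heuristic.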

MetaQuant: Learning to Quantize by Learning to Penetrate Non-differentiable Quantization

- Computer Science
- NeurIPS
- 2019

A meta network is trained using $g_q$ and $r$ as inputs and outputs $g_r$ for subsequent weight updates, which alleviates the problem of non-differentiability and can be trained in an end-to-end manner.

Trained Quantization Thresholds for Accurate and Efficient Fixed-Point Inference of Deep Neural Networks

- Computer Science
- MLSys
- 2020

The proposed method of training quantization thresholds (TQT) for uniform symmetric quantizers using standard backpropagation and gradient descent is able to achieve near-floating-point accuracy on traditionally difficult networks such as MobileNets with less than 5 epochs of quantized (8-bit) retraining.

Model compression via distillation and quantization

- Computer Science
- ICLR
- 2018

This paper proposes two new compression methods, which jointly leverage weight quantization and distillation of larger teacher networks into smaller student networks, and shows that quantized shallow students can reach similar accuracy levels to full-precision teacher models.
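The distillation side of such a combination is typically the standard softened-softmax KL objective; a generic sketch (not the paper's exact loss) applied to a quantized student's logits would look like:

```python
import numpy as np

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # Standard knowledge-distillation loss (a generic sketch, not this
    # paper's exact objective): KL divergence between the teacher's and
    # the student's temperature-softened predictions, scaled by T^2 so
    # gradient magnitudes stay comparable across temperatures.
    def softmax(z):
        z = z - z.max(axis=-1, keepdims=True)  # numerical stability
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)
    p_t = softmax(teacher_logits / T)
    p_s = softmax(student_logits / T)
    kl = (p_t * (np.log(p_t) - np.log(p_s))).sum(axis=-1)
    return float(kl.mean() * T * T)
```

In the quantized-student setting, the student's logits come from the quantized forward pass while gradients flow to its full-precision shadow weights, so the teacher's soft targets help recover accuracy lost to quantization.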

HAQ: Hardware-Aware Automated Quantization With Mixed Precision

- Computer Science
- 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2019

The Hardware-Aware Automated Quantization (HAQ) framework is introduced, which leverages reinforcement learning to automatically determine the quantization policy and takes the hardware accelerator's feedback in the design loop to generate direct feedback signals to the RL agent.