Corpus ID: 233307280

Differentiable Model Compression via Pseudo Quantization Noise

Alexandre Défossez, Yossi Adi, Gabriel Synnaeve
We propose to add independent pseudo quantization noise to model parameters during training to approximate the effect of a quantization operator. This method, DiffQ, is differentiable both with respect to the unquantized parameters and the number of bits used. Given a single hyper-parameter expressing the desired balance between the quantized model size and accuracy, DiffQ can optimize the number of bits used per individual weight or group of weights in a single training. We experimentally…
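The core mechanism can be sketched in a few lines: during training, instead of applying hard (non-differentiable) rounding with b bits, add uniform noise whose scale matches the rounding error of a b-bit uniform quantizer, so the result stays differentiable in both the weights and the now real-valued bit count. A minimal pure-Python illustration under the assumption of weights in [-1, 1]; `pseudo_quant_noise` is a hypothetical name, not DiffQ's actual API:

```python
import random

def pseudo_quant_noise(w, bits):
    """Approximate b-bit uniform quantization of weights w (assumed in
    [-1, 1]) by adding uniform noise of the same scale as the rounding
    error. Because no rounding happens, the expression is a smooth
    function of both the weights and the (float) bit count `bits`.
    Illustrative sketch only, not the exact DiffQ implementation."""
    delta = 2.0 / (2.0 ** bits - 1)              # quantization step size
    return [x + (random.random() - 0.5) * delta  # Uniform(-delta/2, delta/2)
            for x in w]
```

At inference time the noise would be replaced by true rounding; the noise is only a training-time surrogate.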
ResMLP: Feedforward networks for image classification with data-efficient training
ResMLP is a simple residual network that alternates a linear layer in which image patches interact, independently and identically across channels, and a two-layer feed-forward network in which channels interact independently per patch.
Music Source Separation in the Waveform Domain
Demucs is proposed, a new waveform-to-waveform model with an architecture closer to models for audio generation and more capacity in the decoder; human evaluations show that Demucs has significantly higher quality than Conv-Tasnet but slightly more contamination from other sources, which explains the difference in SDR.
And the Bit Goes Down: Revisiting the Quantization of Neural Networks
This paper introduces a vector quantization method that aims at preserving the quality of the reconstruction of the network outputs rather than of its weights, and minimizes the reconstruction error for in-domain inputs.
LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks
This work proposes to jointly train a quantized, bit-operation-compatible DNN and its associated quantizers, as opposed to using fixed, handcrafted quantization schemes such as uniform or logarithmic quantization, to address the gap in prediction accuracy between the quantized model and the full-precision model.
Learning to Quantize Deep Networks by Optimizing Quantization Intervals With Task Loss
This work proposes a trainable quantizer that can be trained on a heterogeneous dataset, and thus can be used to quantize pretrained networks without access to their training data, and outperforms existing methods, achieving state-of-the-art accuracy.
Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks
Differentiable Soft Quantization (DSQ) is proposed to bridge the gap between full-precision and low-bit networks; it can help pursue accurate gradients in backward propagation and reduce the quantization loss in the forward process with an appropriate clipping range.
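The "soft quantization" idea can be illustrated with a tanh-based surrogate for rounding: within each unit interval a scaled tanh bends values toward the nearest integer, and a temperature k controls how closely it approaches the hard step. A simplified single-value sketch, not the exact DSQ formulation (which also learns the clipping range):

```python
import math

def soft_round(x, k=5.0):
    """Differentiable surrogate for round(x). Inside each unit interval
    a tanh centered on the midpoint bends x toward the nearest integer;
    the normalization s maps interval endpoints exactly to themselves.
    As k grows, the curve approaches the hard rounding step."""
    lo = math.floor(x)
    m = lo + 0.5                     # interval midpoint
    s = 1.0 / math.tanh(k * 0.5)     # endpoint normalization
    return m + 0.5 * s * math.tanh(k * (x - m))
```

With a large k, values just below an interval midpoint land near the lower integer and values just above land near the upper one, while the function stays smooth everywhere.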
Quantizing deep convolutional networks for efficient inference: A whitepaper
An overview of techniques for quantizing convolutional neural networks for inference with integer weights and activations is presented, and it is recommended that per-channel quantization of weights and per-layer quantization of activations be the preferred quantization scheme for hardware acceleration and kernel optimization.
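The recommended per-channel scheme amounts to giving each output channel its own scale rather than sharing one scale across the whole tensor, which preserves channels with small dynamic range. A pure-Python sketch of symmetric per-channel weight quantization; `quantize_per_channel` is a hypothetical helper, not from any particular library:

```python
def quantize_per_channel(weights, num_bits=8):
    """Symmetric per-channel quantization: each output channel (row of
    the weight matrix) gets its own scale derived from its own max |w|.
    Returns integer codes and the per-channel scales needed to
    dequantize (w ~= q * scale)."""
    qmax = 2 ** (num_bits - 1) - 1           # 127 for 8 bits
    q_rows, scales = [], []
    for row in weights:
        peak = max(abs(x) for x in row)
        scale = peak / qmax if peak > 0 else 1.0
        scales.append(scale)
        q_rows.append([round(x / scale) for x in row])
    return q_rows, scales
```

Note how a channel whose weights are all near 0.02 still uses the full integer range, whereas a single shared (per-tensor) scale would crush it to a handful of levels.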
Learned Step Size Quantization
This work introduces a novel means to estimate and scale the task loss gradient at each weight and activation layer's quantizer step size, such that it can be learned in conjunction with other network parameters.
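The forward pass of such a quantizer is simple: divide by the learnable step size s, round, clip to the integer range, and rescale; LSQ's contribution is the gradient estimate that lets s train alongside the weights. A scalar sketch of the forward computation only (the straight-through backward pass is omitted):

```python
def lsq_quantize(v, s, qn, qp):
    """Forward pass of a learned-step-size quantizer for one value:
    scale by the learnable step s, round to an integer code, clip to
    [-qn, qp], then rescale back to the real domain. During training,
    the gradient with respect to s flows through a straight-through
    estimator (not shown in this sketch)."""
    code = min(max(round(v / s), -qn), qp)   # integer code in [-qn, qp]
    return code * s                           # dequantized value
```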
MetaQuant: Learning to Quantize by Learning to Penetrate Non-differentiable Quantization
A meta network is trained using g_q and r as inputs, and outputs g_r for subsequent weight updates, which alleviates the problem of non-differentiability and can be trained in an end-to-end manner.
Trained Quantization Thresholds for Accurate and Efficient Fixed-Point Inference of Deep Neural Networks
The proposed method of training quantization thresholds (TQT) for uniform symmetric quantizers using standard backpropagation and gradient descent is able to achieve near-floating-point accuracy on traditionally difficult networks such as MobileNets with less than 5 epochs of quantized (8-bit) retraining.
Model compression via distillation and quantization
This paper proposes two new compression methods, which jointly leverage weight quantization and distillation of larger teacher networks into smaller student networks, and shows that quantized shallow students can reach similar accuracy levels to full-precision teacher models.
HAQ: Hardware-Aware Automated Quantization With Mixed Precision
The Hardware-Aware Automated Quantization (HAQ) framework is introduced, which leverages reinforcement learning to automatically determine the quantization policy and takes the hardware accelerator's feedback in the design loop to generate direct feedback signals to the RL agent.