Corpus ID: 247154903

Standard Deviation-Based Quantization for Deep Neural Networks

@article{Ardakani2022StandardDQ,
  title={Standard Deviation-Based Quantization for Deep Neural Networks},
  author={Amir Ardakani and Arash Ardakani and Brett H. Meyer and James J. Clark and Warren J. Gross},
  journal={ArXiv},
  year={2022},
  volume={abs/2202.12422}
}
Quantization of deep neural networks is a promising approach that reduces the inference cost, making it feasible to run deep networks on resource-restricted devices. Inspired by existing methods, we propose a new framework to learn the quantization intervals (discrete values) using the knowledge of the network’s weight and activation distributions, i.e., standard deviation. Furthermore, we propose a novel base-2 logarithmic quantization scheme to quantize weights to power-of-two discrete values… 
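
The abstract outlines two ingredients: a clipping/scaling rule tied to the standard deviation of the weight and activation distributions, and a base-2 logarithmic scheme that restricts weights to power-of-two values. Below is a minimal sketch of what such a standard-deviation-scaled power-of-two weight quantizer could look like; the multiplier `alpha`, the level budget, and the zero handling are assumptions for illustration, not the paper's exact scheme.

```python
import math
import torch

def pow2_quantize(w: torch.Tensor, bits: int = 4, alpha: float = 3.0) -> torch.Tensor:
    """Illustrative std-scaled power-of-two quantizer (not the paper's exact rule).

    Weights are clipped to +/- (alpha * std) and each value is rounded to the
    nearest power of two inside that range; values far below the smallest
    level snap to zero.
    """
    clip = (alpha * w.std()).item()              # std-based clipping threshold
    w_c = torch.clamp(w, -clip, clip)
    sign = torch.sign(w_c)
    mag = w_c.abs().clamp_min(1e-12)             # avoid log2(0)
    exp = torch.round(torch.log2(mag))           # nearest power-of-two exponent
    max_exp = math.floor(math.log2(clip))
    n_levels = 2 ** (bits - 1) - 1               # assumed magnitude budget per sign
    min_exp = max_exp - (n_levels - 1)
    exp = torch.clamp(exp, min_exp, max_exp)
    q = sign * torch.pow(2.0, exp)
    q[w_c.abs() < 2.0 ** (min_exp - 1)] = 0.0    # smallest weights become zero
    return q

wq = pow2_quantize(torch.randn(256, 256))        # all non-zero entries are +/- 2**k
```

Restricting weights to powers of two lets multiplications be replaced by bit shifts at inference time, which is the usual efficiency argument for logarithmic quantization.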


References

Showing 1-10 of 35 references

Deep Residual Learning for Image Recognition

TLDR
This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
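
The residual idea summarized here is that stacked layers learn a residual F(x) that is added back to the input through a skip connection, y = F(x) + x. A minimal PyTorch block in that spirit (channel counts and layer choices are illustrative rather than the exact ResNet configuration):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal residual block: the convolutions learn a residual F(x) and the
    input is added back through a skip connection, y = F(x) + x."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)   # skip connection

y = ResidualBlock(16)(torch.randn(1, 16, 32, 32))
```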

ImageNet classification with deep convolutional neural networks

TLDR
A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.

S3: Sign-Sparse-Shift Reparametrization for Effective Training of Low-bit Shift Networks

TLDR
This work proposes S3 reparameterization, a novel technique for training low-bit shift networks that decomposes a discrete parameter in a sign-sparse-shift 3-fold manner, and shows that 3-bit shift networks compete with their full-precision counterparts in terms of top-1 accuracy on ImageNet.
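
The 3-fold decomposition mentioned in that summary factors each discrete weight into a sign, a sparsity gate, and a power-of-two magnitude. A small sketch of composing a weight from such factors; the exact parameterization and training rule in S3 differ, this only shows the structure:

```python
import torch

def s3_weight(sign: torch.Tensor, gate: torch.Tensor, shift: torch.Tensor) -> torch.Tensor:
    """Compose an effective shift-network weight from three factors: a sign in
    {-1, +1}, a sparsity gate in {0, 1}, and a power-of-two magnitude 2**shift.
    Illustrative only; not the actual S3 parameterization or training rule."""
    return sign * gate * torch.pow(2.0, shift.float())

# Example: a handful of 3-bit shift weights
sign = torch.tensor([1.0, -1.0, 1.0, -1.0])
gate = torch.tensor([1.0, 1.0, 0.0, 1.0])      # the zero gate prunes that weight
shift = torch.tensor([0, -1, -2, -3])          # magnitudes 1, 0.5, (pruned), 0.125
print(s3_weight(sign, gate, shift))            # tensor([ 1.0000, -0.5000,  0.0000, -0.1250])
```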

LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks

TLDR
This work proposes to jointly train a quantized, bit-operation-compatible DNN and its associated quantizers, as opposed to using fixed, handcrafted quantization schemes such as uniform or logarithmic quantization, to address the gap in prediction accuracy between the quantized model and the full-precision model.
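
In the learned-quantizer setting described there, the quantization levels are not fixed to a uniform or logarithmic grid but are linear combinations of a small learnable basis, and each value is mapped to its nearest level. A brute-force sketch of that nearest-level step with a hand-picked (rather than learned) basis:

```python
from itertools import product
import torch

def lq_quantize(x: torch.Tensor, basis: torch.Tensor) -> torch.Tensor:
    """Map each value of x to its nearest level, where the level set is
    {sum_k b_k * basis_k : b_k in {-1, +1}}. The basis is fixed here for
    illustration; in a learned quantizer it is trained with the network."""
    K = basis.numel()
    codes = torch.tensor(list(product([-1.0, 1.0], repeat=K)))   # (2**K, K) bit patterns
    levels = codes @ basis                                       # (2**K,) quantization levels
    idx = (x.unsqueeze(-1) - levels).abs().argmin(dim=-1)        # nearest level per element
    return levels[idx]

xq = lq_quantize(torch.randn(8), basis=torch.tensor([0.5, 0.25]))  # 2-bit example
```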

PACT: Parameterized Clipping Activation for Quantized Neural Networks

TLDR
It is shown, for the first time, that both weights and activations can be quantized to 4 bits of precision while still achieving accuracy comparable to full-precision networks across a range of popular models and datasets.
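
PACT clips each activation with a learnable upper bound α and then quantizes the clipped range uniformly. A minimal forward-pass sketch of that clip-then-quantize step; the learning of α and the straight-through gradient handling the method relies on are omitted:

```python
import torch

def pact_forward(x: torch.Tensor, alpha: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Clip activations to [0, alpha], then quantize the clipped range uniformly.
    In PACT, alpha is a learnable parameter and rounding is handled with a
    straight-through estimator; both are omitted in this forward-only sketch."""
    y = 0.5 * (x.abs() - (x - alpha).abs() + alpha)   # equals clamp(x, 0, alpha)
    scale = alpha / (2 ** bits - 1)
    return torch.round(y / scale) * scale

a = pact_forward(torch.randn(4, 8) * 3, alpha=torch.tensor(6.0), bits=4)
```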

Linear Symmetric Quantization of Neural Networks for Low-precision Integer Hardware

TLDR
A learned linear symmetric quantizer for integer neural network processors is proposed, which not only quantizes neural parameters and activations to low-bit integers but also accelerates hardware inference by using batch-normalization fusion and low-precision accumulators and multipliers.
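
A linear symmetric quantizer of the kind described there uses a single scale and no zero point, which is what keeps integer accumulators and multipliers simple. A generic per-tensor sketch (the learned scale and batch-normalization fusion of that work are not shown):

```python
import torch

def symmetric_quantize(x: torch.Tensor, bits: int = 8):
    """Generic per-tensor linear symmetric quantization: one scale, zero point
    fixed at 0, so integer accumulation needs no offset corrections. The scale
    is taken from the max absolute value here; in the cited work it is learned."""
    qmax = 2 ** (bits - 1) - 1                      # e.g. 127 for 8 bits
    scale = x.abs().max() / qmax
    q = torch.clamp(torch.round(x / scale), -qmax, qmax).to(torch.int32)
    return q, scale

q, scale = symmetric_quantize(torch.randn(16, 16))
x_hat = q.float() * scale                           # dequantized approximation
```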

Towards Effective Low-Bitwidth Convolutional Neural Networks

TLDR
This paper tackles the problem of training a deep convolutional neural network with both low-precision weights and low-bitwidth activations by proposing a two-stage optimization strategy to progressively find good local minima and adopting a novel learning scheme to jointly train a full-precision model alongside the low-precision one.

Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights

TLDR
Extensive experiments on the ImageNet classification task using almost all known deep CNN architectures, including AlexNet, VGG-16, GoogLeNet and ResNets, testify to the efficacy of the proposed INQ, showing that at 5-bit quantization the models achieve higher accuracy than their 32-bit floating-point references.

Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding

TLDR
This work introduces "deep compression", a three-stage pipeline of pruning, trained quantization and Huffman coding that works together to reduce the storage requirement of neural networks by 35x to 49x without affecting their accuracy.
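
The three stages named there act directly on the weight tensors: magnitude pruning, weight sharing via clustering, and entropy coding of the resulting indices. A compact sketch of the first two stages; the sparsity level, cluster count, and k-means details are illustrative, and Huffman coding is omitted:

```python
import torch

def prune_and_share(w: torch.Tensor, sparsity: float = 0.8,
                    n_clusters: int = 16, iters: int = 20) -> torch.Tensor:
    """Sketch of the first two stages: magnitude pruning, then k-means weight
    sharing over the surviving weights. Huffman coding of the cluster indices
    is not shown; sparsity and cluster count are arbitrary choices."""
    flat = w.flatten()
    thresh = flat.abs().quantile(sparsity)            # prune the smallest `sparsity` fraction
    mask = flat.abs() > thresh
    kept = flat[mask]

    # Tiny k-means with linearly initialized centroids, standing in for the
    # clustering step that produces the shared-weight codebook.
    centroids = torch.linspace(kept.min().item(), kept.max().item(), n_clusters)
    for _ in range(iters):
        assign = (kept.unsqueeze(1) - centroids.unsqueeze(0)).abs().argmin(dim=1)
        for k in range(n_clusters):
            members = kept[assign == k]
            if members.numel() > 0:
                centroids[k] = members.mean()

    shared = torch.zeros_like(flat)
    shared[mask] = centroids[assign]                  # pruned weights stay exactly zero
    return shared.view_as(w)

w_compressed = prune_and_share(torch.randn(128, 128))
```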

ShiftAddNet: A Hardware-Inspired Deep Network

TLDR
This paper presents ShiftAddNet, whose main inspiration is drawn from a common practice in energy-efficient hardware design: multiplication can instead be performed with additions and logical bit shifts. This yields a new type of deep network that involves only bit-shift and additive weight layers.
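
The hardware observation behind that summary is that multiplying by a power of two reduces to a bit shift, so a "shift" layer can be emulated in floating point by multiplying with 2**p. A small sketch of such an emulated shift layer (the parameterization is illustrative, not ShiftAddNet's exact layer definition):

```python
import torch

def shift_layer(x: torch.Tensor, sign: torch.Tensor, shift: torch.Tensor) -> torch.Tensor:
    """Emulate a bit-shift 'multiplication' in floating point: every weight is
    sign * 2**shift, so on hardware the products reduce to shifts and the
    matmul to shift-and-add."""
    w = sign * torch.pow(2.0, shift.float())
    return x @ w.t()

x = torch.randn(2, 3)
sign = torch.randint(-1, 2, (4, 3)).float()   # values in {-1, 0, +1}
shift = torch.randint(-3, 1, (4, 3))          # integer exponents in [-3, 0]
y = shift_layer(x, sign, shift)               # shape (2, 4)
```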