Corpus ID: 211010717

Widening and Squeezing: Towards Accurate and Efficient QNNs

@article{Liu2020WideningAS,
  title={Widening and Squeezing: Towards Accurate and Efficient QNNs},
  author={Chuanjian Liu and Kai Han and Yunhe Wang and Hanting Chen and Chunjing Xu and Qi Tian},
  journal={ArXiv},
  year={2020},
  volume={abs/2002.00555}
}
Quantization neural networks (QNNs) are very attractive to industry because of their extremely cheap computation and storage overhead, but their performance is still worse than that of networks with full-precision parameters. Most existing methods aim to enhance the performance of QNNs, especially binary neural networks, by exploiting more effective training techniques. However, we find that the representation capability of quantized features is far weaker than that of full-precision features by… 
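As background for the bit-width constraint discussed in the abstract, the sketch below shows uniform symmetric quantization of a weight tensor to a small number of bits; the function name and the use of NumPy are illustrative assumptions, not part of the paper.

```python
import numpy as np

def uniform_quantize(w, num_bits=2):
    """Illustrative uniform symmetric quantization of a weight tensor.

    With num_bits bits, only a handful of distinct values are representable,
    which is why low-bit features carry far less information than
    full-precision ones.
    """
    qmax = 2 ** (num_bits - 1) - 1          # e.g. 1 for 2-bit, 127 for 8-bit
    scale = np.abs(w).max() / max(qmax, 1)  # map the weight range onto the grid
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale                         # dequantized (simulated) weights

w = np.random.randn(64, 3, 3, 3).astype(np.float32)
w_q = uniform_quantize(w, num_bits=2)
print(np.unique(w_q).size)  # only a few distinct values survive
```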

References


SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size

This work proposes a small DNN architecture called SqueezeNet, which achieves AlexNet-level accuracy on ImageNet with 50x fewer parameters and is able to compress to less than 0.5MB (510x smaller than AlexNet).

Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding

This work introduces "deep compression", a three-stage pipeline of pruning, trained quantization, and Huffman coding that works together to reduce the storage requirements of neural networks by 35x to 49x without affecting their accuracy.
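A minimal sketch of the first two stages (magnitude pruning and k-means weight sharing), assuming NumPy and a toy codebook size; the Huffman-coding stage and all retraining steps are omitted, and the function names are illustrative.

```python
import numpy as np

def prune_by_magnitude(w, sparsity=0.9):
    """Zero out the smallest-magnitude weights (stage 1 of the pipeline)."""
    threshold = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) < threshold, 0.0, w)

def kmeans_weight_sharing(w, num_clusters=16, iters=20):
    """Cluster the surviving weights so each is replaced by a shared centroid
    (stage 2); only cluster indices and the small codebook need storing."""
    nonzero = w[w != 0]
    centroids = np.linspace(nonzero.min(), nonzero.max(), num_clusters)
    for _ in range(iters):
        assign = np.argmin(np.abs(nonzero[:, None] - centroids[None, :]), axis=1)
        for k in range(num_clusters):
            if np.any(assign == k):
                centroids[k] = nonzero[assign == k].mean()
    shared = w.copy()
    shared[w != 0] = centroids[assign]
    return shared, centroids

w = np.random.randn(256, 256).astype(np.float32)
w_pruned = prune_by_magnitude(w, sparsity=0.9)
w_shared, codebook = kmeans_weight_sharing(w_pruned, num_clusters=16)
```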

Binary Ensemble Neural Network: More Bits per Network or More Networks per Bit?

  • Shilin Zhu, Xin Dong, Hao Su
  • Computer Science
    2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2019
The Binary Ensemble Neural Network (BENN) is proposed, which leverages ensemble methods to improve the performance of BNNs at limited efficiency cost and can even surpass the accuracy of the full-precision floating-point network with the same architecture.

BNN+: Improved Binary Network Training

An improved binary training method is proposed by introducing a new regularization function that encourages training weights around binary values and an improved approximation of the derivative of the sign activation function in the backward computation.
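For context, the sketch below shows the generic pattern such methods build on: binarize weights with sign in the forward pass and substitute a surrogate derivative in the backward pass (a plain straight-through-style clip here, not the specific approximation proposed in BNN+); the PyTorch usage is an assumption.

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """Forward: sign(x). Backward: pass the gradient through a crude
    surrogate of d sign/dx (a simple clip to |x| <= 1); BNN+ replaces this
    surrogate with a better-shaped approximation."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        return grad_output * (x.abs() <= 1).float()

w = torch.randn(8, requires_grad=True)
loss = BinarizeSTE.apply(w).sum()
loss.backward()
print(w.grad)
```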

Trained Ternary Quantization

This work proposes Trained Ternary Quantization (TTQ), a method that can reduce the precision of weights in neural networks to ternary values while improving the accuracy of some models (32-, 44-, and 56-layer ResNets) on CIFAR-10 and of AlexNet on ImageNet.
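A minimal sketch of threshold-based ternarization, assuming NumPy; in TTQ the positive and negative scaling factors are trainable parameters learned by backpropagation, while here they are simply set to the mean magnitude of the retained weights.

```python
import numpy as np

def ternarize(w, threshold_ratio=0.05):
    """Map each weight to one of {-Wn, 0, +Wp} using a magnitude threshold."""
    delta = threshold_ratio * np.abs(w).max()            # per-layer threshold
    pos, neg = w > delta, w < -delta
    wp = np.abs(w[pos]).mean() if pos.any() else 0.0     # fixed here, learned in TTQ
    wn = np.abs(w[neg]).mean() if neg.any() else 0.0
    t = np.zeros_like(w)
    t[pos], t[neg] = wp, -wn
    return t

w = np.random.randn(128, 64).astype(np.float32)
print(np.unique(ternarize(w)))   # three values: -Wn, 0, +Wp
```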

Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration

Unlike previous methods, FPGM compresses CNN models by pruning filters with redundancy rather than those with "relatively less" importance; when applied to two image classification benchmarks, the method validates its usefulness and strengths.
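A sketch of the core selection rule, assuming NumPy: filters closest to the geometric median of a layer are treated as redundant and pruned, with the geometric median approximated here by the filter with the smallest summed Euclidean distance to all other filters.

```python
import numpy as np

def redundant_filter_indices(filters, num_prune=1):
    """filters: array of shape (out_channels, in_channels, k, k).

    Filters near the geometric median of the layer are the most replaceable
    by the remaining filters, so they are selected for pruning."""
    flat = filters.reshape(filters.shape[0], -1)
    dists = np.linalg.norm(flat[:, None, :] - flat[None, :, :], axis=-1)
    total = dists.sum(axis=1)                 # distance of each filter to all others
    return np.argsort(total)[:num_prune]      # smallest totals = nearest the median

conv_w = np.random.randn(64, 32, 3, 3).astype(np.float32)
print(redundant_filter_indices(conv_w, num_prune=4))
```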

Learning Instance-wise Sparsity for Accelerating Deep Models

This work expects the intermediate feature maps of each instance in deep neural networks to be sparse while preserving the overall network performance, and takes the coefficient of variation as a measure to select the layers that are appropriate for acceleration.
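A small sketch of the selection measure, assuming NumPy: the coefficient of variation (standard deviation divided by mean) of a layer's activation magnitudes for one instance; how the measure is thresholded and how sparsity is then enforced are not shown.

```python
import numpy as np

def coefficient_of_variation(feature_map):
    """CV = std / mean of the activation magnitudes of one layer for one
    instance; the paper uses this measure to decide which layers are
    appropriate candidates for acceleration."""
    a = np.abs(feature_map).ravel()
    return a.std() / (a.mean() + 1e-8)

fm = np.maximum(np.random.randn(256, 14, 14), 0)   # ReLU-like feature map
print(coefficient_of_variation(fm))
```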

Pruning Filters for Efficient ConvNets

This work presents an acceleration method for CNNs, where it is shown that even simple filter pruning techniques can reduce inference costs for VGG-16 and ResNet-110 by up to 38% on CIFAR-10 while regaining close to the original accuracy by retraining the networks.
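A minimal sketch of the simple criterion referred to above, assuming NumPy: rank filters by their L1 norm and drop the smallest ones (the retraining step the paper relies on is not shown).

```python
import numpy as np

def prune_filters_by_l1(conv_w, keep_ratio=0.7):
    """conv_w: (out_channels, in_channels, k, k). Returns the indices of the
    filters to keep, i.e. those with the largest L1 norms."""
    l1 = np.abs(conv_w).reshape(conv_w.shape[0], -1).sum(axis=1)
    num_keep = int(round(keep_ratio * conv_w.shape[0]))
    return np.sort(np.argsort(l1)[::-1][:num_keep])

conv_w = np.random.randn(64, 3, 3, 3).astype(np.float32)
keep = prune_filters_by_l1(conv_w, keep_ratio=0.7)
pruned_w = conv_w[keep]   # the matching input channels of the next layer
                          # must be removed as well
print(pruned_w.shape)
```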

Learning Efficient Convolutional Networks through Network Slimming

The approach is called network slimming, which takes wide and large networks as input models, but during training insignificant channels are automatically identified and pruned afterwards, yielding thin and compact models with comparable accuracy.
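The channel-importance signal in network slimming is the batch-normalization scaling factor trained under an L1 sparsity penalty; below is a minimal sketch of the selection step, assuming NumPy, that the penalty has already been applied during training, and a per-layer threshold (the paper uses a single global threshold across layers).

```python
import numpy as np

def select_channels_by_bn_scale(gamma, prune_ratio=0.4):
    """gamma: per-channel BatchNorm scaling factors of one layer, trained with
    an L1 sparsity penalty. Channels whose |gamma| falls below the threshold
    are considered insignificant and pruned."""
    threshold = np.quantile(np.abs(gamma), prune_ratio)
    return np.where(np.abs(gamma) >= threshold)[0]   # indices of channels to keep

gamma = np.random.rand(256).astype(np.float32)
keep = select_channels_by_bn_scale(gamma, prune_ratio=0.4)
print(keep.size, "of", gamma.size, "channels kept")
```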

Improving the speed of neural networks on CPUs

This paper uses speech recognition as an example task, and shows that a real-time hybrid hidden Markov model / neural network (HMM/NN) large-vocabulary system can be built with a 10× speedup over an unoptimized baseline and a 4× speedup over an aggressively optimized floating-point baseline at no cost in accuracy.