Towards Efficient Full 8-bit Integer DNN Online Training on Resource-limited Devices without Batch Normalization
@article{Yang2021TowardsEF,
  title={Towards Efficient Full 8-bit Integer DNN Online Training on Resource-limited Devices without Batch Normalization},
  author={Yukuan Yang and Xiaowei Chi and Lei Deng and Tianyi Yan and Feng Gao and Guoqi Li},
  journal={ArXiv},
  year={2021},
  volume={abs/2105.13890}
}
The huge computational costs of convolution and batch normalization (BN) pose great challenges for the online training and corresponding applications of deep neural networks (DNNs), especially on resource-limited devices. Existing works focus on accelerating either convolution or BN, and no solution alleviates both problems with satisfactory performance. Online training has gradually become a trend on resource-limited devices such as mobile phones, yet there is still no complete…
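Since the paper's central theme is carrying out training with 8-bit integers, the sketch below illustrates the generic primitive involved: symmetric linear int8 quantization and dequantization of a tensor. The helper names (`quantize_int8`, `dequantize_int8`) and the max-abs scaling rule are illustrative assumptions, not the paper's actual quantization scheme.

```python
import numpy as np

def quantize_int8(x, scale=None):
    """Symmetric linear quantization of a float tensor to int8 (illustrative)."""
    if scale is None:
        # Map the largest absolute value onto the int8 range [-127, 127].
        scale = np.max(np.abs(x)) / 127.0 + 1e-12
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    """Recover an approximate float tensor from int8 values and a scale."""
    return q.astype(np.float32) * scale

# Example: quantize a small weight tensor and measure the rounding error.
w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(w)
print(np.abs(w - dequantize_int8(q, s)).max())
```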
References
Showing 1-10 of 31 references
Training High-Performance and Large-Scale Deep Neural Networks with Full 8-bit Integers
- Computer Science, Neural Networks
- 2020
L1-Norm Batch Normalization for Efficient Training of Deep Neural Networks
- Computer Science
- 2018
This paper proposes an L1-norm BN (L1BN) with only linear operations in both forward and backward propagation during training, which not only surpasses L2BN in speed but also simplifies the design of deep learning accelerators.
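As a concrete illustration of the L1BN idea summarized above, the following sketch implements a forward pass that replaces the variance with the mean absolute deviation. The function name `l1_batch_norm_forward` is mine, the sqrt(pi/2) factor is the rescaling constant commonly used to match the L2 standard deviation under a Gaussian assumption, and the backward pass is omitted.

```python
import numpy as np

def l1_batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Forward pass of an L1-norm batch normalization over the batch axis.

    The variance (squares plus a square root) is replaced by the mean
    absolute deviation, which needs only linear operations.
    """
    mu = x.mean(axis=0)
    # Mean absolute deviation, rescaled to approximate the L2 std for
    # Gaussian activations.
    s1 = np.abs(x - mu).mean(axis=0) * np.sqrt(np.pi / 2.0)
    x_hat = (x - mu) / (s1 + eps)
    return gamma * x_hat + beta

# Example: normalize a batch of 32 samples with 8 features each.
x = np.random.randn(32, 8)
y = l1_batch_norm_forward(x, gamma=np.ones(8), beta=np.zeros(8))
print(y.mean(axis=0).round(3))
```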
Low-bit Quantization of Neural Networks for Efficient Inference
- Computer Science, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW)
- 2019
This paper formalizes the linear quantization task as a Minimum Mean Squared Error (MMSE) problem for both weights and activations, allowing low-bit precision inference without the need for full network retraining.
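To make the MMSE formulation above concrete, here is a minimal sketch that searches over clipping thresholds for a uniform quantizer and keeps the one with the smallest mean squared quantization error. The function `mmse_linear_quantizer` and the simple grid search are illustrative simplifications, not the referenced paper's actual optimization procedure.

```python
import numpy as np

def mmse_linear_quantizer(x, n_bits=4, n_candidates=100):
    """Pick a clipping threshold for uniform quantization by minimizing MSE.

    Instead of clipping at max|x|, try a range of candidate thresholds and
    keep the scale whose round-trip quantization error is smallest.
    """
    q_max = 2 ** (n_bits - 1) - 1
    abs_max = np.max(np.abs(x))
    best_scale, best_mse = None, np.inf
    for t in np.linspace(abs_max / n_candidates, abs_max, n_candidates):
        scale = t / q_max
        x_hat = np.clip(np.round(x / scale), -q_max, q_max) * scale
        mse = np.mean((x - x_hat) ** 2)
        if mse < best_mse:
            best_scale, best_mse = scale, mse
    return best_scale, best_mse

# Example: heavy-tailed weights benefit from clipping below max|x|.
w = np.random.laplace(size=10000)
scale, mse = mmse_linear_quantizer(w, n_bits=4)
print(scale, mse)
```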
GXNOR-Net: Training deep neural networks with ternary weights and activations without full-precision memory under a unified discretization framework
- Computer Science, Neural Networks
- 2018
Model Compression and Hardware Acceleration for Neural Networks: A Comprehensive Survey
- Computer Science, Proceedings of the IEEE
- 2020
This article reviews mainstream compression approaches such as compact models, tensor decomposition, data quantization, and network sparsification, answers the question of how to leverage these methods in the design of neural network accelerators, and presents state-of-the-art hardware architectures.
Training Deep Neural Networks with 8-bit Floating Point Numbers
- Computer Science, NeurIPS
- 2018
This work demonstrates, for the first time, the successful training of deep neural networks using 8-bit floating point numbers while fully maintaining the accuracy on a spectrum of deep learning models and datasets.
Training and Inference with Integers in Deep Neural Networks
- Computer Science, ICLR
- 2018
Empirically, this work demonstrates the potential to deploy training in hardware systems such as integer-based deep learning accelerators and neuromorphic chips with comparable accuracy and higher energy efficiency, which is crucial to future AI applications in variable scenarios with transfer and continual learning demands.
Quantization Friendly MobileNet (QF-MobileNet) Architecture for Vision Based Applications on Embedded Platforms
- Computer Science, Neural Networks
- 2021
Restructuring Batch Normalization to Accelerate CNN Training
- Computer Science, MLSys
- 2019
The proposed BN restructuring can significantly reduce main-memory accesses while training the latest CNN models; experiments on a chip multiprocessor with a modified Caffe implementation show that it improves the training performance of DenseNet with 121 convolutional layers by 28.4%.
DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients
- Computer Science, ArXiv
- 2016
DoReFa-Net, a method to train convolutional neural networks with low-bitwidth weights and activations using low-bitwidth parameter gradients, is proposed and achieves prediction accuracy comparable to its 32-bit counterparts.
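For reference, the sketch below follows the commonly cited DoReFa-style weight quantization: squash weights into [0, 1], apply a uniform k-bit quantizer, and map back to [-1, 1]. The helper name `dorefa_quantize_weights` is my own, and the straight-through gradient estimator used during training is omitted.

```python
import numpy as np

def quantize_k(x, k):
    """Uniformly quantize x in [0, 1] to one of 2^k - 1 evenly spaced levels."""
    n = 2 ** k - 1
    return np.round(x * n) / n

def dorefa_quantize_weights(w, k):
    """k-bit weight quantization in the DoReFa style (forward pass only)."""
    t = np.tanh(w)
    x = t / (2 * np.max(np.abs(t))) + 0.5   # affine map into [0, 1]
    return 2 * quantize_k(x, k) - 1          # back to [-1, 1]

# Example: quantize a small weight matrix to 2-bit values.
w = np.random.randn(3, 3)
print(dorefa_quantize_weights(w, k=2))
```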