• Corpus ID: 235247897

Towards Efficient Full 8-bit Integer DNN Online Training on Resource-limited Devices without Batch Normalization

Yukuan Yang, Xiaowei Chi, Lei Deng, Tianyi Yan, Feng Gao, Guoqi Li

The heavy computational cost of convolution and batch normalization (BN) poses great challenges for the online training of deep neural networks (DNNs) and its applications, especially on resource-limited devices. Existing works focus on accelerating either convolution or BN, and no solution alleviates both problems with satisfactory performance. Online training has gradually become a trend on resource-limited devices such as mobile phones, while there is still no complete…

L1-Norm Batch Normalization for Efficient Training of Deep Neural Networks
This paper proposes an L1-norm BN (L1BN) with only linear operations in both forward and backward propagations during training, which not only surpasses L2BN in speed but also simplifies the design of deep learning accelerators.
Low-bit Quantization of Neural Networks for Efficient Inference
This paper formalizes the linear quantization task as a Minimum Mean Squared Error (MMSE) problem for both weights and activations, allowing low-bit precision inference without the need for full network retraining.
Model Compression and Hardware Acceleration for Neural Networks: A Comprehensive Survey
This article reviews the mainstream compression approaches such as compact model, tensor decomposition, data quantization, and network sparsification, answers the question of how to leverage these methods in the design of neural network accelerators, and presents the state-of-the-art hardware architectures.
Training Deep Neural Networks with 8-bit Floating Point Numbers
This work demonstrates, for the first time, the successful training of deep neural networks using 8-bit floating point numbers while fully maintaining the accuracy on a spectrum of deep learning models and datasets.
Training and Inference with Integers in Deep Neural Networks
Empirically, this work demonstrates the potential to deploy training in hardware systems such as integer-based deep learning accelerators and neuromorphic chips with comparable accuracy and higher energy efficiency, which is crucial to future AI applications in variable scenarios with transfer and continual learning demands.
Restructuring Batch Normalization to Accelerate CNN Training
The proposed BN restructuring significantly reduces main-memory accesses while training the latest CNN models, and experiments on a chip multiprocessor with the modified Caffe implementation show that it can improve the performance of DenseNet with 121 convolutional layers by 28.4%.
DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients
DoReFa-Net, a method to train convolutional neural networks that have low bitwidth weights and activations using low bitwidth parameter gradients, is proposed and can achieve prediction accuracy comparable to that of 32-bit counterparts.