Training deep neural networks with low precision multiplications
@article{Courbariaux2014TrainingDN,
  title   = {Training deep neural networks with low precision multiplications},
  author  = {Matthieu Courbariaux and Yoshua Bengio and Jean-Pierre David},
  journal = {arXiv: Learning},
  year    = {2014}
}
Multipliers are the most space- and power-hungry arithmetic operators in the digital implementation of deep neural networks. […] Key result: it is possible to train Maxout networks with 10-bit multiplications.
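As a rough illustration of the idea behind the key result (not the paper's exact training recipe), the sketch below quantizes both operands of a matrix product to a low-precision dynamic fixed-point format with one shared per-tensor exponent before multiplying, while accumulation stays in high precision. The 10-bit width follows the key result; the function names and the per-tensor scaling rule are assumptions of this sketch.

```python
import numpy as np

def quantize_dynamic_fixed_point(x, bits=10):
    """Round x onto a 'bits'-wide dynamic fixed-point grid with one shared
    scaling exponent per tensor (a sketch, not the paper's exact format)."""
    x = np.asarray(x, dtype=np.float64)
    max_abs = np.max(np.abs(x))
    if max_abs == 0.0:
        return x
    exponent = np.ceil(np.log2(max_abs))      # shared exponent covering the tensor's range
    scale = 2.0 ** (exponent - (bits - 1))    # 1 sign bit, bits-1 magnitude bits
    q = np.clip(np.round(x / scale), -2 ** (bits - 1), 2 ** (bits - 1) - 1)
    return q * scale

def low_precision_matmul(a, w, bits=10):
    """Multiply low-precision operands; accumulation stays in float64."""
    return quantize_dynamic_fixed_point(a, bits) @ quantize_dynamic_fixed_point(w, bits)
```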
464 Citations
Deep Neural Network Training without Multiplications
- Computer Science, ArXiv
- 2020
It is shown that ResNet can be trained using an integer-add instruction in place of a floating-point multiplication instruction with competitive classification accuracy, which will enable eliminating multiplications in deep neural-network training and inference.
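A minimal sketch of the general integer-add idea (not necessarily this paper's exact scheme): adding the IEEE-754 bit patterns of two positive floats and subtracting the exponent-bias offset approximates their product. Sign and zero are handled separately here, and exponent overflow/underflow is ignored.

```python
import numpy as np

def approx_multiply_by_int_add(a, b):
    """Approximate elementwise a*b with one integer addition on the bit patterns."""
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    sign = np.sign(a) * np.sign(b)
    ai = np.abs(a).view(np.int32).astype(np.int64)
    bi = np.abs(b).view(np.int32).astype(np.int64)
    # adding bit patterns adds the exponents and (approximately) the mantissas
    bits = (ai + bi - 0x3F800000).astype(np.int32)  # remove the duplicated exponent bias
    approx = bits.view(np.float32)
    return np.where(sign == 0, np.float32(0.0), sign * approx)
```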
Low-Precision Floating-Point Schemes for Neural Network Training
- Computer Science, ArXiv
- 2018
A simplified model is introduced in which both the outputs and the gradients of the neural network are constrained to power-of-two values, using just 7 bits for their representation, significantly reducing training time as well as energy consumption and memory requirements during the training and inference phases.
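A sketch of the power-of-two constraint described above, assuming the 7 bits split into 1 sign bit and a 6-bit exponent code (that split, and the rounding rule, are assumptions of this sketch):

```python
import numpy as np

def quantize_to_power_of_two(x, total_bits=7):
    """Snap each value to a signed power of two representable with `total_bits` bits."""
    x = np.asarray(x, dtype=np.float64)
    exp_bits = total_bits - 1                      # 1 bit for the sign
    lo, hi = -(2 ** (exp_bits - 1)), 2 ** (exp_bits - 1) - 1
    sign = np.sign(x)                              # zero stays zero via sign == 0
    e = np.clip(np.round(np.log2(np.abs(x) + 1e-300)), lo, hi)
    return sign * 2.0 ** e
```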
Hardware-software codesign of accurate, multiplier-free Deep Neural Networks
- Computer Science, 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC)
- 2017
This work proposes a novel approach to map floating-point-based DNNs to 8-bit dynamic fixed-point networks with integer power-of-two weights, with no change in network architecture, and proposes a hardware accelerator design to achieve low-power, low-latency inference with insignificant degradation in accuracy.
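With integer power-of-two weights, each multiplication collapses to a bit shift of the fixed-point activation. A simplified sketch of that observation (the encoding and names here are assumptions, not the paper's):

```python
import numpy as np

def encode_power_of_two_weight(w):
    """Represent a weight as (sign, shift) so that w ≈ sign * 2**shift."""
    sign = np.sign(w).astype(np.int64)
    shift = np.clip(np.round(np.log2(np.abs(w) + 1e-45)), -31, 31).astype(np.int64)
    return sign, shift

def shift_multiply(act_int, sign, shift):
    """act * (sign * 2**shift) using shifts only; negative shifts shift right."""
    act = np.asarray(act_int, dtype=np.int64)
    left = np.maximum(shift, 0)                    # exactly one of left/right is non-zero
    right = np.maximum(-shift, 0)
    return sign * ((act << left) >> right)
```

For example, a weight of 0.25 encodes as (1, -2), so multiplying an 8-bit activation of 40 by it becomes 40 >> 2 = 10.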
BinaryConnect: Training Deep Neural Networks with binary weights during propagations
- Computer Science, NIPS
- 2015
BinaryConnect is introduced, a method that trains a DNN with binary weights during the forward and backward propagations while retaining the precision of the stored weights in which gradients are accumulated; near state-of-the-art results with BinaryConnect are obtained on permutation-invariant MNIST, CIFAR-10 and SVHN.
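A minimal sketch of the BinaryConnect update pattern (the gradient argument and hyperparameters are placeholders, not the paper's code): propagate with the binarized weights, but accumulate gradients into the stored real-valued weights.

```python
import numpy as np

def binarize(w_real):
    """Binary weights used during forward and backward propagation."""
    return np.where(w_real >= 0.0, 1.0, -1.0)

def binaryconnect_update(w_real, grad_wrt_binary, lr=0.01):
    """Accumulate the gradient (computed with binarize(w_real)) into the
    full-precision weights, then clip so binarization stays informative."""
    return np.clip(w_real - lr * grad_wrt_binary, -1.0, 1.0)
```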
Deep Learning with Limited Numerical Precision
- Computer Science, ICML
- 2015
The results show that deep networks can be trained using only a 16-bit wide fixed-point number representation when stochastic rounding is used, and incur little to no degradation in classification accuracy.
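The stochastic-rounding idea can be sketched as follows; the 1+7+8 split of the 16-bit word into sign, integer, and fractional bits is an assumption for the example.

```python
import numpy as np

def stochastic_round_fixed_point(x, int_bits=7, frac_bits=8, rng=None):
    """Round x onto a 16-bit fixed-point grid, rounding up with probability
    equal to the fractional remainder (unbiased in expectation)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=np.float64)
    scaled = x * 2.0 ** frac_bits
    lower = np.floor(scaled)
    rounded = lower + (rng.random(x.shape) < (scaled - lower))
    top = 2.0 ** (int_bits + frac_bits)             # saturate to the representable range
    return np.clip(rounded, -top, top - 1) / 2.0 ** frac_bits
```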
Low-Precision Batch-Normalized Activations
- Computer Science, ArXiv
- 2017
This work introduces a quantization scheme that is compatible with training very deep neural networks, and shows how quantizing the network activations in the middle of each batch-normalization module can greatly reduce the amount of memory and computational power needed.
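A sketch of quantizing the activation in the middle of a batch-normalization module, i.e. after normalization but before the affine transform; the bit width and clipping range below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def batchnorm_with_quantized_activation(x, gamma, beta, bits=4, clip=4.0, eps=1e-5):
    """Normalize per feature, quantize the (roughly unit-scale) normalized values
    to a uniform low-precision grid, then apply the affine transform."""
    xn = (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)
    xn = np.clip(xn, -clip, clip)
    step = 2.0 * clip / (2 ** bits - 1)
    xq = np.round(xn / step) * step
    return gamma * xq + beta
```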
Handwritten Digit Classification using 8-bit Floating Point based Convolutional Neural Networks
- Computer Science
- 2018
This paper presents an approach that uses reduced-precision (8-bit) floating point for training the handwritten-character classifier LeNet-5, achieving 97.10% accuracy while reducing the overall space complexity by 75% compared with a model using single-precision floating point.
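An 8-bit float quantizer can be sketched as below; the 1-4-3 sign/exponent/mantissa split is an assumption of this example, not necessarily the paper's format.

```python
import numpy as np

def quantize_to_8bit_float(x, exp_bits=4, man_bits=3):
    """Round x to the nearest value representable in a small float format."""
    x = np.asarray(x, dtype=np.float64)
    bias = 2 ** (exp_bits - 1) - 1
    max_val = (2.0 - 2.0 ** -man_bits) * 2.0 ** bias      # largest finite magnitude
    sign = np.sign(x)
    mag = np.minimum(np.abs(x), max_val)                  # saturate instead of overflowing
    # clamp the exponent so tiny values land on the subnormal grid
    e = np.clip(np.floor(np.log2(np.maximum(mag, 1e-300))), 1 - bias, bias)
    step = 2.0 ** (e - man_bits)                          # grid spacing at this exponent
    return sign * np.round(mag / step) * step
```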
Minimizing Power for Neural Network Training with Logarithm-Approximate Floating-Point Multiplier
- Computer Science, 2019 29th International Symposium on Power and Timing Modeling, Optimization and Simulation (PATMOS)
- 2019
This paper proposes to adopt a logarithm-approximate multiplier (LAM) for multiply-accumulate (MAC) computation in a neural network (NN) training engine, where LAM approximates a floating-point…
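The classic logarithm-approximate multiplication (Mitchell's approximation) replaces the multiply with additions by treating the mantissa as the fractional part of the logarithm. A sketch for positive inputs, written with floating-point arithmetic rather than hardware bit fields:

```python
import numpy as np

def mitchell_multiply(a, b):
    """Approximate a*b for positive inputs using piecewise-linear log2/exp2."""
    def approx_log2(x):
        k = np.floor(np.log2(x))        # exact characteristic (exponent)
        f = x / 2.0 ** k - 1.0          # mantissa fraction in [0, 1)
        return k + f                    # log2(1 + f) approximated by f
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    s = approx_log2(a) + approx_log2(b)
    k = np.floor(s)
    return (1.0 + (s - k)) * 2.0 ** k   # inverse of the same approximation
```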
Quantization of Constrained Processor Data Paths Applied to Convolutional Neural Networks
- Computer Science, 2018 21st Euromicro Conference on Digital System Design (DSD)
- 2018
A layer-wise quantization heuristic to find a good fixed-point network approximation for platforms without wide accumulation registers is proposed and it is demonstrated that 16-bit accumulators are able to obtain a Top-1 classification accuracy within 1% of the floating-point baselines on the CIFAR-10 and ILSVRC2012 image classification benchmarks.
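The constraint driving the layer-wise heuristic can be illustrated by the worst-case bit growth of a dot product; the formula and names below are a simplification (signed-arithmetic details ignored).

```python
import math

def accumulator_bits_needed(weight_bits, activation_bits, macs_per_output):
    """Worst-case accumulator width for a dot product of quantized operands."""
    product_bits = weight_bits + activation_bits          # width of a single product
    growth = math.ceil(math.log2(macs_per_output))        # extra carries from summation
    return product_bits + growth

# e.g. 8-bit weights and activations with 256 accumulations need ~24 bits,
# so operands must be narrowed per layer to fit a 16-bit accumulator.
```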
References
Showing 1-10 of 36 references
Deep Learning with Limited Numerical Precision
- Computer Science, ICML
- 2015
The results show that deep networks can be trained using only a 16-bit wide fixed-point number representation when stochastic rounding is used, and incur little to no degradation in classification accuracy.
Improving the speed of neural networks on CPUs
- Computer Science
- 2011
This paper uses speech recognition as an example task and shows that a real-time hybrid hidden Markov model / neural network (HMM/NN) large-vocabulary system can be built with a 10× speedup over an unoptimized baseline and a 4× speedup over an aggressively optimized floating-point baseline, at no cost in accuracy.
The Impact of Arithmetic Representation on Implementing MLP-BP on FPGAs: A Study
- Computer Science, IEEE Transactions on Neural Networks
- 2007
The results show that an MLP-BP network uses fewer clock cycles and consumes less real estate when compiled in a fixed-point (FXP) format, compared with a larger and slower compilation in a floating-point (FLP) format with a similar data-representation width, in bits, or a similar precision and range.
Backpropagation without Multiplication
- Computer Science, NIPS
- 1993
The backpropagation algorithm has been modified to work without any multiplications and to tolerate computations with a low resolution, which makes it more attractive for a hardware implementation.…
ImageNet classification with deep convolutional neural networks
- Computer Science, Commun. ACM
- 2012
A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.
DaDianNao: A Machine-Learning Supercomputer
- Computer Science, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture
- 2014
This article introduces a custom multi-chip machine-learning architecture, showing that, on a subset of the largest known neural network layers, it is possible to achieve a speedup of 450.65x over a GPU, and reduce the energy by 150.31x on average for a 64-chip system.
A highly scalable Restricted Boltzmann Machine FPGA implementation
- Computer Science, 2009 International Conference on Field Programmable Logic and Applications
- 2009
This paper describes a novel architecture and FPGA implementation that accelerates the training of general RBMs in a scalable manner, with the goal of producing a system that machine learning researchers can use to investigate ever-larger networks.
DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning
- Computer Science, ASPLOS 2014
- 2014
This study designs an accelerator for large-scale CNNs and DNNs, with a special emphasis on the impact of memory on accelerator design, performance and energy, and shows that it is possible to design an accelerator with a high throughput, capable of performing 452 GOP/s in a small footprint.
A fixed point implementation of the backpropagation learning algorithm
- Computer Science, Mathematics, Proceedings of SOUTHEASTCON '94
- 1994
The convergence results for a test example using fixed-point, floating-point, and hardware implementations of the backpropagation algorithm are presented.
Stochastic Pooling for Regularization of Deep Convolutional Neural Networks
- Computer Science, ICLR
- 2013
We introduce a simple and effective method for regularizing large convolutional neural networks. We replace the conventional deterministic pooling operations with a stochastic procedure, randomly…
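A minimal sketch of the stochastic pooling rule for one window of non-negative (e.g. post-ReLU) activations: sample an activation with probability proportional to its value.

```python
import numpy as np

def stochastic_pool(window, rng=None):
    """Pool one window by sampling an activation proportionally to its magnitude."""
    rng = np.random.default_rng() if rng is None else rng
    a = np.asarray(window, dtype=np.float64).ravel()
    total = a.sum()
    if total == 0.0:
        return 0.0                      # all-zero window: nothing to sample
    return rng.choice(a, p=a / total)   # probabilities proportional to activations
```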