# BinaryConnect: Training Deep Neural Networks with binary weights during propagations

    @inproceedings{Courbariaux2015BinaryConnectTD,
      title     = {BinaryConnect: Training Deep Neural Networks with binary weights during propagations},
      author    = {Matthieu Courbariaux and Yoshua Bengio and Jean-Pierre David},
      booktitle = {NIPS},
      year      = {2015}
    }

Deep Neural Networks (DNN) have achieved state-of-the-art results in a wide range of tasks, with the best results obtained with large training sets and large models. [...] We introduce BinaryConnect, a method which consists of training a DNN with binary weights during the forward and backward propagations, while retaining the precision of the stored weights in which gradients are accumulated. Like other dropout schemes, we show that BinaryConnect acts as a regularizer and we obtain near state-of-the-art…
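The core idea above — binarize the weights for propagation, but accumulate gradients in full precision — can be illustrated with a minimal sketch. This is a toy illustration of the scheme as summarized in the abstract, not the paper's implementation; the linear layer, squared loss, learning rate, and clipping range are all assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (hypothetical): one linear layer trained to recover a
# binary target weight matrix under a squared loss.
W_true = np.sign(rng.normal(size=(4, 3)))   # realizable binary target
W_real = rng.normal(0, 0.1, size=(4, 3))    # full-precision stored weights
x = rng.normal(size=(8, 4))
y = x @ W_true
lr, losses = 0.1, []

for step in range(200):
    # Binarize the stored weights (deterministic sign binarization).
    W_bin = np.where(W_real >= 0, 1.0, -1.0)

    # Forward and backward propagation use the *binary* weights...
    pred = x @ W_bin
    losses.append(np.mean((pred - y) ** 2))
    grad_W = x.T @ (2 * (pred - y) / len(x))

    # ...but the gradient update is accumulated in the *real-valued*
    # weights, clipped to [-1, 1] so they stay near the binary levels.
    W_real = np.clip(W_real - lr * grad_W, -1.0, 1.0)
```

The clipping step keeps the real-valued accumulator bounded, so a weight whose gradient keeps pushing in one direction cannot drift arbitrarily far from its binarized value.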

#### 1,885 Citations

Training and Inference with Integers in Deep Neural Networks

- Computer Science, Mathematics
- ICLR
- 2018

Empirically, this work demonstrates the potential to deploy training in hardware systems such as integer-based deep learning accelerators and neuromorphic chips with comparable accuracy and higher energy efficiency, which is crucial to future AI applications in variable scenarios with transfer and continual learning demands.

PXNOR: Perturbative Binary Neural Network

- Computer Science
- 2019 18th RoEduNet Conference: Networking in Education and Research (RoEduNet)
- 2019

PXNOR seeks to fully replace traditional convolutional filters with approximate operations, while replacing all multiplications and additions with simpler, much faster versions such as XNOR and bitcounting, which are implemented at hardware level on all existing platforms.

Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1

- Computer Science
- 2016

A binary matrix multiplication GPU kernel is written with which it is possible to run the MNIST BNN 7 times faster than with an unoptimized GPU kernel, without suffering any loss in classification accuracy.
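The trick behind such binary kernels is that a dot product of ±1 vectors reduces to XNOR plus a population count. The following is a sketch of that identity (not the paper's GPU kernel): encode +1 as bit 1 and −1 as bit 0, and then dot(a, b) = 2 · popcount(XNOR(a, b)) − n.

```python
def pack(values):
    """Pack a list of +1/-1 values into an integer: +1 -> bit 1, -1 -> bit 0."""
    word = 0
    for v in values:
        word = (word << 1) | (1 if v > 0 else 0)
    return word

def binary_dot(wa, wb, n):
    """Dot product of two packed +/-1 vectors of length n via XNOR + popcount."""
    xnor = ~(wa ^ wb) & ((1 << n) - 1)   # bit set where the two vectors agree
    return 2 * bin(xnor).count("1") - n  # agreements contribute +1, disagreements -1

a = [+1, -1, +1, +1]
b = [+1, +1, -1, +1]
# Elementwise: 1*1 + (-1)*1 + 1*(-1) + 1*1 = 0
print(binary_dot(pack(a), pack(b), len(a)))  # -> 0
```

On real hardware the packed words would be 32- or 64-bit lanes and the popcount a single instruction, which is where the reported speedups come from.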

GXNOR-Net: Training deep neural networks with ternary weights and activations without full-precision memory under a unified discretization framework

- Computer Science, Medicine
- Neural Networks
- 2018

It is found that when both the weights and activations become ternary values, the DNNs can be reduced to sparse binary networks, termed as gated XNOR networks (GXNOR-Nets), which promises the event-driven hardware design for efficient mobile intelligence.

BINARY DEEP NEURAL NETWORKS

- 2018

There are many application scenarios for which the computational performance and memory footprint of the prediction phase of Deep Neural Networks (DNNs) need to be optimized. Binary Deep Neural…

Training High-Performance and Large-Scale Deep Neural Networks with Full 8-bit Integers

- Computer Science, Mathematics
- Neural Networks
- 2020

A unified, complete quantization framework termed WAGEUBN quantizes DNNs along all data paths, including W (Weights), A (Activation), G (Gradient), E (Error), U (Update), and BN (Batch Normalization); the Momentum optimizer is also quantized to realize a completely quantized framework.

Espresso: Efficient Forward Propagation for Binary Deep Neural Networks

- Computer Science
- ICLR
- 2018

Espresso provides special convolutional and dense layers for BCNNs, leveraging bit-packing and bitwise computations for efficient execution; it speeds up matrix-multiplication routines and, at the same time, reduces memory usage when storing parameters and activations.

BinaryNet: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1

- Computer Science
- ArXiv
- 2016

BinaryNet, a method that trains DNNs with binary weights and activations when computing the parameters' gradients, is introduced; it drastically reduces memory usage and replaces most multiplications with 1-bit exclusive-not-or (XNOR) operations, which might have a big impact on both general-purpose and dedicated Deep Learning hardware.

Simultaneously Optimizing Weight and Quantizer of Ternary Neural Network Using Truncated Gaussian Approximation

- Computer Science, Mathematics
- 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2019

This work is the first to incorporate the thresholds of weight ternarization into a closed-form representation using truncated Gaussian approximation, enabling simultaneous optimization of weights and quantizer through back-propagation training.

Sparsely-Connected Neural Networks: Towards Efficient VLSI Implementation of Deep Neural Networks

- Computer Science
- ICLR
- 2017

Sparsely-connected neural networks are proposed, showing that the number of connections in fully-connected networks can be reduced by up to 90% while improving accuracy on three popular datasets; an efficient hardware architecture based on linear-feedback shift registers is also proposed to reduce the memory requirements of the sparsely-connected networks.

#### References

Showing 1–10 of 53 references.

Training deep neural networks with low precision multiplications

- Computer Science
- 2014

It is found that very low precision is sufficient not just for running trained networks but also for training them, and it is possible to train Maxout networks with 10-bit multiplications.

Low precision arithmetic for deep learning

- Computer Science
- ICLR
- 2015

It is found that very low precision computation is sufficient not just for running trained networks but also for training them.

ImageNet classification with deep convolutional neural networks

- Computer Science
- Commun. ACM
- 2012

A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.

DaDianNao: A Machine-Learning Supercomputer

- Computer Science
- 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture
- 2014

This article introduces a custom multi-chip machine-learning architecture, showing that, on a subset of the largest known neural network layers, it is possible to achieve a speedup of 450.65x over a GPU, and reduce the energy by 150.31x on average for a 64-chip system.

DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning

- Computer Science
- ASPLOS 2014
- 2014

This study designs an accelerator for large-scale CNNs and DNNs, with a special emphasis on the impact of memory on accelerator design, performance and energy, and shows that it is possible to design an accelerator with a high throughput, capable of performing 452 GOP/s in a small footprint.

Fixed-point feedforward deep neural network design using weights +1, 0, and −1

- Computer Science
- 2014 IEEE Workshop on Signal Processing Systems (SiPS)
- 2014

The designed fixed-point networks with ternary weights (+1, 0, and -1) and 3-bit signals show only negligible performance loss when compared to the floating-point counterparts.
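Ternarization of this kind maps each real-valued weight to {+1, 0, −1} via a threshold. The sketch below uses a hypothetical threshold rule (a fixed ratio of the mean absolute weight) purely for illustration; the paper's own threshold selection may differ.

```python
import numpy as np

def ternarize(w, ratio=0.7):
    """Map weights to {+1, 0, -1} using a threshold proportional to
    the mean absolute weight (the 0.7 ratio is an assumption)."""
    t = ratio * np.mean(np.abs(w))
    q = np.zeros_like(w)
    q[w > t] = 1.0
    q[w < -t] = -1.0
    return q

w = np.array([0.9, -0.05, 0.3, -0.8, 0.02])
print(ternarize(w))  # large weights keep their sign, small ones become 0
```

The zero level is what distinguishes ternary from binary networks: small weights are pruned outright, which yields the sparsity that hardware designs can exploit.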

Deep Learning with Limited Numerical Precision

- Computer Science, Mathematics
- ICML
- 2015

The results show that deep networks can be trained using only 16-bit wide fixed-point number representation when using stochastic rounding, and incur little to no degradation in the classification accuracy.
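Stochastic rounding is the key ingredient in that result: instead of always rounding to the nearest grid point, a value rounds up with probability proportional to its distance from the lower grid point, so the rounding error is zero in expectation. A minimal sketch (the grid step `eps` and grid layout are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_round(x, eps):
    """Round x to a grid of spacing eps, upward with probability
    (x - low) / eps, so rounding is unbiased in expectation."""
    low = np.floor(x / eps) * eps
    p_up = (x - low) / eps
    return low + eps * (rng.random(np.shape(x)) < p_up)

vals = np.full(100_000, 0.3)
rounded = stochastic_round(vals, eps=1.0)   # grid {0, 1}
print(rounded.mean())  # close to 0.3: unbiased on average
```

Nearest rounding would map every 0.3 to 0 and silently lose small gradient updates; the unbiasedness of stochastic rounding is what lets those updates accumulate correctly at low precision.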

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

- Computer Science
- ICML
- 2015

Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.

X1000 real-time phoneme recognition VLSI using feed-forward deep neural networks

- Computer Science
- 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2014

This work develops a digital VLSI for phoneme recognition using feed-forward deep neural networks and assesses the design in terms of throughput, chip size, and power consumption.

A highly scalable Restricted Boltzmann Machine FPGA implementation

- Computer Science
- 2009 International Conference on Field Programmable Logic and Applications
- 2009

This paper describes a novel architecture and FPGA implementation that accelerates the training of general RBMs in a scalable manner, with the goal of producing a system that machine learning researchers can use to investigate ever-larger networks.