Espresso: Efficient Forward Propagation for BCNNs
@article{Pedersoli2017EspressoEF, title={Espresso: Efficient Forward Propagation for BCNNs}, author={Fabrizio Pedersoli and George Tzanetakis and Andrea Tagliasacchi}, journal={ArXiv}, year={2017}, volume={abs/1705.07175} }
There are many application scenarios for which the computational performance and memory footprint of the prediction phase of Deep Neural Networks (DNNs) need to be optimized. Binary Deep Neural Networks (BDNNs) have been shown to be an effective way of achieving this objective. In this paper, we show how Convolutional Neural Networks (CNNs) can be implemented using binary representations. Espresso is a compact, yet powerful library written in C/CUDA that features all the functionalities required…
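The binary forward propagation described above ultimately rests on replacing floating-point multiply-accumulates with bitwise operations over packed sign bits. The sketch below is not Espresso's API; it is a minimal host-side illustration (the names `pack_signs` and `binary_dot` are hypothetical) of the identity dot(a, b) = n - 2·popcount(a XOR b) for vectors of ±1 values packed into 64-bit words, which is the kind of primitive a binary CNN library builds its convolutional and dense layers on.

```cuda
#include <stdint.h>

/* Pack n values in {-1, +1} into 64-bit words: bit = 1 encodes +1, bit = 0 encodes -1.
   Hypothetical helper for illustration only; not part of the Espresso API. */
void pack_signs(const float *x, uint64_t *packed, int n) {
    int nwords = (n + 63) / 64;
    for (int w = 0; w < nwords; ++w) packed[w] = 0;          /* zero the padding bits too */
    for (int i = 0; i < n; ++i)
        if (x[i] >= 0.0f) packed[i / 64] |= (uint64_t)1 << (i % 64);
}

/* Dot product of two packed ±1 vectors of length n:
   matches - mismatches = n - 2 * popcount(a XOR b).
   Padding bits are zero in both operands, so they never count as mismatches. */
int binary_dot(const uint64_t *a, const uint64_t *b, int n) {
    int mismatches = 0;
    for (int w = 0; w < (n + 63) / 64; ++w)
        mismatches += __builtin_popcountll(a[w] ^ b[w]);     /* __popcll on the device */
    return n - 2 * mismatches;
}
```

A binary convolution then amounts to sliding this dot product over bit-packed input patches, which is where most of the memory-footprint and throughput gains reported for BCNNs come from.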
11 Citations
Binarized Convolutional Neural Networks for Efficient Inference on GPUs
- Computer Science, 2018 26th European Signal Processing Conference (EUSIPCO)
- 2018
Convolutional neural networks have recently achieved significant breakthroughs in various image classification tasks. However, they are computationally expensive, which can make their feasible…
BitStream: Efficient Computing Architecture for Real-Time Low-Power Inference of Binary Neural Networks on CPUs
- Computer Science, ACM Multimedia
- 2018
This work presents a general architecture for efficient binary convolution, referred to as BitStream, that uses a new computation flow for BNNs instead of the traditional row-major im2col-based one and mainly optimizes memory access during BNN computation.
PXNOR: Perturbative Binary Neural Network
- Computer Science, 2019 18th RoEduNet Conference: Networking in Education and Research (RoEduNet)
- 2019
PXNOR seeks to fully replace traditional convolutional filters with approximate operations, while replacing all multiplications and additions with simpler, much faster versions such as XNOR and bitcounting, which are implemented at hardware level on all existing platforms.
BitFlow: Exploiting Vector Parallelism for Binary Neural Networks on CPU
- Computer Science, 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
- 2018
The proposed BitFlow, a gemm-operator-network three-level optimization framework for fully exploiting the computing power of BNNs on CPU, features a new class of algorithm named PressedConv for efficient binary convolution using locality-aware layout and vector parallelism.
Applying Binary Weights in Neural Networks for Sequential Text Recognition
- Computer Science, 2017 4th IAPR Asian Conference on Pattern Recognition (ACPR)
- 2017
A deep learning framework is built that supports training and prediction with binarized neural network layers, e.g., Convolution (Conv) and LSTM layers, and a network with binarized layers implemented in this framework achieves good performance on sequential text recognition.
3PXNet: Pruned-Permuted-Packed XNOR Networks for Edge Machine Learning
- Computer Science, ACM Trans. Embed. Comput. Syst.
- 2020
3PXNet is the first software implementation of sparse, binarized neural networks, released as an open-source library targeting edge devices, complete with a training methodology and model-generation scripts, making it easy and fast to deploy.
Training wide residual networks for deployment using a single bit for each weight
- Computer Science, ICLR
- 2018
Using a warm-restart learning-rate schedule, it is found that training with 1 bit per weight is just as fast as training full-precision networks, with better accuracy than standard schedules, and that about 98%-99% of peak performance is reached in just 62 training epochs for CIFAR-10/100.
Memristive Quantized Neural Networks: A Novel Approach to Accelerate Deep Learning On-Chip
- Computer Science, IEEE Transactions on Cybernetics
- 2021
A novel approach to accelerate on-chip learning systems using memristive quantized neural networks (M-QNNs) is presented, which can significantly reduce computational time and memory during the process of image recognition.
Reducing the Computational Cost of Deep Generative Models with Binary Neural Networks
- Computer Science, ICLR
- 2021
This work shows, for the first time, that generative models utilizing binary neural networks can be trained successfully, producing binary models that achieve loss values close to those of the regular models while being 90%-94% smaller and allowing significant speed-ups in execution time.
Optimizing Bit-Serial Matrix Multiplication for Reconfigurable Computing
- Computer Science, ACM Trans. Reconfigurable Technol. Syst.
- 2019
It is shown how BISMO can be scaled up on Xilinx FPGAs using an arithmetic architecture that better utilizes six-input LUTs, achieving a peak performance of 15.4 binary TOPS on the Ultra96 board with a Xilinx UltraScale+ MPSoC.
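For context, the bit-serial formulation that BISMO-style accelerators exploit decomposes an integer matrix product into a weighted sum of binary matrix products over the operands' bit planes: A·B = Σ_{i,j} 2^{i+j} (A_i · B_j). The sketch below is a plain software reference for that decomposition (the name `bitserial_matmul` is hypothetical), not the hardware datapath described in the paper.

```cuda
#include <stdint.h>

/* Reference bit-serial matrix multiply: C = A * B for unsigned operands,
   computed as a weighted sum of binary (bit-plane) matrix products.
   A is M x K with entries < 2^WA, B is K x N with entries < 2^WB.
   Illustrative host-side code, not the BISMO implementation. */
void bitserial_matmul(const uint8_t *A, const uint8_t *B, uint32_t *C,
                      int M, int K, int N, int WA, int WB) {
    for (int m = 0; m < M; ++m)
        for (int n = 0; n < N; ++n) {
            uint32_t acc = 0;
            for (int i = 0; i < WA; ++i)           /* bit plane i of A */
                for (int j = 0; j < WB; ++j) {     /* bit plane j of B */
                    uint32_t dot = 0;              /* binary dot product (AND + count) */
                    for (int k = 0; k < K; ++k)
                        dot += ((A[m * K + k] >> i) & 1u) & ((B[k * N + n] >> j) & 1u);
                    acc += dot << (i + j);         /* weight by 2^(i+j) */
                }
            C[m * N + n] = acc;
        }
}
```

The appeal of this formulation is that precision becomes a runtime parameter: the same binary dot-product hardware is reused WA × WB times, trading latency for bit width.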
References
Showing 1-10 of 33 references
BinaryConnect: Training Deep Neural Networks with binary weights during propagations
- Computer Science, NIPS
- 2015
BinaryConnect is introduced, a method that trains a DNN with binary weights during the forward and backward propagations while retaining the precision of the stored weights in which gradients are accumulated; near state-of-the-art results are obtained with BinaryConnect on permutation-invariant MNIST, CIFAR-10 and SVHN.
Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1
- Computer Science
- 2016
A binary matrix multiplication GPU kernel is written with which it is possible to run the MNIST BNN 7 times faster than with an unoptimized GPU kernel, without suffering any loss in classification accuracy.
EIE: Efficient Inference Engine on Compressed Deep Neural Network
- Computer Science, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA)
- 2016
An energy-efficient inference engine (EIE) is presented that performs inference on a compressed network model and accelerates the resulting sparse matrix-vector multiplication with weight sharing, running 189x and 13x faster than CPU and GPU implementations of the same DNN without compression.
Binarized Neural Networks
- Computer Science, NIPS
- 2016
A binary matrix multiplication GPU kernel is written with which it is possible to run the MNIST BNN 7 times faster than with an unoptimized GPU kernel, without suffering any loss in classification accuracy.
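As a rough illustration of what a binary matrix multiplication GPU kernel of the kind mentioned above can look like, the sketch below is a generic XNOR/popcount GEMM, not the kernel from the cited paper (`xnor_gemm` is a hypothetical name): each thread computes one output element from a bit-packed row of A and a bit-packed column of B.

```cuda
#include <stdint.h>

/* Generic XNOR/popcount GEMM sketch (illustrative, not the cited kernel).
   A: M x Kwords packed rows; B: N x Kwords packed columns (i.e. B transposed);
   each 32-bit word holds 32 binary values (bit 1 = +1, bit 0 = -1).
   Padding bits beyond K are assumed zero in both operands. */
__global__ void xnor_gemm(const uint32_t *A, const uint32_t *B, float *C,
                          int M, int N, int Kwords, int K) {
    int m = blockIdx.y * blockDim.y + threadIdx.y;   /* output row    */
    int n = blockIdx.x * blockDim.x + threadIdx.x;   /* output column */
    if (m >= M || n >= N) return;

    int mismatches = 0;
    for (int w = 0; w < Kwords; ++w)
        mismatches += __popc(A[m * Kwords + w] ^ B[n * Kwords + w]);

    /* +/-1 dot product: matches - mismatches = K - 2 * mismatches. */
    C[m * N + n] = (float)(K - 2 * mismatches);
}
```

Optimized kernels of this kind additionally tile the packed operands through shared memory; the reported speed-ups come from that blocking plus the 32x reduction in memory traffic from bit packing.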
Pruning Filters for Efficient ConvNets
- Computer Science, ICLR
- 2017
This work presents an acceleration method for CNNs, where it is shown that even simple filter pruning techniques can reduce inference costs for VGG-16 and ResNet-110 by up to 38% on CIFAR10 while regaining close to the original accuracy by retraining the networks.
Trained Ternary Quantization
- Computer Science, ICLR
- 2017
This work proposes Trained Ternary Quantization (TTQ), a method that can reduce the precision of weights in neural networks to ternary values to improve the accuracy of some models (32, 44, 56-layer ResNet) on CIFAR-10 and AlexNet on ImageNet.
Dynamic Network Surgery for Efficient DNNs
- Computer Science, NIPS
- 2016
A novel network compression method called dynamic network surgery is presented, which can remarkably reduce network complexity through on-the-fly connection pruning and is shown to outperform a recent pruning method by considerable margins.
Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding
- Computer Science, ICLR
- 2016
This work introduces "deep compression", a three-stage pipeline of pruning, trained quantization and Huffman coding whose stages work together to reduce the storage requirement of neural networks by 35x to 49x without affecting their accuracy.
Training deep neural networks with low precision multiplications
- Computer Science
- 2014
It is found that very low precision is sufficient not just for running trained networks but also for training them, and that it is possible to train Maxout networks with 10-bit multiplications.
ImageNet classification with deep convolutional neural networks
- Computer Science, Commun. ACM
- 2012
A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.