• Corpus ID: 3728338

Espresso: Efficient Forward Propagation for BCNNs

@article{Pedersoli2017EspressoEF,
  title={Espresso: Efficient Forward Propagation for BCNNs},
  author={Fabrizio Pedersoli and George Tzanetakis and Andrea Tagliasacchi},
  journal={ArXiv},
  year={2017},
  volume={abs/1705.07175}
}
There are many application scenarios for which the computational performance and memory footprint of the prediction phase of Deep Neural Networks (DNNs) need to be optimized. Binary Deep Neural Networks (BDNNs) have been shown to be an effective way of achieving this objective. In this paper, we show how Convolutional Neural Networks (CNNs) can be implemented using binary representations. Espresso is a compact, yet powerful library written in C/CUDA that features all the functionalities required…
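The key mechanism behind binary forward propagation is to pack the signs of weights and activations into machine words and replace each multiply-accumulate with an XNOR followed by a population count. The C sketch below illustrates only that arithmetic identity; the function names, data layout, and the GCC/Clang popcount builtin are assumptions made for illustration, not Espresso's actual API.

```c
#include <stdint.h>
#include <stddef.h>

/* Pack the signs of n floats into 64-bit words: bit = 1 if x >= 0, else 0.
   (Illustrative layout; a real library also handles padding and alignment.) */
static void pack_signs(const float *x, size_t n, uint64_t *out) {
    for (size_t w = 0; w < (n + 63) / 64; ++w) {
        uint64_t word = 0;
        for (size_t b = 0; b < 64 && w * 64 + b < n; ++b)
            if (x[w * 64 + b] >= 0.0f) word |= (uint64_t)1 << b;
        out[w] = word;
    }
}

/* Binary dot product of two sign-packed vectors of length n (in bits).
   XNOR counts matching signs; 2*matches - n recovers the +/-1 dot product. */
static int binary_dot(const uint64_t *a, const uint64_t *b, size_t n) {
    size_t words = (n + 63) / 64;
    int matches = 0;
    for (size_t w = 0; w < words; ++w) {
        uint64_t x = ~(a[w] ^ b[w]);              /* XNOR                  */
        if (w == words - 1 && n % 64)             /* mask padding bits     */
            x &= ((uint64_t)1 << (n % 64)) - 1;
        matches += __builtin_popcountll(x);
    }
    return 2 * matches - (int)n;
}
```

The same identity (dot product of +/-1 vectors = 2 × matching bits − length) is what packed GEMM and convolution kernels in binary CNN libraries are built around.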

Citations

Binarized Convolutional Neural Networks for Efficient Inference on GPUs
Convolutional neural networks have recently achieved significant breakthroughs in various image classification tasks. However, they are computationally expensive, which can make their feasible…
BitStream: Efficient Computing Architecture for Real-Time Low-Power Inference of Binary Neural Networks on CPUs
TLDR
This work presents a general architecture for efficient binary convolution, referred to as BitStream, that adopts a new computation flow for BNNs in place of the traditional row-major im2col-based one and mainly optimizes memory access during the computation of BNNs.
PXNOR: Perturbative Binary Neural Network
  • Vlad Pelin, I. Radoi
  • Computer Science
    2019 18th RoEduNet Conference: Networking in Education and Research (RoEduNet)
  • 2019
TLDR
PXNOR seeks to fully replace traditional convolutional filters with approximate operations, replacing all multiplications and additions with simpler, much faster operations such as XNOR and bit-counting, which are implemented at the hardware level on all existing platforms.
BitFlow: Exploiting Vector Parallelism for Binary Neural Networks on CPU
TLDR
The proposed BitFlow, a GEMM-operator-network three-level optimization framework for fully exploiting the computing power of BNNs on CPUs, features a new class of algorithm named PressedConv for efficient binary convolution using a locality-aware layout and vector parallelism.
Applying BinaryWeights in Neural Networks for Sequential Text Recognition
TLDR
A deep learning framework is built that supports training and prediction with binarized neural networks, e.g., convolution (Conv) and LSTM layers, and a network with binarized layers is constructed and implemented in this framework, achieving good performance on sequential text recognition.
3PXNet: Pruned-Permuted-Packed XNOR Networks for Edge Machine Learning
TLDR
3PXNet is the first software implementation of sparse, binarized neural networks, released as an open-source library targeting edge devices, complete with a training methodology and model-generation scripts that make it easy and fast to deploy.
Training wide residual networks for deployment using a single bit for each weight
TLDR
Using a warm-restart learning-rate schedule, it is found that training with a single bit per weight is just as fast as training full-precision networks, with better accuracy than standard schedules, achieving about 98%-99% of peak performance in just 62 training epochs on CIFAR-10/100.
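For context, the warm-restart schedule referred to here is typically the cosine-annealing-with-restarts rule of SGDR; assuming that form (an assumption, since the exact schedule is not quoted above), the learning rate within a cycle of length T_i is

\eta_t = \eta_{\min} + \tfrac{1}{2}\left(\eta_{\max} - \eta_{\min}\right)\left(1 + \cos\!\left(\pi\,\frac{T_{\mathrm{cur}}}{T_i}\right)\right),

where T_cur counts the epochs since the last restart; at each restart the rate jumps back to \eta_{\max}.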
Memristive Quantized Neural Networks: A Novel Approach to Accelerate Deep Learning On-Chip
TLDR
A novel approach to accelerate on-chip learning systems using memristive quantized neural networks (M-QNNs) is presented, which can significantly reduce computational time and memory during the process of image recognition.
Reducing the Computational Cost of Deep Generative Models with Binary Neural Networks
TLDR
This work shows, for the first time, that generative models which utilize binary neural networks can be trained successfully, and it trains binary models that achieve loss values close to those of the regular models while being 90%-94% smaller in size and allowing significant speed-ups in execution time.
Optimizing Bit-Serial Matrix Multiplication for Reconfigurable Computing
TLDR
It is shown how BISMO can be scaled up on Xilinx FPGAs using an arithmetic architecture that better utilizes six-input LUTs, achieving a peak performance of 15.4 binary TOPS on the Ultra96 board with a Xilinx UltraScale+ MPSoC.
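Bit-serial matrix multiplication rests on decomposing integer matrices into bit planes. Assuming unsigned w-bit and x-bit operands (signed inputs need a two's-complement correction), the decomposition is

A = \sum_{i=0}^{w-1} 2^{i} A_i, \qquad B = \sum_{j=0}^{x-1} 2^{j} B_j \;\Longrightarrow\; AB = \sum_{i=0}^{w-1} \sum_{j=0}^{x-1} 2^{\,i+j}\, A_i B_j,

where every A_i and B_j is a binary matrix, so each partial product A_i B_j reduces to AND-and-popcount operations, which is the primitive such an overlay accelerates.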
…

References

Showing 1-10 of 33 references
BinaryConnect: Training Deep Neural Networks with binary weights during propagations
TLDR
BinaryConnect is introduced, a method that consists of training a DNN with binary weights during the forward and backward propagations while retaining the precision of the stored weights in which gradients are accumulated; near state-of-the-art results with BinaryConnect are obtained on permutation-invariant MNIST, CIFAR-10, and SVHN.
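Concretely, BinaryConnect keeps a real-valued weight w for gradient accumulation and binarizes it on the fly during propagation, either deterministically or stochastically:

w_b = \begin{cases} +1 & \text{if } w \ge 0 \\ -1 & \text{otherwise} \end{cases} \qquad \text{or} \qquad P(w_b = +1) = \sigma(w) = \operatorname{clip}\!\left(\tfrac{w+1}{2},\, 0,\, 1\right),

with the real-valued weights clipped to [-1, 1] after each update so that they stay in the range where binarization is informative.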
Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1
TLDR
A binary matrix multiplication GPU kernel is written with which it is possible to run the MNIST BNN 7 times faster than with an unoptimized GPU kernel, without suffering any loss in classification accuracy.
EIE: Efficient Inference Engine on Compressed Deep Neural Network
  • Song Han, Xingyu Liu, W. Dally
  • Computer Science
    2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA)
  • 2016
TLDR
An energy-efficient inference engine (EIE) is presented that performs inference on a compressed network model and accelerates the resulting sparse matrix-vector multiplication with weight sharing; it is 189x and 13x faster than CPU and GPU implementations of the same DNN without compression.
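The computation being accelerated is a sparse matrix-vector product whose nonzeros are small indices into a shared codebook of weight values. A plain-C sketch of that computation follows; the structure and field names are illustrative assumptions, not EIE's on-chip format.

```c
#include <stdint.h>
#include <stddef.h>

/* Sparse column-major matrix whose nonzeros are indices into a shared
   16-entry codebook (weight sharing). Illustrative layout only. */
typedef struct {
    size_t rows, cols;
    const size_t  *col_ptr;  /* cols + 1 entries: nonzero range of each column */
    const size_t  *row_idx;  /* row index of each nonzero                      */
    const uint8_t *code;     /* codebook index (0..15) of each nonzero         */
    const float   *codebook; /* 16 shared weight values                        */
} shared_csc;

/* y = W * x, skipping zero activations. */
static void spmv_shared(const shared_csc *W, const float *x, float *y) {
    for (size_t r = 0; r < W->rows; ++r) y[r] = 0.0f;
    for (size_t c = 0; c < W->cols; ++c) {
        if (x[c] == 0.0f) continue;               /* dynamic activation sparsity */
        for (size_t k = W->col_ptr[c]; k < W->col_ptr[c + 1]; ++k)
            y[W->row_idx[k]] += W->codebook[W->code[k]] * x[c];
    }
}
```

Skipping columns whose activation is zero is what lets such an engine exploit dynamic activation sparsity on top of the static weight sparsity.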
Binarized Neural Networks
TLDR
A binary matrix multiplication GPU kernel is written with which it is possible to run the MNIST BNN 7 times faster than with an unoptimized GPU kernel, without suffering any loss in classification accuracy.
Pruning Filters for Efficient ConvNets
TLDR
This work presents an acceleration method for CNNs in which it is shown that even simple filter pruning techniques can reduce inference costs for VGG-16 and ResNet-110 by up to 38% on CIFAR-10 while regaining close to the original accuracy by retraining the networks.
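The pruning criterion used there ranks each convolutional filter by the sum of its absolute weights and removes the lowest-ranked filters together with their feature maps. A small C sketch of the ranking step (hypothetical flat weight layout) is:

```c
#include <math.h>
#include <stddef.h>

/* L1 norm of each of n_filters filters, each with filter_size weights
   (filter_size = in_channels * k * k). Smallest-norm filters are the
   pruning candidates. */
static void filter_l1_norms(const float *weights, size_t n_filters,
                            size_t filter_size, float *norms) {
    for (size_t f = 0; f < n_filters; ++f) {
        float s = 0.0f;
        for (size_t i = 0; i < filter_size; ++i)
            s += fabsf(weights[f * filter_size + i]);
        norms[f] = s;
    }
}
```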
Trained Ternary Quantization
TLDR
This work proposes Trained Ternary Quantization (TTQ), a method that reduces the precision of weights in neural networks to ternary values while even improving the accuracy of some models (32-, 44-, and 56-layer ResNets) on CIFAR-10 and of AlexNet on ImageNet.
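For reference, the trained ternary quantizer maps each weight of layer l to one of three values using a threshold proportional to the largest weight magnitude in that layer, with the two nonzero magnitudes W_l^p and W_l^n learned by backpropagation (the threshold factor t is a hyperparameter, around 0.05 in the paper):

w_l^t = \begin{cases} +W_l^p & w_l > \Delta_l \\ 0 & |w_l| \le \Delta_l \\ -W_l^n & w_l < -\Delta_l \end{cases}, \qquad \Delta_l = t \cdot \max_i |w_{l,i}|.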
Dynamic Network Surgery for Efficient DNNs
TLDR
A novel network compression method called dynamic network surgery is proposed, which can remarkably reduce network complexity through on-the-fly connection pruning and is shown to outperform a recent pruning method by considerable margins.
Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding
TLDR
This work introduces "deep compression", a three-stage pipeline (pruning, trained quantization, and Huffman coding) whose stages work together to reduce the storage requirement of neural networks by 35x to 49x without affecting their accuracy.
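As a rough sense of where such factors come from: quantizing n weights of b bits each to k shared values stores n indices of \lceil \log_2 k \rceil bits plus a k-entry codebook, giving a compression rate of approximately

r \approx \frac{n\,b}{\,n\,\lceil \log_2 k \rceil + k\,b\,},

before pruning and Huffman coding increase the overall ratio further.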
Training deep neural networks with low precision multiplications
TLDR
It is found that very low precision is sufficient not just for running trained networks but also for training them, and it is possible to train Maxout networks with 10-bit multiplications.
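In this setting, a "10-bit multiplication" means each operand is first rounded to a low-precision fixed-point format before the product is taken. A minimal C sketch of such a quantizer follows; the function names and the fixed fractional-bit count are assumptions (the cited work uses dynamic fixed point, where the fractional precision adapts per layer):

```c
#include <math.h>

/* Round x to a signed fixed-point value with `frac` fractional bits out of
   `bits` total, saturating at the representable range. */
static float quantize_fixed(float x, int bits, int frac) {
    float scale = ldexpf(1.0f, frac);             /* 2^frac        */
    float max_q = ldexpf(1.0f, bits - 1) - 1.0f;  /* largest code  */
    float q = roundf(x * scale);
    if (q >  max_q) q =  max_q;                   /* saturate      */
    if (q < -max_q - 1.0f) q = -max_q - 1.0f;
    return q / scale;
}

/* A "10-bit multiplication": both operands quantized before the product. */
static float mul_lowprec(float a, float b, int bits, int frac) {
    return quantize_fixed(a, bits, frac) * quantize_fixed(b, bits, frac);
}
```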
ImageNet classification with deep convolutional neural networks
TLDR
A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.
…