YodaNN: An Ultra-Low Power Convolutional Neural Network Accelerator Based on Binary Weights

@article{Andri2016YodaNNAU,
  title={YodaNN: An Ultra-Low Power Convolutional Neural Network Accelerator Based on Binary Weights},
  author={Renzo Andri and Lukas Cavigelli and Davide Rossi and Luca Benini},
  journal={2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)},
  year={2016},
  pages={236-241}
}
Convolutional Neural Networks (CNNs) have revolutionized the world of image classification over the last few years, pushing computer vision close to and even beyond human accuracy. [...] This novel algorithmic approach brings major optimization opportunities in the arithmetic core, by removing the need for expensive multiplications, as well as in weight storage and I/O costs. In this work, we present a HW accelerator optimized for BinaryConnect CNNs that achieves 1510 GOp/s on a core area of only 1.33 MGE [...]
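The arithmetic simplification can be illustrated with a small sketch. The NumPy code below is an illustrative reading of the binary-weight idea only (the naive loop, names, and shapes are assumptions, not the YodaNN datapath): with weights restricted to {-1, +1}, each multiply-accumulate collapses into a conditional add or subtract of the activation.

```python
# Minimal sketch of a binary-weight 2D convolution (illustrative only).
import numpy as np

def conv2d_binary(x, w_sign):
    """x: (H, W) activation map; w_sign: (K, K) weights in {-1, +1}."""
    H, W = x.shape
    K = w_sign.shape[0]
    out = np.zeros((H - K + 1, W - K + 1), dtype=x.dtype)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + K, j:j + K]
            # No multiplications: add activations under +1 weights,
            # subtract activations under -1 weights.
            out[i, j] = patch[w_sign > 0].sum() - patch[w_sign < 0].sum()
    return out

x = np.random.rand(8, 8).astype(np.float32)
w = np.where(np.random.randn(3, 3) >= 0, 1.0, -1.0)  # binary weights
print(conv2d_binary(x, w).shape)  # (6, 6)
```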
A Convolutional Accelerator for Neural Networks With Binary Weights
TLDR
A convolutional accelerator for binary-weight neural networks is introduced, which achieves a high area efficiency of 176 GOps/MGC and a performance efficiency of 89%, outperforming the state-of-the-art architecture for binary-weight networks by 1.8× and 3.2×, respectively.
CARLA: A Convolution Accelerator With a Reconfigurable and Low-Energy Architecture
TLDR
This work proposes an energy-efficient architecture equipped with several optimized dataflows to support the structural diversity of modern CNNs and achieves a Processing Element (PE) utilization factor of 98% for the majority of convolutional layers.
Towards energy-efficient convolutional neural network inference
TLDR
This thesis first evaluates the capabilities of off-the-shelf software-programmable hardware before diving into specialized hardware accelerators and exploring the potential of extremely quantized CNNs, and gives special consideration to external memory bandwidth.
Fast and Efficient Convolutional Accelerator for Edge Computing
TLDR
ZASCA achieves a performance efficiency of up to 94 percent over a set of state-of-the-art CNNs for image classification with dense representation, where performance efficiency is the ratio between the average runtime performance and the peak performance.
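As a quick illustration of that metric (the figures below are made up, not taken from the ZASCA evaluation):

```python
# Performance efficiency = average runtime performance / peak performance.
peak_gops = 200.0   # peak throughput of the accelerator (GOp/s), illustrative
avg_gops = 188.0    # average throughput measured over a workload (GOp/s), illustrative
efficiency = avg_gops / peak_gops
print(f"performance efficiency = {efficiency:.0%}")  # -> 94%
```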
A Survey of Field Programmable Gate Array-Based Convolutional Neural Network Accelerators
  • Wei Zhang
  • 2020
With the rapid development of deep learning, neural networks and deep learning algorithms play a significant role in various practical applications. Due to their high accuracy and good performance, [...]
An Energy-Efficient Accelerator Architecture with Serial Accumulation Dataflow for Deep CNNs
TLDR
This paper proposes an energy-efficient architecture which maximally utilizes its computational units for convolution operations while requiring a low number of DRAM accesses, and results show that the proposed architecture performs one image recognition task using the VGGNet model with a latency of 393 ms and only 251.5 MB of DRAM accesses.
A high utilization FPGA-based accelerator for variable-scale convolutional neural network
TLDR
An optimization framework is proposed to solve the boundary problem and to connect the accelerator with ARM processors and DDR4 memory through a dual Advanced eXtensible Interface (AXI) bus.
Extended Bit-Plane Compression for Convolutional Neural Network Accelerators
  • L. Cavigelli, L. Benini
  • Computer Science
  • 2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)
  • 2019
TLDR
This work introduces and evaluates a novel, hardware-friendly compression scheme for the feature maps present within convolutional neural networks and shows that an average compression ratio of 4.4× relative to uncompressed data and a gain of 60% over existing methods can be achieved for ResNet-34 with a compression block requiring <300 bits of sequential cells and minimal combinational logic.
EBPC: Extended Bit-Plane Compression for Deep Neural Network Inference and Training Accelerators
TLDR
This work introduces and evaluates a novel, hardware-friendly, and lossless compression scheme for the feature maps present within convolutional neural networks, and achieves compression factors for gradient map compression during training that are even better than for inference.
Efficient Hardware Architectures for Deep Convolutional Neural Network
TLDR
The theoretical derivation of the parallel fast finite impulse response algorithm (FFA) is introduced and the corresponding fast convolution units (FCUs) are developed for the computation of convolutions in the CNN models.

References

Showing 1-10 of 43 references
Accelerating Deep Convolutional Neural Networks Using Specialized Hardware
TLDR
Hardware specialization in the form of GPGPUs, FPGAs, and ASICs offers a promising path towards major leaps in processing capability while achieving high energy efficiency, and combining multiple FPGAs over a low-latency communication fabric offers further opportunity to train and evaluate models of unprecedented size and quality.
ShiDianNao: Shifting vision processing closer to the sensor
TLDR
This paper proposes an accelerator which is 60x more energy efficient than the previous state-of-the-art neural network accelerator, designed down to the layout at 65 nm, with a modest footprint and consuming only 320 mW, but still about 30x faster than high-end GPUs.
Hardware-oriented Approximation of Convolutional Neural Networks
TLDR
Ristretto is a model approximation framework that analyzes a given CNN with respect to the numerical resolution used in representing weights and outputs of convolutional and fully connected layers, and can condense models by using fixed-point arithmetic and representation instead of floating point.
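As a rough sketch of what condensing a model to fixed point means (the bit widths and helper names below are assumptions for illustration, not Ristretto's API):

```python
# Quantize floats to a signed fixed-point representation with a given number
# of fractional bits, then recover an approximate float value.
import numpy as np

def to_fixed(x, frac_bits=8, total_bits=16):
    scale = 1 << frac_bits
    lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    return np.clip(np.round(x * scale), lo, hi).astype(np.int32)

def to_float(q, frac_bits=8):
    return q.astype(np.float32) / (1 << frac_bits)

w = np.random.randn(5).astype(np.float32)
print(w)
print(to_float(to_fixed(w)))  # close to the originals, within about 2**-8
```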
Origami: A Convolutional Network Accelerator
TLDR
This paper presents the first convolutional network accelerator which is scalable to network sizes that are currently only handled by workstation GPUs, but remains within the power envelope of embedded systems.
14.5 Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks
TLDR
To achieve state-of-the-art accuracy, CNNs with not only a larger number of layers but also millions of filter weights and varying shapes are needed, which results in substantial data movement that consumes significant energy.
Origami: A 803-GOp/s/W Convolutional Network Accelerator
  • L. Cavigelli, L. Benini
  • Computer Science
  • IEEE Transactions on Circuits and Systems for Video Technology
  • 2017
TLDR
A new architecture, design, and implementation, as well as the first reported silicon measurements of such an accelerator, outperforming previous work in terms of power, area, and I/O efficiency are presented.
DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning
TLDR
This study designs an accelerator for large-scale CNNs and DNNs, with a special emphasis on the impact of memory on accelerator design, performance and energy, and shows that it is possible to design an accelerator with a high throughput, capable of performing 452 GOP/s in a small footprint.
A ultra-low-energy convolution engine for fast brain-inspired vision in multicore clusters
TLDR
This work proposes to augment many-core architectures using shared-memory clusters of power-optimized RISC processors with Hardware Convolution Engines (HWCEs): ultra-low energy coprocessors for accelerating convolutions, the main building block of many brain-inspired computer vision algorithms.
BinaryConnect: Training Deep Neural Networks with binary weights during propagations
TLDR
BinaryConnect is introduced, a method that trains a DNN with binary weights during the forward and backward propagations while retaining the precision of the stored weights in which gradients are accumulated; near state-of-the-art results are obtained with BinaryConnect on the permutation-invariant MNIST, CIFAR-10, and SVHN.
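A minimal sketch of that training loop, using a toy linear model with assumed names (not the authors' code), might look like this:

```python
# Keep full-precision "shadow" weights, binarize them for the forward/backward
# pass, and accumulate the gradient updates in full precision.
import numpy as np

rng = np.random.default_rng(0)
w_real = rng.normal(scale=0.1, size=(4, 3))  # stored full-precision weights
lr = 0.1

def binarize(w):
    # Deterministic binarization: the sign of the real-valued weights.
    return np.where(w >= 0, 1.0, -1.0)

for step in range(200):
    x = rng.normal(size=(8, 4))            # toy input batch
    y = x @ np.ones((4, 3))                # toy regression target
    w_bin = binarize(w_real)               # binary weights used in propagation
    y_hat = x @ w_bin
    grad = x.T @ (y_hat - y) / len(x)      # gradient w.r.t. the binary weights
    w_real -= lr * grad                    # update the full-precision weights
    w_real = np.clip(w_real, -1.0, 1.0)    # clipping as in BinaryConnect
```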
Energy-efficient ConvNets through approximate computing
TLDR
Methods based on approximate computing to reduce energy consumption in state-of-the-art ConvNet accelerators are proposed and can achieve energy gains in the system's arithmetic: up to 30× without losing classification accuracy and more than 100× at 99% classification accuracy, compared to the commonly used 16-bit fixed-point number format.