Hyperdrive: A Multi-Chip Systolically Scalable Binary-Weight CNN Inference Engine

@article{Andri2019HyperdriveAM,
  title={Hyperdrive: A Multi-Chip Systolically Scalable Binary-Weight CNN Inference Engine},
  author={Renzo Andri and L. Cavigelli and D. Rossi and L. Benini},
  journal={IEEE Journal on Emerging and Selected Topics in Circuits and Systems},
  year={2019},
  volume={9},
  pages={309-322}
}
Deep neural networks have achieved impressive results in computer vision and machine learning. [...] Key Method: We present Hyperdrive, a BWN accelerator that dramatically reduces the I/O bandwidth by exploiting a novel binary-weight streaming approach. It can be used for arbitrarily sized convolutional neural network architectures and input resolutions by exploiting the natural scalability of the compute units at both chip and system level, arranging Hyperdrive chips systolically in a 2D mesh while…
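To illustrate the binary-weight idea behind the abstract: since every weight is constrained to +1 or −1, each multiply-accumulate reduces to an addition or subtraction of an activation, so no hardware multiplier is needed. The sketch below is a minimal, illustrative Python rendering of this arithmetic, not the paper's datapath; all names and shapes are assumptions.

```python
import numpy as np

def binary_weight_conv2d(x, w_sign, bias=None):
    """Minimal sketch of a binary-weight 2D convolution (valid padding).

    x:      input feature maps, shape (C_in, H, W), full precision
    w_sign: binary weights in {+1, -1}, shape (C_out, C_in, K, K)
    """
    c_out, c_in, k, _ = w_sign.shape
    _, h, w = x.shape
    out = np.zeros((c_out, h - k + 1, w - k + 1))
    for co in range(c_out):
        for i in range(h - k + 1):
            for j in range(w - k + 1):
                patch = x[:, i:i + k, j:j + k]
                # +1 weights add the activation, -1 weights subtract it:
                # the multiply-accumulate degenerates to add/subtract.
                out[co, i, j] = np.sum(np.where(w_sign[co] > 0, patch, -patch))
    if bias is not None:
        out += bias[:, None, None]
    return out
```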
EBPC: Extended Bit-Plane Compression for Deep Neural Network Inference and Training Accelerators
TLDR
This work introduces and evaluates a novel, hardware-friendly, lossless compression scheme for the feature maps present within convolutional neural networks, and achieves even better compression factors for gradient maps during training than for inference.
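As a rough intuition for why such feature-map compression works: ReLU outputs are dominated by zeros, so even plain zero run-length encoding shrinks them considerably. The sketch below shows only that simplified idea; the actual EBPC scheme additionally compresses the non-zero values bit-plane by bit-plane, which this code does not attempt.

```python
def zrle_encode(stream):
    """Plain zero run-length encoding of a 1-D stream (lossless).

    Each non-zero value is stored with the number of zeros preceding it;
    trailing zeros are returned separately so decoding is unambiguous.
    """
    tokens, run = [], 0
    for v in stream:
        if v == 0:
            run += 1
        else:
            tokens.append((run, v))
            run = 0
    return tokens, run

def zrle_decode(tokens, trailing_zeros):
    out = []
    for run, v in tokens:
        out.extend([0] * run)
        out.append(v)
    out.extend([0] * trailing_zeros)
    return out

# Round trip on a ReLU-like, zero-dominated stream:
stream = [0, 0, 3, 0, 0, 0, 7, 1, 0, 0]
tokens, tail = zrle_encode(stream)   # [(2, 3), (3, 7), (0, 1)], tail = 2
assert zrle_decode(tokens, tail) == stream
```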
Optimizing Temporal Convolutional Network Inference on FPGA-Based Accelerators
TLDR
This paper proposes a convolution scheduling based on batch processing that can boost efficiency up to 96% of the theoretical peak performance, using a CNN accelerator with specific features supporting TCN kernels as a reference and a set of state-of-the-art TCNs as a benchmark.
Hardware and Software Optimizations for Accelerating Deep Neural Networks: Survey of Current Trends, Challenges, and the Road Ahead
TLDR
This work summarizes and compares approaches across the four leading platforms for executing such algorithms (CPU, GPU, FPGA, and ASIC), describing the main state-of-the-art solutions and giving particular prominence to the last two, since they offer greater design flexibility and the potential for high energy efficiency, especially for inference.
A Configurable and Versatile Architecture for Low Power, Energy Efficient Hardware Acceleration of Convolutional Neural Networks
TLDR
This paper presents a configurable, versatile, and flexible architecture for hardware acceleration of convolutional neural networks (CNNs) that is based on storing and accumulating entire feature maps in local memory inside the accelerator, achieving an energy-efficiency improvement of more than 5× for select CNN layers.
A Construction Kit for Efficient Low Power Neural Network Accelerator Designs
TLDR
This work provides a survey of neural network accelerator optimization approaches used in recent works and reports their individual effects on edge processing performance, presenting the list of optimizations and their quantitative effects as a construction kit that allows designers to assess the design choices for each building block separately.
EEG-TCNet: An Accurate Temporal Convolutional Network for Embedded Motor-Imagery Brain–Machine Interfaces
TLDR
This paper proposes EEG-TCNet, a novel temporal convolutional network (TCN) that achieves outstanding accuracy while requiring few trainable parameters, which makes it suitable for embedded classification on resource-limited devices at the edge.
FANN-on-MCU: An Open-Source Toolkit for Energy-Efficient Neural Network Inference at the Edge of the Internet of Things
TLDR
FANN-on-MCU, an open-source toolkit built upon the Fast Artificial Neural Network (FANN) library to run lightweight and energy-efficient neural networks on microcontrollers based on both the ARM Cortex-M series and the novel RISC-V-based parallel ultra-low-power (PULP) platform, is presented.
RPR: Random Partition Relaxation for Training Binary and Ternary Weight Neural Networks
TLDR
Random Partition Relaxation (RPR), a method for the strong quantization of neural network weights to binary (+1/−1) and ternary (+1/0/−1) values, is presented together with an SGD-based training method that can be integrated into existing frameworks.
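A minimal sketch of the two ingredients named in the summary, under assumed names (quantize, rpr_step, relax_fraction, and the 0.5 ternary threshold are all illustrative, not from the paper): weights are projected onto binary or ternary values, and in each RPR-style step a randomly chosen subset stays continuous (relaxed, i.e. trainable by SGD) while the rest is frozen at its quantized value.

```python
import numpy as np

def quantize(w, ternary=False, threshold=0.5):
    """Project full-precision weights onto {+1, -1} or {+1, 0, -1}."""
    if ternary:
        q = np.sign(w)
        q[np.abs(w) < threshold] = 0.0  # small weights snap to zero
        return q
    return np.where(w >= 0, 1.0, -1.0)

def rpr_step(w, relax_fraction=0.1, rng=None):
    """One random-partition-relaxation-style step (illustrative only):
    a random subset of weights keeps its continuous value, the rest is
    replaced by its quantized value."""
    rng = np.random.default_rng(0) if rng is None else rng
    relaxed = rng.random(w.shape) < relax_fraction  # True -> stays trainable
    return np.where(relaxed, w, quantize(w))
```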
Algorithm and VLSI Architecture Designs of a Lossless Embedded Compression Encoder for HD Video Coding Systems
The demand for visual quality has been driven up by high display resolutions and frame rates. These two trends, however, impose a tremendous memory bandwidth burden on a video coding system. In this...
ChewBaccaNN: A Flexible 223 TOPS/W BNN Accelerator
TLDR
ChewBaccaNN is presented, a 0.7 mm² binary convolutional neural network (CNN) accelerator designed in GlobalFoundries 22 nm technology that performs CIFAR-10 inference at 86.8% accuracy and runs inference on a binarized ResNet-18 trained with 8-base Group-Net to achieve 67.5% top-1 accuracy.

References

Showing 1–10 of 62 references
Hyperdrive: A Systolically Scalable Binary-Weight CNN Inference Engine for mW IoT End-Nodes
TLDR
Hyperdrive is presented: a BWN accelerator dramatically reducing the I/O bandwidth by exploiting a novel binary-weight streaming approach, and capable of handling high-resolution images by virtue of its systolically scalable architecture.
YodaNN: An Architecture for Ultralow Power Binary-Weight CNN Acceleration
TLDR
This paper presents an accelerator optimized for binary-weight CNNs that significantly outperforms the state of the art in terms of energy and area efficiency, removing the need for expensive multiplications while also reducing I/O bandwidth and storage.
NullHop: A Flexible Convolutional Neural Network Accelerator Based on Sparse Representations of Feature Maps
TLDR
This work proposes NullHop, a flexible and efficient CNN accelerator architecture that implements state-of-the-art CNNs for low-power, low-latency application scenarios and exploits the sparsity of neuron activations to accelerate computation and reduce memory requirements.
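One simple way to exploit activation sparsity, shown below as an illustrative sketch only (NullHop's actual representation pairs a sparsity map with a non-zero value list; the exact packing here is assumed): store a binary occupancy mask plus the packed non-zero values, so each zero costs one bit instead of a full word.

```python
import numpy as np

def encode_sparse(fmap):
    """Encode a feature map as (occupancy mask, packed non-zero values)."""
    mask = fmap != 0
    return mask, fmap[mask]

def decode_sparse(mask, values):
    """Reconstruct the dense feature map from mask + values (lossless)."""
    fmap = np.zeros(mask.shape, dtype=values.dtype)
    fmap[mask] = values
    return fmap

# ReLU output is typically zero-dominated, so the encoding is compact:
fmap = np.maximum(np.random.default_rng(0).normal(size=(8, 8)), 0)
mask, vals = encode_sparse(fmap)
assert np.array_equal(decode_sparse(mask, vals), fmap)
```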
XNORBIN: A 95 TOp/s/W hardware accelerator for binary convolutional neural networks
TLDR
XNORBIN is presented, a flexible accelerator for binary CNNs with computation tightly coupled to memory for aggressive data reuse, supporting even non-trivial network topologies with large feature map volumes.
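For fully binarized networks such as those XNORBIN targets, with both weights and activations in {+1, −1}, the dot product itself reduces to an XNOR followed by a popcount. A minimal sketch, with an assumed bit-packing convention (bit 1 encodes +1, bit 0 encodes −1):

```python
def xnor_popcount_dot(a_bits, b_bits, n):
    """Dot product of two {+1, -1} vectors packed into integers as bits.

    Matching bits contribute +1, mismatching bits -1, so with
    m = popcount(xnor(a, b)) over n valid bits, dot = 2*m - n.
    """
    xnor = ~(a_bits ^ b_bits) & ((1 << n) - 1)  # keep only n valid bits
    matches = bin(xnor).count("1")
    return 2 * matches - n

# a = (+1, -1, +1) -> 0b101, b = (+1, +1, +1) -> 0b111
assert xnor_popcount_dot(0b101, 0b111, 3) == 1  # +1 - 1 + 1 = 1
```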
Extended Bit-Plane Compression for Convolutional Neural Network Accelerators
  • L. Cavigelli, L. Benini
  • Computer Science
  • 2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)
  • 2019
TLDR
This work introduces and evaluates a novel, hardware-friendly compression scheme for the feature maps present within convolutional neural networks, and shows that an average compression ratio of 4.4× relative to uncompressed data and a 60% gain over the existing method can be achieved for ResNet-34 with a compression block requiring fewer than 300 bits of sequential cells and minimal combinational logic.
EIE: Efficient Inference Engine on Compressed Deep Neural Network
  • Song Han, Xingyu Liu, +4 authors W. Dally
  • Computer Science
  • 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA)
  • 2016
TLDR
An energy-efficient inference engine (EIE) is presented that performs inference on a compressed deep neural network and accelerates the resulting sparse matrix-vector multiplication with weight sharing; it is 189× and 13× faster than CPU and GPU implementations, respectively, of the same DNN without compression.
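The compute pattern EIE accelerates can be sketched as a sparse matrix-vector product in which each stored weight is only a small codebook index decoded through a shared lookup table. The code below is an illustrative CSR rendering of that pattern, not EIE's actual CSC-based format or its 4-bit encoding:

```python
import numpy as np

def spmv_weight_shared(indptr, indices, codes, codebook, x):
    """y = W @ x where W is sparse (CSR) and each non-zero is a codebook
    index (weight sharing): the real weight is codebook[codes[k]]."""
    y = np.zeros(len(indptr) - 1)
    for row in range(len(indptr) - 1):
        for k in range(indptr[row], indptr[row + 1]):
            y[row] += codebook[codes[k]] * x[indices[k]]
    return y

# 2x3 matrix [[0, w1, 0], [w0, 0, w1]] with a 2-entry codebook:
codebook = np.array([-0.5, 0.25])
y = spmv_weight_shared(indptr=[0, 1, 3], indices=[1, 0, 2],
                       codes=[1, 0, 1], codebook=codebook,
                       x=np.array([1.0, 2.0, 3.0]))
assert np.allclose(y, [0.5, 0.25])
```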
An Energy-Efficient Architecture for Binary Weight Convolutional Neural Networks
TLDR
An innovative approximate adder that significantly reduces the silicon area and data-path delay is developed, and algorithmic transformations for certain layers of BCNNs together with a memory-efficient quantization scheme are incorporated to further reduce the energy cost and on-chip storage requirement.
ShiDianNao: Shifting vision processing closer to the sensor
TLDR
This paper proposes an accelerator that is 60× more energy efficient than the previous state-of-the-art neural network accelerator; designed down to the layout in 65 nm, it has a modest footprint, consumes only 320 mW, and is still about 30× faster than high-end GPUs.
A ultra-low-energy convolution engine for fast brain-inspired vision in multicore clusters
TLDR
This work proposes to augment many-core architectures using shared-memory clusters of power-optimized RISC processors with Hardware Convolution Engines (HWCEs): ultra-low energy coprocessors for accelerating convolutions, the main building block of many brain-inspired computer vision algorithms.
14.5 Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks
TLDR
To achieve state-of-the-art accuracy, CNNs with not only a larger number of layers but also millions of filter weights and varying shapes are needed; this results in substantial data movement, which consumes significant energy.