Hyperdrive: A Systolically Scalable Binary-Weight CNN Inference Engine for mW IoT End-Nodes

@article{Andri2018HyperdriveAS,
  title={Hyperdrive: A Systolically Scalable Binary-Weight CNN Inference Engine for mW IoT End-Nodes},
  author={Renzo Andri and L. Cavigelli and D. Rossi and L. Benini},
  journal={2018 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)},
  year={2018},
  pages={509-515}
}
Deep neural networks have achieved impressive results in computer vision and machine learning. Unfortunately, state-of-the-art networks are extremely compute- and memory-intensive, which makes them unsuitable for mW-devices such as IoT end-nodes. Aggressive quantization of these networks dramatically reduces the computation and memory footprint. Binary-weight neural networks (BWNs) follow this trend, pushing weight quantization to the limit. Hardware accelerators for BWNs presented up to now have…
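To make the binary-weight idea concrete: when weights are constrained to ±α, every multiply-accumulate in a convolution collapses into a sign-controlled add, with a single scaling by α factored out per output. Below is a minimal NumPy sketch of this arithmetic (illustrative only; it models the standard BWN math, not Hyperdrive's hardware datapath, and the function names are ours):

```python
import numpy as np

def binarize(w):
    """Reduce a real-valued filter to a +/-1 sign pattern plus one scale
    alpha (alpha = mean |w|, the usual binary-weight-network choice)."""
    alpha = np.abs(w).mean()
    return np.sign(w), alpha

def bwn_conv2d(x, sign, alpha):
    """'Valid' 2D convolution with binary weights: each tap is a
    sign-controlled add; alpha scales each output value once."""
    kh, kw = sign.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = alpha * np.sum(x[i:i + kh, j:j + kw] * sign)
    return out

# Toy usage: binarize a random 3x3 filter and convolve an 8x8 map.
x = np.random.randn(8, 8)
sign, alpha = binarize(np.random.randn(3, 3))
y = bwn_conv2d(x, sign, alpha)   # shape (6, 6)
```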
Citations

Hyperdrive: A Multi-Chip Systolically Scalable Binary-Weight CNN Inference Engine
TLDR: Hyperdrive is presented: a BWN accelerator that dramatically reduces I/O bandwidth by exploiting a novel binary-weight streaming approach, and that supports arbitrarily sized convolutional neural network architectures and input resolutions by exploiting the natural scalability of the compute units at both chip and system level.
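The scalability claim rests on a dataflow in which feature maps stay resident in the compute array and only the (tiny) binary weights are streamed in, so I/O grows with the weight volume rather than the feature-map volume. A schematic sketch of that tiling idea, under the simplifying assumption that halo exchange at tile borders is ignored (this is our illustration, not the actual Hyperdrive dataflow or RTL):

```python
import numpy as np
from scipy.signal import correlate2d

def tiled_weight_streaming(x, sign, alpha, grid=2):
    """Split the feature map into grid x grid tiles pinned to their
    compute units; broadcast ('stream') the same binary weights once to
    every tile, which convolves locally. Halo handling is omitted."""
    th, tw = x.shape[0] // grid, x.shape[1] // grid
    return {(i, j): alpha * correlate2d(
                x[i * th:(i + 1) * th, j * tw:(j + 1) * tw],
                sign, mode="valid")
            for i in range(grid) for j in range(grid)}

x = np.random.randn(16, 16)
sign = np.sign(np.random.randn(3, 3))
tiles = tiled_weight_streaming(x, sign, alpha=0.1)
print({k: v.shape for k, v in tiles.items()})   # four (6, 6) tiles
```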
Towards energy-efficient convolutional neural network inference
TLDR: This thesis first evaluates the capabilities of off-the-shelf software-programmable hardware before diving into specialized hardware accelerators and exploring the potential of extremely quantized CNNs, giving special consideration to external memory bandwidth.
EBPC: Extended Bit-Plane Compression for Deep Neural Network Inference and Training Accelerators
TLDR: This work introduces and evaluates a novel, hardware-friendly, lossless compression scheme for the feature maps within convolutional neural networks, and achieves compression factors for gradient-map compression during training that are even better than those for inference.
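The precise EBPC encoding is specified in the paper; as a rough illustration of the bit-plane idea it builds on, the toy transform below slices a block of 8-bit activations into bit planes and spends only a flag bit on all-zero planes. This is a simplification we constructed, not the EBPC algorithm itself:

```python
import numpy as np

def to_bitplanes(block):
    """Slice 8-bit unsigned activations into 8 bit planes (plane 0 =
    LSB). Sparse, low-magnitude feature maps leave the upper planes
    mostly zero, which is what bit-plane coders exploit."""
    flat = np.asarray(block, dtype=np.uint8).ravel()
    return [(flat >> b) & 1 for b in range(8)]

def toy_encode_size(planes):
    """Toy encoder: one flag bit per plane (zero / non-zero), plus the
    raw bits only for the non-zero planes."""
    return sum(1 + (p.size if p.any() else 0) for p in planes)

block = np.array([0, 1, 0, 3, 0, 0, 2, 1], dtype=np.uint8)
print(toy_encode_size(to_bitplanes(block)), "bits vs",
      block.size * 8, "bits uncompressed")   # 24 vs 64
```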
Accelerating Inference of Convolutional Neural Networks Using In-memory Computing
TLDR: This work focuses on application-specific IMC hardware for inference of Convolutional Neural Networks (CNNs), provides methodologies for implementing the various architectural components of the IMC core, and presents methods for mapping synaptic weights and activations onto the memory structures.
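As a schematic picture of what "mapping synaptic weights onto the memory structures" means in a crossbar-style IMC core: weights become quantized conductances, activations become voltages, and each column accumulates a dot product as a current. The sketch below is a generic differential-pair crossbar model of our own, not the architecture from this paper:

```python
import numpy as np

def map_to_conductances(W, g_max=1.0, levels=16):
    """Map signed weights onto two quantized conductance arrays
    (a differential pair G+ / G-), a common crossbar convention."""
    scale = max(np.abs(W).max(), 1e-12)
    Gp = np.round(np.clip(W, 0, None) / scale * levels) / levels * g_max
    Gn = np.round(np.clip(-W, 0, None) / scale * levels) / levels * g_max
    return Gp, Gn, scale

def crossbar_mvm(Gp, Gn, scale, x, g_max=1.0):
    """Analog-style MVM: column currents are (G+ - G-)^T x, rescaled
    back to the weight domain."""
    return (Gp - Gn).T @ x * scale / g_max

W = np.random.randn(4, 3)          # 4 input rows, 3 output columns
x = np.random.randn(4)
Gp, Gn, s = map_to_conductances(W)
print(crossbar_mvm(Gp, Gn, s, x))  # approximates W.T @ x
print(W.T @ x)                     # ideal result for comparison
```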
A Systolic SNN Inference Accelerator and its Co-optimized Software Framework
  • Shasha Guo, Lei Wang, +5 authors Qiang Dou
  • Computer Science
  • ACM Great Lakes Symposium on VLSI
  • 2019
TLDR: A low-power hardware accelerator for SNN inference using a systolic array, together with a corresponding software framework for optimization informed by explorations of SNNs, is presented.
Efficient Pipelined Execution of CNNs Based on In-Memory Computing and Graph Homomorphism Verification
TLDR: This work shows that the proposed communication fabric facilitates the pipelined execution of all state-of-the-art CNNs by proving the existence of a homomorphism between the graph representations of these networks and the graph corresponding to the communication fabric.
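A graph homomorphism G → H is a vertex map under which every edge of G lands on an edge of H; its existence is what certifies that a network's layer graph can be laid onto the fabric. A brute-force checker (exponential, so usable only on tiny examples; the toy graphs below are ours, not taken from the paper):

```python
from itertools import product

def find_homomorphism(g_nodes, g_edges, h_nodes, h_edges):
    """Search for a map f: V(G) -> V(H) with (f(u), f(v)) an edge of H
    for every edge (u, v) of G. Returns one such map or None."""
    h_edge_set = set(h_edges)
    for assignment in product(h_nodes, repeat=len(g_nodes)):
        f = dict(zip(g_nodes, assignment))
        if all((f[u], f[v]) in h_edge_set for u, v in g_edges):
            return f
    return None

# Toy example: a 3-layer pipeline mapped onto a 2-node ring fabric.
layers = (["a", "b", "c"], [("a", "b"), ("b", "c")])
fabric = ([0, 1], [(0, 1), (1, 0)])
print(find_homomorphism(*layers, *fabric))   # {'a': 0, 'b': 1, 'c': 0}
```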
An On-the-Fly Feature Map Compression Engine for Background Memory Access Cost Reduction in DNN Inference
TLDR: A way is proposed to integrate the EBPC hardware blocks, which perform on-the-fly compression and decompression of 8-bit feature-map streams, into an embedded ultra-low-power processing system, showing how the challenges arising from a variable-length compressed representation can be navigated in this context.
Extended Bit-Plane Compression for Convolutional Neural Network Accelerators
  • L. Cavigelli, L. Benini
  • Computer Science
  • 2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)
  • 2019
TLDR: This work introduces and evaluates a novel, hardware-friendly compression scheme for the feature maps within convolutional neural networks, and shows that for ResNet-34 an average compression ratio of 4.4× relative to uncompressed data and a 60% gain over the existing method can be achieved with a compression block requiring fewer than 300 bits of sequential cells and minimal combinational logic.
Accurate deep neural network inference using computational phase-change memory
TLDR: This work introduces a methodology to train ResNet-type convolutional neural networks that results in reduced accuracy loss when transferring weights to in-memory computing hardware based on phase-change memory.
Mapping and virtual neuron assignment algorithms for MAERI accelerator
TLDR: This work proposes an algorithm for mapping and assigning virtual neurons (VNs) on the MAERI accelerator to improve its performance and cost; the algorithm helps support different trained DNN models and increases the flexibility of DLA-based systems.

References

Showing 1-10 of 45 references
YodaNN: An Architecture for Ultralow Power Binary-Weight CNN Acceleration
TLDR: This paper presents an accelerator optimized for binary-weight CNNs that significantly outperforms the state of the art in terms of energy and area efficiency, removing the need for expensive multiplications as well as reducing I/O bandwidth and storage.
NullHop: A Flexible Convolutional Neural Network Accelerator Based on Sparse Representations of Feature Maps
TLDR: A flexible and efficient CNN accelerator architecture called NullHop is proposed that implements state-of-the-art CNNs for low-power and low-latency application scenarios and exploits the sparsity of neuron activations to accelerate computation and reduce memory requirements.
XNORBIN: A 95 TOp/s/W hardware accelerator for binary convolutional neural networks
TLDR: XNORBIN is presented, a flexible accelerator for binary CNNs with computation tightly coupled to memory for aggressive data reuse, supporting even non-trivial network topologies with large feature-map volumes.
An Energy-Efficient Architecture for Binary Weight Convolutional Neural Networks
TLDR: An innovative approximate adder is developed that significantly reduces the silicon area and data-path delay; algorithmic transformations for certain layers of BCNNs and a memory-efficient quantization scheme are incorporated to further reduce the energy cost and on-chip storage requirements.
EIE: Efficient Inference Engine on Compressed Deep Neural Network
  • Song Han, Xingyu Liu, +4 authors W. Dally
  • Computer Science
  • 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA)
  • 2016
TLDR: An energy-efficient inference engine (EIE) is presented that performs inference on a compressed network model and accelerates the resulting sparse matrix-vector multiplication with weight sharing; it is 189× and 13× faster than CPU and GPU implementations, respectively, of the same DNN without compression.
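EIE's gains come from operating directly on the compressed model: non-zero weights are stored as small codebook indices in a sparse, column-wise format, so the matrix-vector product touches only non-zeros and decodes each weight with a table lookup, while zero activations are skipped entirely. A simplified software analogue (EIE itself is a hardware pipeline; this toy CSC layout and codebook are ours):

```python
import numpy as np

def sparse_shared_mv(n_rows, indptr, row_idx, codes, codebook, x):
    """y = W @ x with W in CSC form plus weight sharing: column j's
    non-zeros sit in indptr[j]:indptr[j+1]; each entry stores a row
    index and a small (e.g. 4-bit) codebook index, not a full weight."""
    y = np.zeros(n_rows)
    for j, xj in enumerate(x):
        if xj == 0.0:              # skip zero activations, as EIE does
            continue
        for k in range(indptr[j], indptr[j + 1]):
            y[row_idx[k]] += codebook[codes[k]] * xj
    return y

# Toy 3x4 matrix with one non-zero per column and a 2-entry codebook.
codebook = np.array([0.5, -1.0])
indptr, row_idx, codes = [0, 1, 2, 3, 4], [0, 2, 1, 0], [0, 1, 0, 1]
x = np.array([1.0, 0.0, 2.0, 3.0])
print(sparse_shared_mv(3, indptr, row_idx, codes, codebook, x))  # [-2.5, 1.0, 0.0]
```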
ShiDianNao: Shifting vision processing closer to the sensor
TLDR: This paper proposes an accelerator that is 60× more energy-efficient than the previous state-of-the-art neural network accelerator; designed down to the layout at 65 nm, it has a modest footprint and consumes only 320 mW, yet is still about 30× faster than high-end GPUs.
A ultra-low-energy convolution engine for fast brain-inspired vision in multicore clusters
TLDR: This work proposes to augment many-core architectures using shared-memory clusters of power-optimized RISC processors with Hardware Convolution Engines (HWCEs): ultra-low-energy coprocessors for accelerating convolutions, the main building block of many brain-inspired computer vision algorithms.
Origami: A 803-GOp/s/W Convolutional Network Accelerator
  • L. Cavigelli, L. Benini
  • Computer Science
  • IEEE Transactions on Circuits and Systems for Video Technology
  • 2017
TLDR: A new architecture, design, and implementation of such an accelerator are presented, together with the first reported silicon measurements, outperforming previous work in terms of power, area, and I/O efficiency.
Designing Energy-Efficient Convolutional Neural Networks Using Energy-Aware Pruning
TLDR: This work proposes an energy-aware pruning algorithm for CNNs that directly uses the energy consumption of a CNN to guide the pruning process, and shows that reducing the number of target classes in AlexNet greatly decreases the number of weights but has a limited impact on energy consumption.
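The pruning loop itself is simple; what the paper changes is the ranking signal, replacing weight magnitude alone with an energy model. The sketch below is our toy version: a made-up energy estimate (compute plus DRAM weight fetches) decides which layer to prune first, then magnitude pruning runs within the layer. The constants and the model are illustrative, not the paper's measured numbers:

```python
import numpy as np

def layer_energy(n_macs, n_weights, e_mac=1.0, e_dram=200.0, reuse=16):
    """Toy energy model: compute energy plus weight-fetch energy,
    assuming each weight is fetched from DRAM once per `reuse` MACs."""
    return n_macs * e_mac + (n_weights / reuse) * e_dram

def energy_aware_prune(layers, frac=0.3):
    """Greedy sketch: visit layers from highest to lowest estimated
    energy and zero out the `frac` smallest-magnitude weights in each."""
    order = sorted(layers,
                   key=lambda l: -layer_energy(l["macs"], l["w"].size))
    for layer in order:
        w = layer["w"]
        k = int(frac * w.size)
        thresh = np.partition(np.abs(w).ravel(), k)[k]
        w[np.abs(w) < thresh] = 0.0
    return layers

layers = [{"w": np.random.randn(64, 64),   "macs": 64 * 64 * 56 * 56},
          {"w": np.random.randn(512, 512), "macs": 512 * 512 * 7 * 7}]
energy_aware_prune(layers)
print([float((l["w"] == 0).mean()) for l in layers])   # ~0.3 each
```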
Accelerating Deep Convolutional Networks using low-precision and sparsity
TLDR: This work achieves the highest reported accuracy with extremely low-precision (2-bit) weight networks and builds a deep-learning accelerator core, DLAC, that can achieve up to 1 TFLOP/mm² equivalent for single-precision floating-point operations.