Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks
@article{Chen2017EyerissAE,
  title   = {Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks},
  author  = {Yu-Hsin Chen and Tushar Krishna and Joel S. Emer and Vivienne Sze},
  journal = {IEEE Journal of Solid-State Circuits},
  year    = {2017},
  volume  = {52},
  pages   = {127--138}
}
Eyeriss is an accelerator for state-of-the-art deep convolutional neural networks (CNNs). It optimizes for the energy efficiency of the entire system, including the accelerator chip and off-chip DRAM, across various CNN shapes by reconfiguring the architecture. CNNs are widely used in modern AI systems but pose throughput and energy-efficiency challenges for the underlying hardware, because their computation requires a large amount of data, creating significant data movement from on…
1,198 Citations
COSY: An Energy-Efficient Hardware Architecture for Deep Convolutional Neural Networks Based on Systolic Array
- Computer Science
- 2017 IEEE 23rd International Conference on Parallel and Distributed Systems (ICPADS)
- 2017
COSY (CNN on Systolic Array) is an energy-efficient hardware architecture based on the systolic array for CNNs; it achieves an over-15% reduction in energy consumption under the same constraints, and COSY is shown to have an intrinsic ability for zero-skipping.
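Zero-skipping exploits the sparsity that ReLU activations create: when an operand is zero, the multiply contributes nothing and can be elided. A toy Python sketch of the general idea (an illustration only, not COSY's actual mechanism):

```python
def mac_with_zero_skipping(activations, weights):
    """Return (dot product, number of multiplies skipped).

    Illustrates zero-skipping: any multiply whose activation operand
    is zero is elided, since its product cannot change the sum.
    """
    acc = 0
    skipped = 0
    for a, w in zip(activations, weights):
        if a == 0:          # zero activation: product is 0, skip the multiply
            skipped += 1
            continue
        acc += a * w
    return acc, skipped

acts = [0, 3, 0, 0, 2, 1, 0, 5]   # ReLU outputs are often sparse
wts  = [1, 2, 3, 4, 5, 6, 7, 8]
total, skipped = mac_with_zero_skipping(acts, wts)
```

In hardware the same observation translates into gating the multiplier and accumulator when a zero operand is detected, saving the energy of the elided operations.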
CARLA: A Convolution Accelerator With a Reconfigurable and Low-Energy Architecture
- Computer Science
- IEEE Transactions on Circuits and Systems I: Regular Papers
- 2021
This work proposes an energy-efficient architecture equipped with several optimized dataflows to support the structural diversity of modern CNNs and achieves a Processing Element (PE) utilization factor of 98% for the majority of convolutional layers.
An Energy-Efficient and Flexible Accelerator based on Reconfigurable Computing for Multiple Deep Convolutional Neural Networks
- Computer Science
- 2018 14th IEEE International Conference on Solid-State and Integrated Circuit Technology (ICSICT)
- 2018
A novel accelerator, called reconfigurable neural accelerator (RNA), was proposed based on reconfigurable computing technology, and image row broadcast (IRB) and zero detection technology (ZDT) were applied for increased energy efficiency and throughput.
Eyeriss: a spatial architecture for energy-efficient dataflow for convolutional neural networks
- Computer Science
- CARN
- 2016
A novel dataflow, called row-stationary (RS), is presented that minimizes data movement energy consumption on a spatial architecture, can adapt to different CNN shape configurations, and reduces all types of data movement by maximally utilizing the processing engine's local storage, direct inter-PE communication, and spatial parallelism.
Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks
- Computer Science
- 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA)
- 2016
A novel dataflow, called row-stationary (RS), is presented that minimizes data movement energy consumption on a spatial architecture, can adapt to different CNN shape configurations, and reduces all types of data movement by maximally utilizing the processing engine's local storage, direct inter-PE communication, and spatial parallelism.
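As a rough illustration of the row-stationary idea described above (a sketch of the dataflow's arithmetic, not of the Eyeriss hardware): each PE keeps one filter row stationary and convolves it with a streaming input row, and the 2-D convolution is assembled by accumulating those 1-D partial sums across PEs.

```python
def conv1d(row, frow):
    """1-D convolution primitive: what one PE computes with its
    stationary filter row and a streaming input row."""
    R = len(frow)
    return [sum(row[x + i] * frow[i] for i in range(R))
            for x in range(len(row) - R + 1)]

def conv2d_row_stationary(image, filt):
    """2-D convolution as a sum of 1-D row convolutions: filter row r
    is paired with input rows r, r+1, ..., and the partial sums from
    the PEs computing one output row are accumulated together."""
    R = len(filt)
    H = len(image)
    out = []
    for y in range(H - R + 1):          # one output row per group of PEs
        psums = [conv1d(image[y + r], filt[r]) for r in range(R)]
        out.append([sum(col) for col in zip(*psums)])
    return out
```

The energy argument is that both the filter row and the input row stay in a PE's local storage while they are reused, and partial sums travel only between neighboring PEs rather than to and from large buffers.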
An Energy-Aware Bit-Serial Streaming Deep Convolutional Neural Network Accelerator
- Computer Science
- 2019 IEEE International Conference on Image Processing (ICIP)
- 2019
An energy-aware bit-serial streaming deep CNN accelerator is proposed to tackle the challenges of high computational complexity and excessive data storage; with several optimization methods and the proposed ring streaming dataflow, computational performance is improved and external memory access is reduced.
WinDConv: A Fused Datapath CNN Accelerator for Power-Efficient Edge Devices
- Computer Science
- IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
- 2020
The proposed architecture, termed WinDConv, introduces a scheme to support both regular and energy-efficient Winograd convolutions on the same architecture through a fused datapath, and demonstrates the applicability of the proposed schemes in commonly occurring variants of the convolution operation.
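Winograd convolution, which WinDConv supports alongside regular convolution, trades multiplications for cheaper additions. A minimal Python sketch of the standard F(2,3) transform (an illustration of the general technique, not the WinDConv datapath) computes two outputs of a 3-tap filter with four multiplies instead of six:

```python
def winograd_f23(d, g):
    """Winograd F(2,3): d = 4 input samples, g = 3 filter taps.

    Returns the two outputs y0 = d0*g0 + d1*g1 + d2*g2 and
    y1 = d1*g0 + d2*g1 + d3*g2 using only 4 multiplies.
    """
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * (g[0] + g[1] + g[2]) / 2
    m3 = (d[2] - d[1]) * (g[0] - g[1] + g[2]) / 2
    m4 = (d[1] - d[3]) * g[2]
    return [m1 + m2 + m3, m2 - m3 - m4]
```

The filter-side factors (g[0]+g[1]+g[2])/2 and (g[0]-g[1]+g[2])/2 depend only on the weights, so in an accelerator they can be precomputed once per filter rather than per input tile.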
YodaNN: An Architecture for Ultralow Power Binary-Weight CNN Acceleration
- Computer Science
- IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
- 2018
This paper presents an accelerator optimized for binary-weight CNNs that significantly outperforms the state-of-the-art in energy and area efficiency; it removes the need for expensive multiplications and reduces I/O bandwidth and storage.
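The binary-weight idea can be sketched in a few lines: with weights constrained to {-1, +1}, every multiply in a dot product collapses into a sign-conditioned add or subtract. This toy Python sketch illustrates the arithmetic, not YodaNN's circuit:

```python
def binary_weight_dot(activations, weights):
    """Dot product with weights restricted to {-1, +1}:
    no multiplier is needed, only add/subtract selected by the
    weight's sign bit."""
    acc = 0
    for a, w in zip(activations, weights):
        acc += a if w > 0 else -a    # multiply replaced by a signed add
    return acc
```

Because each weight is a single sign bit, weight storage and I/O bandwidth also shrink by the weight bit-width, which is where much of the reported energy and area saving comes from.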
Architecture design for highly flexible and energy-efficient deep neural network accelerators
- Computer Science
- 2018
Eyeriss is a co-design of software and hardware architecture for DNN processing that is optimized for performance, energy efficiency, and flexibility; it features a novel row-stationary dataflow to minimize data movement when processing a DNN, which is the bottleneck of both performance and energy efficiency.
RANA: Towards Efficient Neural Acceleration with Refresh-Optimized Embedded DRAM
- Computer Science
- 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA)
- 2018
A Retention-Aware Neural Acceleration (RANA) framework is proposed for CNN accelerators with refresh-optimized eDRAM; it saves total system energy consumption by removing unnecessary refresh operations.
References (showing 1-10 of 38)
Eyeriss: a spatial architecture for energy-efficient dataflow for convolutional neural networks
- Computer Science
- CARN
- 2016
A novel dataflow, called row-stationary (RS), is presented that minimizes data movement energy consumption on a spatial architecture, can adapt to different CNN shape configurations, and reduces all types of data movement by maximally utilizing the processing engine's local storage, direct inter-PE communication, and spatial parallelism.
ShiDianNao: Shifting vision processing closer to the sensor
- Computer Science
- 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA)
- 2015
This paper proposes an accelerator which is 60x more energy efficient than the previous state-of-the-art neural network accelerator, designed down to the layout at 65 nm, with a modest footprint and consuming only 320 mW, but still about 30x faster than high-end GPUs.
Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks
- Computer Science
- FPGA
- 2015
This work implements a CNN accelerator on a VC707 FPGA board and compares it to previous approaches, achieving a peak performance of 61.62 GFLOPS at a 100 MHz working frequency, which significantly outperforms previous designs.
DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning
- Computer Science
- ASPLOS 2014
- 2014
This study designs an accelerator for large-scale CNNs and DNNs, with a special emphasis on the impact of memory on accelerator design, performance and energy, and shows that it is possible to design an accelerator with a high throughput, capable of performing 452 GOP/s in a small footprint.
A ultra-low-energy convolution engine for fast brain-inspired vision in multicore clusters
- Computer Science
- 2015 Design, Automation & Test in Europe Conference & Exhibition (DATE)
- 2015
This work proposes to augment many-core architectures using shared-memory clusters of power-optimized RISC processors with Hardware Convolution Engines (HWCEs): ultra-low energy coprocessors for accelerating convolutions, the main building block of many brain-inspired computer vision algorithms.
DaDianNao: A Machine-Learning Supercomputer
- Computer Science
- 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture
- 2014
This article introduces a custom multi-chip machine-learning architecture, showing that, on a subset of the largest known neural network layers, it is possible to achieve a speedup of 450.65x over a GPU, and reduce the energy by 150.31x on average for a 64-chip system.
A 240 G-ops/s Mobile Coprocessor for Deep Neural Networks
- Computer Science
- 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops
- 2014
The nn-X system is presented, a scalable, low-power coprocessor for enabling real-time execution of deep neural networks, able to achieve a peak performance of 227 G-ops/s, which translates to a performance per power improvement of 10 to 100 times that of conventional mobile and desktop processors.
A Massively Parallel Coprocessor for Convolutional Neural Networks
- Computer Science
- 2009 20th IEEE International Conference on Application-specific Systems, Architectures and Processors
- 2009
A massively parallel coprocessor for accelerating Convolutional Neural Networks (CNNs), a class of important machine learning algorithms, is presented; it uses low-precision data and further increases the effective memory bandwidth by packing multiple words into every memory operation.
Origami: A Convolutional Network Accelerator
- Computer Science
- ACM Great Lakes Symposium on VLSI
- 2015
This paper presents the first convolutional network accelerator which is scalable to network sizes that are currently only handled by workstation GPUs, but remains within the power envelope of embedded systems.
14.6 A 1.42TOPS/W deep convolutional neural network recognition processor for intelligent IoE systems
- Computer Science
- 2016 IEEE International Solid-State Circuits Conference (ISSCC)
- 2016
In this paper, we present an energy-efficient CNN processor with 4 key features: (1) a CNN-optimized neuron processing engine (NPE), (2) a dual-range multiply-accumulate (DRMAC) block for low-power…