Corpus ID: 207882941

Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks

@article{Chen2017EyerissAE,
  title={Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks},
  author={Yu-hsin Chen and Tushar Krishna and Joel S. Emer and Vivienne Sze},
  journal={IEEE Journal of Solid-State Circuits},
  year={2017},
  volume={52},
  pages={127--138}
}
Eyeriss is an accelerator for state-of-the-art deep convolutional neural networks (CNNs). It optimizes for the energy efficiency of the entire system, including the accelerator chip and off-chip DRAM, across various CNN shapes by reconfiguring its architecture. CNNs are widely used in modern AI systems but pose throughput and energy-efficiency challenges to the underlying hardware, because their computation requires a large amount of data, creating significant data movement from on…
COSY: An Energy-Efficient Hardware Architecture for Deep Convolutional Neural Networks Based on Systolic Array
  • Chen Xin, Qiang Chen, Bo Wang
  • Computer Science
    2017 IEEE 23rd International Conference on Parallel and Distributed Systems (ICPADS)
  • 2017
TLDR
This paper presents COSY (CNN on Systolic Array), an energy-efficient hardware architecture for CNNs based on the systolic array, which achieves an over-15% reduction in energy consumption under the same constraints, and it is shown that COSY has an intrinsic ability for zero-skipping.
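The zero-skipping idea mentioned above can be illustrated with a minimal sketch: when an activation is zero, the multiply-accumulate contributes nothing and can be gated off. This is an illustrative model of the general technique, not the COSY hardware scheme; the function name is hypothetical.

```python
def mac_with_zero_skip(activations, weights):
    """Accumulate only products with nonzero activations (zero-skipping).

    Illustrative sketch of the general gating idea: a zero activation
    means the multiply and the weight fetch can both be skipped.
    """
    acc = 0
    skipped = 0
    for a, w in zip(activations, weights):
        if a == 0:          # gate the MAC: no multiply, no weight read
            skipped += 1
            continue
        acc += a * w
    return acc, skipped
```

With sparse activations (common after ReLU layers), the fraction of skipped MACs translates directly into saved switching energy.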
CARLA: A Convolution Accelerator With a Reconfigurable and Low-Energy Architecture
TLDR
This work proposes an energy-efficient architecture equipped with several optimized dataflows to support the structural diversity of modern CNNs and achieves a Processing Element (PE) utilization factor of 98% for the majority of convolutional layers.
An Energy-Efficient and Flexible Accelerator based on Reconfigurable Computing for Multiple Deep Convolutional Neural Networks
TLDR
A novel accelerator, called the reconfigurable neural accelerator (RNA), is proposed based on reconfigurable computing technology; image row broadcast (IRB) and zero-detection technology (ZDT) are applied to increase energy efficiency and throughput.
Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks
  • Yu-hsin Chen, J. Emer, V. Sze
  • Computer Science
    2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA)
  • 2016
TLDR
A novel dataflow, called row-stationary (RS), is presented that minimizes data movement energy consumption on a spatial architecture, can adapt to different CNN shape configurations, and reduces all types of data movement by maximally utilizing processing-engine local storage, direct inter-PE communication, and spatial parallelism.
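The core of the row-stationary idea can be sketched in a few lines: each PE keeps one filter row resident in its local register file while a row of input activations slides past it, so every filter weight is reused many times without leaving the PE. This is a software model of the reuse pattern only, not the Eyeriss RTL.

```python
def rs_1d_row_conv(input_row, filter_row):
    """1-D row convolution as one row-stationary PE would compute it.

    Sketch of the reuse pattern: filter_row stays "stationary" in the
    PE's local storage while input_row streams through, so each weight
    is read once from the array fabric but used at every output position.
    """
    R = len(filter_row)
    out = []
    for x in range(len(input_row) - R + 1):
        acc = 0
        for r in range(R):          # weights reused from local storage
            acc += input_row[x + r] * filter_row[r]
        out.append(acc)
    return out
```

In the full architecture, 2-D convolution is built by assigning one such 1-D row computation to each PE and summing partial rows across a column of PEs.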
An Energy-Aware Bit-Serial Streaming Deep Convolutional Neural Network Accelerator
TLDR
An energy-aware bit-serial streaming deep CNN accelerator is proposed to tackle the challenges of high computational complexity and excessive data storage and with several optimization methods and the proposed ring streaming dataflow, the computational performance is improved and the external memory access is reduced.
WinDConv: A Fused Datapath CNN Accelerator for Power-Efficient Edge Devices
TLDR
The proposed architecture, termed WinDConv, introduces a scheme to support both regular and energy-efficient Winograd convolutions on the same architecture through a fused datapath, and demonstrates the applicability of the proposed schemes in commonly occurring variants of the convolution operation.
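As background for the Winograd convolutions mentioned above, the textbook minimal-filtering algorithm F(2,3) computes two outputs of a 3-tap convolution with 4 multiplications instead of 6. The sketch below is that standard transform, not WinDConv's fused datapath; the function name is ours.

```python
def winograd_f2_3(d, g):
    """Winograd minimal filtering F(2,3): two outputs of a 3-tap
    1-D convolution using 4 multiplies instead of the direct method's 6.

    d: 4 input samples, g: 3 filter taps.
    """
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * (g0 + g1 + g2) / 2
    m3 = (d2 - d1) * (g0 - g1 + g2) / 2
    m4 = (d1 - d3) * g2
    return [m1 + m2 + m3, m2 - m3 - m4]
```

The filter-side transform (the `(g0 + g1 + g2) / 2` terms) depends only on the weights, so an accelerator can precompute it once per layer and amortize its cost across the whole feature map.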
YodaNN: An Architecture for Ultralow Power Binary-Weight CNN Acceleration
TLDR
This paper presents an accelerator optimized for binary-weight CNNs that significantly outperforms the state-of-the-art in terms of energy and area efficiency and removes the need for expensive multiplications, as well as reducing I/O bandwidth and storage.
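The reason binary-weight networks remove expensive multiplications can be shown in one line: with weights constrained to ±1, each product reduces to a conditional add or subtract. This is a sketch of the general principle, not YodaNN's actual datapath.

```python
def binary_weight_dot(activations, sign_weights):
    """Dot product with binary (+1/-1) weights: no multiplier needed,
    each term is just an add or a subtract of the activation.

    Illustrative sketch of the binary-weight principle only.
    """
    acc = 0
    for a, s in zip(activations, sign_weights):
        acc += a if s > 0 else -a   # multiply-free MAC
    return acc
```

Beyond removing multipliers, 1-bit weights also shrink weight storage and I/O bandwidth by 32x relative to single-precision floats, which is where much of the reported energy saving comes from.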
Architecture design for highly flexible and energy-efficient deep neural network accelerators
TLDR
Eyeriss, a co-design of software and hardware architecture for DNN processing that is optimized for performance, energy efficiency, and flexibility, features a novel row-stationary dataflow to minimize data movement when processing a DNN, which is the bottleneck of both performance and energy efficiency.
RANA: Towards Efficient Neural Acceleration with Refresh-Optimized Embedded DRAM
TLDR
A Retention-Aware Neural Acceleration (RANA) framework for CNN accelerators is proposed to save total system energy consumption with refresh-optimized eDRAM by removing unnecessary refresh operations.

References

SHOWING 1-10 OF 38 REFERENCES
Eyeriss: a spatial architecture for energy-efficient dataflow for convolutional neural networks
TLDR
A novel dataflow, called row-stationary (RS), is presented, that minimizes data movement energy consumption on a spatial architecture and can adapt to different CNN shape configurations and reduces all types of data movement through maximally utilizing the processing engine local storage, direct inter-PE communication and spatial parallelism.
ShiDianNao: Shifting vision processing closer to the sensor
TLDR
This paper proposes an accelerator which is 60x more energy efficient than the previous state-of-the-art neural network accelerator, designed down to the layout at 65 nm, with a modest footprint and consuming only 320 mW, but still about 30x faster than high-end GPUs.
Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks
TLDR
This work implements a CNN accelerator on a VC707 FPGA board and compares it to previous approaches, achieving a peak performance of 61.62 GFLOPS at a 100 MHz working frequency, which significantly outperforms previous approaches.
DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning
TLDR
This study designs an accelerator for large-scale CNNs and DNNs, with a special emphasis on the impact of memory on accelerator design, performance and energy, and shows that it is possible to design an accelerator with a high throughput, capable of performing 452 GOP/s in a small footprint.
An ultra-low-energy convolution engine for fast brain-inspired vision in multicore clusters
TLDR
This work proposes to augment many-core architectures using shared-memory clusters of power-optimized RISC processors with Hardware Convolution Engines (HWCEs): ultra-low energy coprocessors for accelerating convolutions, the main building block of many brain-inspired computer vision algorithms.
DaDianNao: A Machine-Learning Supercomputer
  • Yunji Chen, Tao Luo, O. Temam
  • Computer Science
    2014 47th Annual IEEE/ACM International Symposium on Microarchitecture
  • 2014
TLDR
This article introduces a custom multi-chip machine-learning architecture, showing that, on a subset of the largest known neural network layers, it is possible to achieve a speedup of 450.65x over a GPU, and reduce the energy by 150.31x on average for a 64-chip system.
A 240 G-ops/s Mobile Coprocessor for Deep Neural Networks
TLDR
The nn-X system is presented, a scalable, low-power coprocessor for enabling real-time execution of deep neural networks, able to achieve a peak performance of 227 G-ops/s, which translates to a performance per power improvement of 10 to 100 times that of conventional mobile and desktop processors.
A Massively Parallel Coprocessor for Convolutional Neural Networks
  • M. Sankaradass, V. Jakkula, H. Graf
  • Computer Science
    2009 20th IEEE International Conference on Application-specific Systems, Architectures and Processors
  • 2009
TLDR
A massively parallel coprocessor for accelerating Convolutional Neural Networks (CNNs), a class of important machine learning algorithms, is presented; it uses low-precision data and further increases the effective memory bandwidth by packing multiple words into every memory operation.
Origami: A Convolutional Network Accelerator
TLDR
This paper presents the first convolutional network accelerator which is scalable to network sizes that are currently only handled by workstation GPUs, but remains within the power envelope of embedded systems.
14.6 A 1.42TOPS/W deep convolutional neural network recognition processor for intelligent IoE systems
In this paper, we present an energy-efficient CNN processor with 4 key features: (1) a CNN-optimized neuron processing engine (NPE), (2) a dual-range multiply-accumulate (DRMAC) block for low-power…