Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks
- Yu-hsin Chen, T. Krishna, J. Emer, V. Sze
- Computer Science, IEEE Journal of Solid-State Circuits
- 1 February 2016
Eyeriss is an accelerator for state-of-the-art deep convolutional neural networks (CNNs). It optimizes for the energy efficiency of the entire system, including the accelerator chip and off-chip…
Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks
- Yu-hsin Chen, J. Emer, V. Sze
- Computer Science, ACM/IEEE 43rd Annual International Symposium on…
- 1 June 2016
A novel dataflow, called row-stationary (RS), is presented that minimizes data movement energy consumption on a spatial architecture. RS can adapt to different CNN shape configurations and reduces all types of data movement by maximally utilizing processing engine (PE) local storage, direct inter-PE communication, and spatial parallelism.
Efficient Processing of Deep Neural Networks: A Tutorial and Survey
Deep neural networks (DNNs) are currently widely used for many artificial intelligence (AI) applications including computer vision, speech recognition, and robotics. While DNNs deliver…
Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices
- Yu-hsin Chen, Tien-Ju Yang, J. Emer, V. Sze
- Computer Science, IEEE Journal on Emerging and Selected Topics in…
- 10 July 2018
Eyeriss v2, a DNN accelerator architecture designed for running compact and sparse DNNs, is presented. It introduces a highly flexible on-chip network that can adapt to the different amounts of data reuse and bandwidth requirements of different data types, improving the utilization of the computation resources.
NetAdapt: Platform-Aware Neural Network Adaptation for Mobile Applications
NetAdapt is an algorithm that automatically adapts a pre-trained deep neural network to a mobile platform given a resource budget while maximizing accuracy. It achieves better accuracy-versus-latency trade-offs on both mobile CPU and mobile GPU than state-of-the-art automated network simplification algorithms.
Designing Energy-Efficient Convolutional Neural Networks Using Energy-Aware Pruning
- Tien-Ju Yang, Yu-hsin Chen, V. Sze
- Computer Science, IEEE Conference on Computer Vision and Pattern…
- 16 November 2016
This work proposes an energy-aware pruning algorithm for CNNs that directly uses the energy consumption of a CNN to guide the pruning process. It also shows that reducing the number of target classes in AlexNet greatly decreases the number of weights but has a limited impact on energy consumption.
FastDepth: Fast Monocular Depth Estimation on Embedded Systems
- Diana Wofk, Fangchang Ma, Tien-Ju Yang, S. Karaman, V. Sze
- Computer Science, International Conference on Robotics and…
- 8 March 2019
This paper proposes an efficient and lightweight encoder-decoder network architecture and applies network pruning to further reduce computational complexity and latency. It demonstrates real-time monocular depth estimation using a deep neural network with the lowest latency and highest throughput on an embedded platform that can be carried by a micro aerial vehicle.
Core Transform Design in the High Efficiency Video Coding (HEVC) Standard
- M. Budagavi, A. Fuldseth, G. Bjøntegaard, V. Sze, M. Sadafale
- Computer Science, IEEE Journal of Selected Topics in Signal…
- 20 June 2013
The core transforms specified for the High Efficiency Video Coding (HEVC) standard were designed as finite-precision approximations to the discrete cosine transform (DCT), with the aim of being friendly to both implementation and parallel processing.
High Efficiency Video Coding (HEVC), Algorithms and Architectures
This book provides a detailed explanation of the various parts of the HEVC standard, insight into how it was developed, and in-depth discussion of algorithms and architectures for its implementation.
DeeperLab: Single-Shot Image Parser
The proposed DeeperLab image parser performs whole image parsing with a significantly simpler, fully convolutional approach that jointly addresses the semantic and instance segmentation tasks in a single-shot manner, resulting in a streamlined system that better lends itself to fast processing.