• Publications
  • Influence
Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks
Eyeriss is an accelerator for state-of-the-art deep convolutional neural networks (CNNs). It optimizes for the energy efficiency of the entire system, including the accelerator chip and off-chip
Eyeriss: a spatial architecture for energy-efficient dataflow for convolutional neural networks
A novel dataflow, called row-stationary (RS), is presented, that minimizes data movement energy consumption on a spatial architecture and can adapt to different CNN shape configurations and reduces all types of data movement through maximally utilizing the processing engine local storage, direct inter-PE communication and spatial parallelism.
High performance cache replacement using re-reference interval prediction (RRIP)
This paper proposes Static RRIP that is scan-resistant and Dynamic RRIP (DRRIP) that is both scan- resistant and thrash-resistant that require only 2-bits per cache block and easily integrate into existing LRU approximations found in modern processors.
Adaptive insertion policies for high performance caching
A Dynamic Insertion Policy (DIP) is proposed to choose between BIP and the traditional LRU policy depending on which policy incurs fewer misses, and shows that DIP reduces the average MPKI of the baseline 1MB 16-way L2 cache by 21%, bridging two-thirds of the gap between LRU and OPT.
A systematic methodology to compute the architectural vulnerability factors for a high-performance microprocessor
A method for generating per-structure AVF estimates for microprocessor error rates is described, identifying numerous cases, such as prefetches, dynamically dead code, and wrong-path instructions, in which a fault do not affect, correct execution.
Efficient Processing of Deep Neural Networks: A Tutorial and Survey
Deep neural networks (DNNs) are currently widely used for many artificial intelligence (AI) applications including computer vision, speech recognition, and robotics. While DNNs deliver
SCNN: An accelerator for compressed-sparse convolutional neural networks
The Sparse CNN (SCNN) accelerator architecture is introduced, which improves performance and energy efficiency by exploiting thezero-valued weights that stem from network pruning during training and zero-valued activations that arise from the common ReLU operator.
A Systematic Methodology to Compute the Architectural Vulnerability Factors for a High-Performance Microprocessor
This paper identifies numerous cases, such as prefetches, dynamicallydead code, and wrong-path instructions, in which a fault will not affect correct execution, and shows AVFs of 28% and 9% for the instruction queue and execution units, respectively,averaged across dynamic sections of the entire CPU2000benchmark suite.
Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor
This paper presents an architecture for simultaneous multithreading that minimizes the architectural impact on the conventional superscalar design, has minimal performance impact on a single thread executing alone, and achieves significant throughput gains when running multiple threads.