The gem5 simulator
TLDR
The high level of collaboration on the gem5 project, combined with the previous success of the component parts and a liberal BSD-like license, makes gem5 a valuable full-system simulation tool.
Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks
Eyeriss is an accelerator for state-of-the-art deep convolutional neural networks (CNNs). It optimizes for the energy efficiency of the entire system, including the accelerator chip and off-chip
GARNET: A detailed on-chip network model inside a full-system simulator
TLDR
A detailed cycle-accurate interconnection network model (GARNET) is developed, inside the GEMS full-system simulation framework, that provides a detailed and accurate memory system timing model. The work shows that, by improving on-chip network latency-throughput, EVCs do lead to better overall system runtime, although the impact varies widely across applications.
Breaking the on-chip latency barrier using SMART
TLDR
This work proposes an on-chip network called SMART (Single-cycle Multi-hop Asynchronous Repeated Traversal) that aims to present a single-cycle data-path all the way from the source to the destination.
MAERI: Enabling Flexible Dataflow Mapping over DNN Accelerators via Reconfigurable Interconnects
TLDR
MAERI is a DNN accelerator built with a set of modular and configurable building blocks that can easily support myriad DNN partitions and mappings by appropriately configuring tiny switches, and it provides 8-459% better utilization across multiple dataflow mappings over baselines with rigid NoC fabrics.
SCORPIO: A 36-core research chip demonstrating snoopy coherence on a scalable mesh NoC with in-network ordering
  • B. Daya, C. Chen, +6 authors L. Peh
  • Computer Science
    ACM/IEEE 41st International Symposium on Computer…
  • 16 October 2014
TLDR
SCORPIO is presented, an ordered mesh Network-on-Chip (NoC) architecture with a separate fixed-latency, bufferless network to achieve distributed global ordering, designed to plug-and-play with existing multicore IP and with practicality, timing, area, and power as top concerns.
SIGMA: A Sparse and Irregular GEMM Accelerator with Flexible Interconnects for DNN Training
TLDR
SIGMA is proposed, a flexible and scalable architecture that offers high utilization of all its processing elements (PEs) regardless of kernel shape and sparsity, and includes a novel reduction tree microarchitecture named Forwarding Adder Network (FAN).
14.5 Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks
TLDR
To achieve state-of-the-art accuracy, CNNs need not only a larger number of layers but also millions of filter weights and varying shapes, which results in substantial data movement that consumes significant energy.
SMART: A single-cycle reconfigurable NoC for SoC applications
TLDR
The heart of the SMART NoC is a novel low-swing clockless repeated link circuit embedded within the router crossbars that allows packets to potentially bypass all the way from source to destination core within a single clock cycle, without being latched at any intermediate router.