A Customized NoC Architecture to Enable Highly Localized Computing-on-the-Move DNN Dataflow
@article{Zhou2021ACN,
  title   = {A Customized NoC Architecture to Enable Highly Localized Computing-on-the-Move DNN Dataflow},
  author  = {Kaining Zhou and Yangshuo He and Rui Xiao and Jiayi Liu and Kejie Huang},
  journal = {IEEE Transactions on Circuits and Systems II: Express Briefs},
  year    = {2021},
  volume  = {69},
  pages   = {1692-1696}
}
The ever-increasing computation complexity of fast-growing Deep Neural Networks (DNNs) has called for new computing paradigms to overcome the memory wall of conventional von Neumann architectures. The emerging Computing-In-Memory (CIM) architecture is a promising candidate for accelerating neural network computing. However, data movement between CIM arrays may still dominate the total power consumption in conventional designs. This brief proposes a flexible CIM processor…
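To make the data-movement argument concrete, here is a minimal back-of-the-envelope sketch in Python. The per-MAC and per-hop energy values are illustrative assumptions, not figures from the brief; the point is only that the movement term scales with average hop count, which a localized dataflow reduces.

```python
# Minimal sketch of why inter-array data movement can dominate CIM power.
# All energy numbers below are illustrative assumptions, not figures from the brief.

E_MAC_PJ = 0.1      # assumed energy per in-memory MAC operation (pJ)
E_HOP_PJ = 1.0      # assumed energy to move one byte across one NoC hop (pJ)

def layer_energy(macs, bytes_moved, avg_hops):
    """Return (compute, movement) energy in pJ for one layer."""
    compute = macs * E_MAC_PJ
    movement = bytes_moved * avg_hops * E_HOP_PJ
    return compute, movement

# Example: a conv layer with 1M MACs whose intermediate data travels
# 4 hops on average across the NoC.
compute, movement = layer_energy(macs=1_000_000, bytes_moved=500_000, avg_hops=4)
print(f"compute: {compute/1e3:.0f} nJ, movement: {movement/1e3:.0f} nJ")
# compute: 100 nJ, movement: 2000 nJ -- shrinking avg_hops via a localized
# (computing-on-the-move) mapping attacks the dominant term.
```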
One Citation
In-Network Accumulation: Extending the Role of NoC for DNN Acceleration
- 2022 IEEE 35th International System-on-Chip Conference (SOCC)
- 2022
The In-Network Accumulation (INA) method is proposed to further accelerate DNN workload execution on a many-core spatial DNN accelerator under the Weight Stationary (WS) dataflow model by expanding the router’s function to support partial-sum accumulation.
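A hedged sketch of the INA idea, assuming a simple reduction tree of routers; the class and method names are hypothetical, and the paper's actual router microarchitecture may differ:

```python
# Sketch of in-network accumulation (INA) for a weight-stationary dataflow:
# each router adds incoming partial sums to a local accumulator instead of
# forwarding every partial sum to a central node. The reduction-tree shape
# and all names here are illustrative assumptions.

class INARouter:
    def __init__(self, name):
        self.name = name
        self.acc = 0  # partial-sum accumulator added to the router datapath

    def receive(self, partial_sum):
        # Accumulate in place; only the reduced value moves up the tree.
        self.acc += partial_sum

    def flush(self, parent):
        # Forward a single accumulated flit instead of N raw partial sums.
        if parent is not None:
            parent.receive(self.acc)
        self.acc = 0

# Four leaf routers reduce their PEs' partial sums locally, then send one
# value each to the root: 4 flits cross the network instead of 8 (or more).
root = INARouter("root")
for i, ps_batch in enumerate([[1, 2], [3, 4], [5, 6], [7, 8]]):
    leaf = INARouter(f"leaf{i}")
    for ps in ps_batch:
        leaf.receive(ps)
    leaf.flush(root)
print(root.acc)  # 36 == sum of all partial sums
```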
References
A Programmable Neural-Network Inference Accelerator Based on Scalable In-Memory Computing
- 2021 IEEE International Solid-State Circuits Conference (ISSCC)
- 2021
This paper presents a scalable neural-network inference accelerator in 16nm, based on an array of programmable cores employing mixed-signal In-Memory Computing, digital Near-Memory Computing, and localized buffering/control to overcome the overheads of hardware virtualization.
Cycle-Accurate Network on Chip Simulation with Noxim
- ACM Trans. Model. Comput. Simul.
- 2016
Noxim is presented, an open, configurable, extensible, cycle-accurate NoC simulator developed in SystemC, which allows analysis of the performance and power figures of both conventional wired NoCs and emerging WiNoC architectures.
An 89TOPS/W and 16.3TOPS/mm2 All-Digital SRAM-Based Full-Precision Compute-In Memory Macro in 22nm for Machine-Learning Edge Applications
- 2021 IEEE International Solid-State Circuits Conference (ISSCC)
- 2021
CIM research has focused on analog approaches for their high energy efficiency; however, limited accuracy due to low SNR is their main disadvantage, so an analog approach may not be suitable for applications that require high accuracy.
CASCADE: Connecting RRAMs to Extend Analog Dataflow In An End-To-End In-Memory Processing Paradigm
- MICRO
- 2019
This work demonstrates the CASCADE architecture, which connects multiply-accumulate RRAM arrays with buffer RRAM arrays to extend processing in analog and in memory: dot products are followed by partial-sum buffering and accumulation to implement a complete DNN or RNN layer.
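A minimal sketch of the tiling-plus-buffering pattern described above, assuming an illustrative tile size; the scalar `buf` stands in for the buffer-RRAM accumulation, and none of these names come from the CASCADE paper:

```python
import numpy as np

# Sketch: a large dot product is tiled across several MAC arrays, and a
# buffer array accumulates the per-tile partial sums so intermediate values
# never leave the (analog) memory fabric. Shapes are illustrative assumptions.

TILE = 4  # assumed number of rows one RRAM MAC array can reduce at once

def cascade_dot(x, w):
    buf = 0.0  # stands in for the buffer-RRAM accumulator
    for i in range(0, len(x), TILE):
        buf += np.dot(x[i:i+TILE], w[i:i+TILE])  # one MAC-array dot product
    return buf

x = np.arange(8, dtype=float)
w = np.ones(8)
assert cascade_dot(x, w) == np.dot(x, w)  # 28.0, identical to the untiled result
```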
29.1 A 40nm 64Kb 56.67TOPS/W Read-Disturb-Tolerant Compute-in-Memory/Digital RRAM Macro with Active-Feedback-Based Read and In-Situ Write Verification
- 2021 IEEE International Solid-State Circuits Conference (ISSCC)
- 2021
A 64Kb RRAM macro is presented that supports a programmable number of row accesses to enable vector-matrix multiplication at a target algorithm-level inference accuracy, along with in-situ write verification to achieve a tight resistance distribution.
14.3 A 65nm Computing-in-Memory-Based CNN Processor with 2.9-to-35.8TOPS/W System Energy Efficiency Using Dynamic-Sparsity Performance-Scaling Architecture and Energy-Efficient Inter/Intra-Macro Data Reuse
- 2020 IEEE International Solid-State Circuits Conference (ISSCC)
- 2020
For a high compression rate and high efficiency, the granularity of sparsity needs to be explored based on CIM characteristics; system-level weight mapping to CIM macros and data-reuse strategies remain underexplored, and both are important for CIM macro utilization and energy efficiency.
Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices
- IEEE Journal on Emerging and Selected Topics in Circuits and Systems
- 2019
Eyeriss v2, a DNN accelerator architecture designed for running compact and sparse DNNs, is presented; it introduces a highly flexible on-chip network that adapts to the varying data-reuse and bandwidth requirements of different data types, improving the utilization of computation resources.
AtomLayer: A Universal ReRAM-Based CNN Accelerator with Atomic Layer Computation
- 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)
- 2018
AtomLayer is proposed: a universal ReRAM-based accelerator supporting both efficient CNN training and inference, which achieves higher power efficiency than ISAAC in inference and PipeLayer in training while reducing the footprint by 15×.
Eyeriss: a spatial architecture for energy-efficient dataflow for convolutional neural networks
- ISCA
- 2016
A novel dataflow, called row-stationary (RS), is presented that minimizes data-movement energy consumption on a spatial architecture, adapts to different CNN shape configurations, and reduces all types of data movement by maximally utilizing processing-engine local storage, direct inter-PE communication, and spatial parallelism.
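A hedged, one-dimensional sketch of the row-stationary idea: a filter row stays resident in a PE's local storage while the input row streams past, so each weight is fetched once from the cheapest level of the memory hierarchy and reused for every output. This is an illustrative model under assumed shapes, not Eyeriss's full 2-D PE-array mapping.

```python
import numpy as np

def pe_row_stationary(filter_row, input_row):
    """1-D convolution as one PE would compute it under RS:
    filter_row is loaded once into local storage and reused per output."""
    R, W = len(filter_row), len(input_row)
    out = np.zeros(W - R + 1)
    for t in range(len(out)):                 # input activations slide past
        out[t] = np.dot(filter_row, input_row[t:t+R])
    return out

print(pe_row_stationary(np.array([1., 0., -1.]), np.arange(6.)))
# [-2. -2. -2. -2.]  -- each weight was fetched once and reused 4 times
```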