• Corpus ID: 247291745

Understanding the Design-Space of Sparse/Dense Multiphase GNN dataflows on Spatial Accelerators

@inproceedings{Garg2021UnderstandingTD,
  title={Understanding the Design-Space of Sparse/Dense Multiphase GNN dataflows on Spatial Accelerators},
  author={Raveesh Garg and Eric Qin and Francisco Mu{\~n}oz-Mart{\'i}nez and Robert Guirado and Akshay Jain and S. Abadal and Jos{\'e} L. Abell{\'a}n and Manuel E. Acacio and Eduard Alarc{\'o}n and Sivasankaran Rajamanickam and Tushar Krishna},
  year={2021}
}
Graph Neural Networks (GNNs) have attracted significant recent interest because of their success in learning representations from graph-structured data across several critical applications in cloud and HPC. GNNs exhibit unique compute and memory characteristics that arise from an interplay between dense and sparse phases of computation, and the emergence of reconfigurable dataflow (aka spatial) accelerators offers promise for acceleration by mapping optimized dataflows (i.e., computation order and… 
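
To make the dense/sparse interplay concrete, here is a minimal sketch of a single GCN layer (an illustration, not the paper's accelerator mapping): the aggregation phase is a sparse-dense matrix multiply (SpMM) over the adjacency matrix, while the combination phase is a dense GEMM over the weight matrix. All shapes and names below are illustrative assumptions.

```python
import numpy as np
import scipy.sparse as sp

def gcn_layer(A: sp.csr_matrix, X: np.ndarray, W: np.ndarray) -> np.ndarray:
    """One GCN layer: A is a sparse normalized adjacency (N x N),
    X is a dense feature matrix (N x F), W is a dense weight matrix (F x G)."""
    # Combination phase: dense GEMM, (N x F) @ (F x G).
    XW = X @ W
    # Aggregation phase: sparse-dense SpMM, (N x N) @ (N x G).
    return A @ XW

rng = np.random.default_rng(0)
N, F, G = 6, 4, 3
A = sp.random(N, N, density=0.3, format="csr", random_state=0)
X = rng.standard_normal((N, F))
W = rng.standard_normal((F, G))

# Phase order is itself a dataflow choice: combination-first (above)
# shrinks the feature dimension before aggregating, while
# aggregation-first computes A @ X and then applies W. Both orders
# produce A @ X @ W.
out_comb_first = gcn_layer(A, X, W)
out_agg_first = (A @ X) @ W
assert np.allclose(out_comb_first, out_agg_first)
```

This is the kind of multiphase computation whose ordering and parallelization the paper's design-space exploration targets.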
