# C-SAW: A Framework for Graph Sampling and Random Walk on GPUs

@article{Pandey2020CSAWAF,
title={C-SAW: A Framework for Graph Sampling and Random Walk on GPUs},
author={Santosh Pandey and Lingda Li and Adolfy Hoisie and Xiaoye S. Li and Hang Liu},
journal={SC20: International Conference for High Performance Computing, Networking, Storage and Analysis},
year={2020},
pages={1-15}
}
• Published 18 September 2020
• Computer Science
• SC20: International Conference for High Performance Computing, Networking, Storage and Analysis
Many applications need to learn, mine, analyze, and visualize large-scale graphs. These graphs are often too large to be processed efficiently with conventional graph processing technologies. Fortunately, recent research has found that graph sampling and random walk, which significantly reduce the size of the original graphs, can benefit the tasks of learning, mining, analyzing, and visualizing large graphs by capturing the desirable graph properties. This paper introduces C-SAW, the first…

• Computer Science
IEEE Transactions on Parallel and Distributed Systems
• 2021
PaGraph is proposed, a novel, efficient data loader that supports general and efficient sampling-based GNN training on a single server with multiple GPUs and embodies a lightweight yet effective caching policy that takes into account both graph structural information and the data access patterns of sampling-based GNN training.
• Computer Science
IEEE Transactions on Parallel and Distributed Systems
• 2021
Trust is the first work that achieves over one trillion Traversed Edges Per Second (TEPS) rate for triangle counting, and advocates that hashing can help the key operations for scalable triangle counting on Graphics Processing Units (GPUs).
• Computer Science
IEEE Transactions on Parallel and Distributed Systems
• 2021
This article presents a novel and efficient parallel implementation of a RIS-based algorithm, namely IMM, on GPU, which can significantly reduce the running time on large-scale graphs with low values of $\epsilon$.
• Computer Science
Proc. VLDB Endow.
• 2021
Experimental results show that ThunderRW outperforms state-of-the-art approaches by an order of magnitude, and the step interleaving technique significantly reduces the CPU pipeline stall from 73.1% to 15.0%.
• Computer Science
PPoPP
• 2023
This work proposes a system dubbed Distributed Sampling and Pipelining (DSP) for multi-GPU GNN training, which adopts a tailored data layout that stores the graph topology and popular node features in GPU memory to utilize the fast NVLink connections among the GPUs.
• Computer Science
WSDM
• 2023
A dual-layer graph is constructed to extract multiple relations among news and users in social networks, deriving rich information for detecting fake news, and the superiority of Us-DeFake, which outperforms all baselines, is illustrated.
• Computer Science
SC22: International Conference for High Performance Computing, Networking, Storage and Analysis
• 2022
This work proposes the first graphics processing unit (GPU)-based microarchitecture simulator that fully unleashes the power of GPUs to accelerate state-of-the-art ML-based simulators, and proposes a parallel simulation paradigm that partitions the application trace into sub-traces to simulate them in parallel with rigorous error analysis and effective error correction mechanisms.
• Computer Science
PACT
• 2022
T-GCN is proposed, the first sampling-based streaming GNN system, which targets temporal-aware streaming graphs and takes advantage of a hybrid CPU-GPU co-processing architecture to achieve high throughput and low latency and an NVLink-specific task schedule to fully exploit NVLink's fast speed and improve GPU-GPU communication efficiency.
A new taxonomy is presented for the optimization techniques in distributed GNN training that address the above challenges; it classifies existing techniques into four categories: GNN data partition, GNN batch generation, GNN execution model, and GNN communication protocol.
• Computer Science
CIKM
• 2022
A Chunk-wise Graph Compression format (CGC) is introduced to effectively reduce the graph size and save the graph transfer cost, and a scalable GPU-based graph sampling framework, GraSS, is developed and evaluated to demonstrate its efficiency and scalability on both real-world and synthetic graphs.

## References

• Computer Science
Proc. VLDB Endow.
• 2020
A lightweight offline graph reordering algorithm, HALO (Harmonic Locality Ordering), is proposed that can be used as a pre-processing step for static graphs and specifically aims to cover large directed real-world graphs in addition to undirected graphs, whereas prior methods only account for the latter.
• Computer Science
SC15: International Conference for High Performance Computing, Networking, Storage and Analysis
• 2015
Enterprise is presented, a new GPU-based BFS system that combines three techniques to remove potential performance bottlenecks and is optimized for both top-down and bottom-up BFS.
• Computer Science
CF
• 2019
A lightweight concurrent hash table coupled with a space-efficient dynamic graph data structure to overcome the challenges and memory constraints of sampling streaming dynamic graphs.
• Computer Science
PPoPP
• 2015
"Gunrock," the high-level bulk-synchronous graph-processing system targeting the GPU, takes a new approach to abstracting GPU graph analytics: rather than designing an abstraction around computation, Gunrock implements a novel data-centric abstraction centered on operations on a vertex or edge frontier.
• Computer Science
2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT)
• 2017
Graphie is a system to efficiently traverse large-scale graphs on a single GPU; it stores the vertex attribute data in GPU memory, streams edge data asynchronously to the GPU for processing, and relies on two renaming algorithms for high performance.
• Computer Science
USENIX Annual Technical Conference
• 2019
SIMD-X utilizes just-in-time task management, which filters out inactive vertices at runtime and intelligently maps various tasks to different numbers of GPU cores in pursuit of workload balancing, and leverages push-pull based kernel fusion that reduces a large number of computation kernels to very few.
• Computer Science
2017 International Conference on ReConFigurable Computing and FPGAs (ReConFig)
• 2017
This paper presents the design and implementation of an FPGA-based graph sampling method that allows a time- and memory-efficient representation of graphs suitable for reconfigurable hardware such as FPGAs, and shows that the proposed techniques are 2x faster and 3x more energy efficient than a serial CPU version of the algorithm.
• Computer Science
Proc. VLDB Endow.
• 2015
GraphMat is a single-node multicore graph framework written in C++ that achieves better multicore scalability than other frameworks and is 1.2X off native, hand-optimized code on a variety of graph algorithms.
• Computer Science
Sixth International Conference on Data Mining (ICDM'06)
• 2006
The heart of the approach is to exploit two important properties shared by many real graphs: linear correlations and block-wise, community-like structure. The linearity is exploited by low-rank matrix approximation, and the community structure by graph partitioning, followed by the Sherman-Morrison lemma for matrix inversion.
• Computer Science
2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
• 2019
This paper proposes novel parallelization techniques for graph sampling-based GCNs that achieve superior scalable performance on very large graphs without compromising accuracy, and demonstrates that the parallel graph embedding outperforms state-of-the-art methods in scalability, efficiency and accuracy on several large datasets.