C-SAW: A Framework for Graph Sampling and Random Walk on GPUs
@inproceedings{Pandey2020CSAWAF,
  title={C-SAW: A Framework for Graph Sampling and Random Walk on GPUs},
  author={Santosh Pandey and Lingda Li and Adolfy Hoisie and Xiaoye S. Li and Hang Liu},
  booktitle={SC20: International Conference for High Performance Computing, Networking, Storage and Analysis},
  year={2020},
  pages={1--15}
}
Many applications need to learn from, mine, analyze, and visualize large-scale graphs. These graphs are often too large to be handled efficiently by conventional graph processing technologies. Fortunately, recent research has found that graph sampling and random walk, which significantly reduce the size of the original graph while capturing the desirable graph properties, can benefit the learning, mining, analysis, and visualization of large graphs. This paper introduces C-SAW, the first…
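The abstract leaves the sampling mechanics implicit. As a minimal, sequential sketch (not C-SAW's GPU API), the Python below stores a graph in compressed sparse row (CSR) form, runs one unbiased random walk per seed vertex, and keeps the traversed edges as the reduced sample. The function names, the uniform neighbor choice, and the single-threaded loop are assumptions for illustration.

```python
import random


def to_csr(num_vertices, edges):
    """Build CSR arrays (row_ptr, col_idx) for an undirected edge list."""
    adj = [[] for _ in range(num_vertices)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    row_ptr, col_idx = [0], []
    for nbrs in adj:
        col_idx.extend(sorted(nbrs))
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx


def random_walk(row_ptr, col_idx, start, length, rng):
    """Unbiased walk of up to `length` steps; stops early at a vertex with no neighbors."""
    path = [start]
    v = start
    for _ in range(length):
        begin, end = row_ptr[v], row_ptr[v + 1]
        if begin == end:
            break
        v = col_idx[rng.randrange(begin, end)]  # each neighbor equally likely
        path.append(v)
    return path


def sample_subgraph(row_ptr, col_idx, seeds, length, seed=0):
    """Edge set traversed by one walk per seed: a reduced sample of the input graph."""
    rng = random.Random(seed)
    sampled = set()
    for s in seeds:
        path = random_walk(row_ptr, col_idx, s, length, rng)
        for a, b in zip(path, path[1:]):
            sampled.add((min(a, b), max(a, b)))
    return sampled


if __name__ == "__main__":
    row_ptr, col_idx = to_csr(5, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)])
    print(sorted(sample_subgraph(row_ptr, col_idx, seeds=[0, 3], length=4)))
```

CSR is used because the neighbors of `v` occupy the contiguous slice `col_idx[row_ptr[v]:row_ptr[v + 1]]`, so picking a uniform neighbor is a single index draw.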
21 Citations
Efficient Data Loader for Fast Sampling-Based GNN Training on Large Graphs
- Computer Science · IEEE Transactions on Parallel and Distributed Systems
- 2021
PaGraph is proposed: a novel, efficient data loader that supports general and efficient sampling-based GNN training on a single server with multiple GPUs. It embodies a lightweight yet effective caching policy that simultaneously takes into account the graph's structural information and the data access patterns of sampling-based GNN training.
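PaGraph's exact policy is only summarized above. As a hedged illustration of a structure-aware feature cache, the sketch below assumes vertex degree as the structural signal: it pins the features of the highest-degree vertices in a fixed-capacity cache (a dict standing in for GPU memory) and counts cache hits for a sampled mini-batch. All names are hypothetical.

```python
def build_degree_cache(degrees, features, capacity):
    """Cache features of the `capacity` highest-degree vertices (ties broken by vertex id)."""
    ranked = sorted(range(len(degrees)), key=lambda v: (-degrees[v], v))
    return {v: features[v] for v in ranked[:capacity]}


def load_batch(batch, cache, features):
    """Gather features for a sampled batch; misses fall back to host memory."""
    hits = 0
    out = []
    for v in batch:
        if v in cache:
            hits += 1
            out.append(cache[v])
        else:
            out.append(features[v])  # stands in for a host-to-GPU transfer
    return out, hits
```

The rationale for ranking by degree is that sampling-based training tends to touch high-degree vertices more often, so caching them can cut feature-transfer traffic; whether this matches PaGraph's measured access patterns is not claimed here.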
Trust: Triangle Counting Reloaded on GPUs
- Computer Science · IEEE Transactions on Parallel and Distributed Systems
- 2021
Trust is the first work to achieve a rate of over one trillion Traversed Edges Per Second (TEPS) for triangle counting; it argues that hashing can accelerate the key operations of scalable triangle counting on Graphics Processing Units (GPUs).
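Trust's GPU hashing design is not reproduced here. The sequential sketch below only shows why hashing fits triangle counting: after orienting each edge from the lower to the higher vertex id (an assumption that makes every triangle counted exactly once), each edge (u, v) probes a hash set of u's out-neighbors for every out-neighbor of v.

```python
from collections import defaultdict


def count_triangles(edges):
    """Count triangles in an undirected graph via hash-set lookups."""
    # Orient every edge from lower to higher id; sets also drop duplicate edges.
    out = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        out[min(u, v)].add(max(u, v))
    total = 0
    for u, nbrs in out.items():
        for v in nbrs:
            # Each w in out[v] that is also in out[u] closes triangle (u, v, w) with u < v < w.
            total += sum(1 for w in out.get(v, ()) if w in nbrs)
    return total
```

Each membership test is an expected O(1) hash lookup, which is the operation Trust reports mapping efficiently onto GPUs.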
gIM: GPU Accelerated RIS-Based Influence Maximization Algorithm
- Computer Science · IEEE Transactions on Parallel and Distributed Systems
- 2021
This article presents a novel and efficient parallel implementation of a RIS-based algorithm, namely IMM, on GPU, which can significantly reduce the running time on large-scale graphs with low values of $\epsilon$.
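To make the RIS idea concrete, the sketch below assumes the independent cascade model with a single edge probability `p`: it samples reverse reachable (RR) sets from random roots, then greedily picks `k` seeds that cover the most RR sets. IMM's bound on how many RR sets to draw (which is where $\epsilon$ enters) is replaced by a fixed, hypothetical `num_sets` parameter, and nothing here reflects gIM's GPU kernels.

```python
import random
from collections import defaultdict


def random_rr_set(in_nbrs, n, p, rng):
    """Vertices that reach a random root when each edge is live with probability p."""
    root = rng.randrange(n)
    visited = {root}
    stack = [root]
    while stack:
        v = stack.pop()
        for u in in_nbrs.get(v, ()):  # each in-edge is tested once per RR set
            if u not in visited and rng.random() < p:
                visited.add(u)
                stack.append(u)
    return visited


def ris_seeds(edges, n, k, p=0.1, num_sets=2000, seed=0):
    """Greedy max coverage over sampled RR sets; edges are directed (u -> v)."""
    rng = random.Random(seed)
    in_nbrs = defaultdict(list)
    for u, v in edges:
        in_nbrs[v].append(u)
    rr_sets = [random_rr_set(in_nbrs, n, p, rng) for _ in range(num_sets)]
    covered = [False] * num_sets
    seeds = []
    for _ in range(k):
        counts = defaultdict(int)  # uncovered RR sets containing each vertex
        for i, rr in enumerate(rr_sets):
            if not covered[i]:
                for v in rr:
                    counts[v] += 1
        if not counts:
            break  # every RR set is already covered
        best = max(counts, key=lambda v: (counts[v], -v))
        seeds.append(best)
        for i, rr in enumerate(rr_sets):
            if best in rr:
                covered[i] = True
    return seeds
```

A vertex that appears in many RR sets is one that many random targets can be reached from, which is why coverage serves as a proxy for expected influence.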
ThunderRW: An In-Memory Graph Random Walk Engine
- Computer Science · Proc. VLDB Endow.
- 2021
Experimental results show that ThunderRW outperforms state-of-the-art approaches by an order of magnitude, and the step interleaving technique significantly reduces the CPU pipeline stall from 73.1% to 15.0%.
DSP: Efficient GNN Training with Multiple GPUs
- Computer Science · PPoPP
- 2023
This work proposes Distributed Sampling and Pipelining (DSP), a system for multi-GPU GNN training. DSP adopts a tailored data layout that stores the graph topology and popular node features in GPU memory, exploiting the fast NVLink connections among the GPUs.
Mining User-aware Multi-relations for Fake News Detection in Large Scale Online Social Networks
- Computer Science · WSDM
- 2023
A dual-layer graph is constructed to extract multiple relations among news items and users in social networks, deriving rich information for detecting fake news; experiments illustrate the superiority of the resulting method, Us-DeFake, which outperforms all baselines.
Scalable Deep Learning-Based Microarchitecture Simulation on GPUs
- Computer Science · SC22: International Conference for High Performance Computing, Networking, Storage and Analysis
- 2022
This work proposes the first graphics processing unit (GPU)-based microarchitecture simulator that fully unleashes the power of GPUs to accelerate state-of-the-art ML-based simulators, and proposes a parallel simulation paradigm that partitions the application trace into sub-traces to simulate them in parallel with rigorous error analysis and effective error correction mechanisms.
T-GCN: A Sampling Based Streaming Graph Neural Network System with Hybrid Architecture
- Computer Science · PACT
- 2022
T-GCN is proposed as the first sampling-based streaming GNN system. It targets temporal-aware streaming graphs, takes advantage of a hybrid CPU-GPU co-processing architecture to achieve high throughput and low latency, and uses an NVLink-specific task schedule to fully exploit NVLink's high speed and improve GPU-GPU communication efficiency.
Distributed Graph Neural Network Training: A Survey
- Computer Science · ArXiv
- 2022
The survey proposes a new taxonomy of the optimization techniques that address the challenges of distributed GNN training, classifying existing techniques into four categories: GNN data partition, GNN batch generation, GNN execution model, and GNN communication protocol.
Scalable Graph Sampling on GPUs with Compressed Graph
- Computer Science · CIKM
- 2022
A Chunk-wise Graph Compression (CGC) format is introduced to reduce the graph size and thus the graph transfer cost, and a scalable GPU-based graph sampling framework, GraSS, is developed; evaluations on both real-world and synthetic graphs demonstrate the efficiency and scalability of GraSS.
References
Showing 1-10 of 86 references
Traversing large graphs on GPUs with unified memory
- Computer Science · Proc. VLDB Endow.
- 2020
A lightweight offline graph reordering algorithm, HALO (Harmonic Locality Ordering), is proposed as a pre-processing step for static graphs; unlike prior methods, which account only for undirected graphs, it specifically aims to cover large directed real-world graphs as well.
Enterprise: breadth-first graph traversal on GPUs
- Computer Science · SC15: International Conference for High Performance Computing, Networking, Storage and Analysis
- 2015
The authors present Enterprise, a new GPU-based BFS system that combines three techniques to remove potential performance bottlenecks and is optimized for both top-down and bottom-up BFS.
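The top-down/bottom-up distinction can be sketched as a direction-optimizing BFS: expand from the frontier while it is small, and let unvisited vertices search for a parent once it grows large. This Python version omits Enterprise's GPU techniques entirely; the switching rule and its threshold `alpha` are hypothetical.

```python
def bfs_direction_optimizing(adj, source, alpha=0.05):
    """Level-synchronous BFS on an undirected adjacency list; depth -1 means unreached."""
    n = len(adj)
    depth = [-1] * n
    depth[source] = 0
    frontier = [source]
    level = 0
    while frontier:
        level += 1
        next_frontier = []
        if len(frontier) < alpha * n:
            # Top-down: each frontier vertex claims its unvisited neighbors.
            for u in frontier:
                for v in adj[u]:
                    if depth[v] == -1:
                        depth[v] = level
                        next_frontier.append(v)
        else:
            # Bottom-up: each unvisited vertex looks for any parent in the frontier.
            in_frontier = set(frontier)
            for v in range(n):
                if depth[v] == -1:
                    for u in adj[v]:
                        if u in in_frontier:
                            depth[v] = level
                            next_frontier.append(v)
                            break  # one parent suffices; skip remaining edges
        frontier = next_frontier
    return depth
```

Both directions produce identical depths; bottom-up pays off when the frontier is large because most unvisited vertices find a parent after checking only a few edges.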
Parallel edge-based sampling for static and dynamic graphs
- Computer Science · CF
- 2019
The authors couple a lightweight concurrent hash table with a space-efficient dynamic graph data structure to overcome the challenges and memory constraints of sampling streaming dynamic graphs.
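As a single-threaded sketch of edge-based sampling on a stream, the class below keeps each arriving edge independently with probability `p` and stores the sample in a hash-based adjacency map that also supports deletions. Bernoulli edge sampling and the class name are assumptions; the concurrency and the paper's dynamic graph data structure are not modeled.

```python
import random
from collections import defaultdict


class EdgeSampler:
    """Bernoulli edge sampling over a stream of undirected edge insertions and deletions."""

    def __init__(self, p, seed=0):
        self.p = p
        self.rng = random.Random(seed)
        self.adj = defaultdict(set)  # hash-based adjacency of the sampled graph

    def insert(self, u, v):
        if u == v:
            return  # ignore self-loops
        if self.rng.random() < self.p:
            self.adj[u].add(v)
            self.adj[v].add(u)

    def delete(self, u, v):
        # Deleting an edge that was never sampled is a no-op.
        self.adj[u].discard(v)
        self.adj[v].discard(u)

    def num_edges(self):
        return sum(len(nbrs) for nbrs in self.adj.values()) // 2
```

Hash sets give expected constant-time insertion and deletion per edge, which is what lets the sample keep pace with a dynamic stream.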
Gunrock: a high-performance graph processing library on the GPU
- Computer Science · PPoPP
- 2015
"Gunrock," the high-level bulk-synchronous graph-processing system targeting the GPU, takes a new approach to abstracting GPU graph analytics: rather than designing an abstraction around computation, Gunrock implements a novel data-centric abstraction centered on operations on a vertex or edge frontier.
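The frontier-centric abstraction can be illustrated with two operators named after Gunrock's advance and filter steps. The Python signatures below are hypothetical and run sequentially, whereas Gunrock's operators are load-balanced GPU kernels; BFS is then written entirely in terms of them.

```python
def advance(adj, frontier, functor):
    """Visit every edge (u, v) leaving the frontier; emit v when functor(u, v) holds."""
    return [v for u in frontier for v in adj[u] if functor(u, v)]


def filter_frontier(frontier, predicate):
    """Drop duplicate vertices and those failing the predicate, preserving order."""
    seen = set()
    out = []
    for v in frontier:
        if v not in seen and predicate(v):
            seen.add(v)
            out.append(v)
    return out


def bfs(adj, source):
    labels = [-1] * len(adj)
    labels[source] = 0
    frontier = [source]
    level = 0
    while frontier:
        level += 1
        # Advance: expand along edges whose destination is still unlabeled.
        candidates = advance(adj, frontier, lambda u, v: labels[v] == -1)
        # Filter: deduplicate to form the next frontier.
        frontier = filter_frontier(candidates, lambda v: labels[v] == -1)
        for v in frontier:
            labels[v] = level
    return labels
```

The algorithm body never iterates over vertices directly; it only transforms frontiers, which is the data-centric view the summary describes.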
Graphie: Large-Scale Asynchronous Graph Traversals on Just a GPU
- Computer Science · 2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT)
- 2017
The authors present Graphie, a system that efficiently traverses large-scale graphs on a single GPU: it stores the vertex attribute data in GPU memory, streams edge data asynchronously to the GPU for processing, and relies on two renaming algorithms for high performance.
SIMD-X: Programming and Processing of Graph Algorithms on GPUs
- Computer Science · USENIX Annual Technical Conference
- 2019
SIMD-X uses just-in-time task management, which filters out inactive vertices at runtime and maps tasks to different numbers of GPU cores to balance the workload; it also leverages push-pull based kernel fusion, which reduces a large number of computation kernels to very few.
Power-efficient and highly scalable parallel graph sampling using FPGAs
- Computer Science · 2017 International Conference on ReConFigurable Computing and FPGAs (ReConFig)
- 2017
This paper presents the design and implementation of an FPGA-based graph sampling method that uses a time- and memory-efficient graph representation suited to reconfigurable hardware such as FPGAs, and shows that the proposed techniques are 2x faster and 3x more energy efficient than a serial CPU version of the algorithm.
GraphMat: High performance graph analytics made productive
- Computer Science · Proc. VLDB Endow.
- 2015
GraphMat is a single-node multicore graph framework written in C++ that achieves better multicore scalability than other frameworks and is 1.2X off native, hand-optimized code on a variety of graph algorithms.
Fast Random Walk with Restart and Its Applications
- Computer Science · Sixth International Conference on Data Mining (ICDM'06)
- 2006
The heart of the approach is to exploit two important properties shared by many real graphs, linear correlations and block-wise, community-like structure: the linearity is exploited through low-rank matrix approximation, and the community structure through graph partitioning followed by the Sherman-Morrison lemma for matrix inversion.
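For reference, the quantity the paper accelerates can be computed directly by power iteration. The sketch assumes a restart probability `restart` and the update r = (1 - restart) W r + restart e_q, where W moves a walker from u to each neighbor with probability 1/deg(u) and e_q is the query indicator; this notation is ours, and none of the paper's low-rank or partitioning speedups are included.

```python
def rwr(adj, query, restart=0.15, tol=1e-10, max_iter=1000):
    """Random walk with restart scores by power iteration.

    Assumes every vertex has at least one neighbor (no dangling vertices).
    """
    n = len(adj)
    r = [0.0] * n
    r[query] = 1.0
    for _ in range(max_iter):
        nxt = [0.0] * n
        for u in range(n):
            share = (1.0 - restart) * r[u] / len(adj[u])  # spread mass evenly to neighbors
            for v in adj[u]:
                nxt[v] += share
        nxt[query] += restart  # jump back to the query vertex
        diff = sum(abs(a - b) for a, b in zip(nxt, r))
        r = nxt
        if diff < tol:
            break
    return r
```

Each iteration preserves total probability mass of 1, and the error shrinks by roughly a factor of (1 - restart) per step; the paper's contribution is avoiding this repeated full-graph iteration for every query.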
Accurate, Efficient and Scalable Graph Embedding
- Computer Science · 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
- 2019
This paper proposes novel parallelization techniques for graph sampling-based GCNs that achieve superior scalability on very large graphs without compromising accuracy, and demonstrates that the resulting parallel graph embedding outperforms state-of-the-art methods in scalability, efficiency, and accuracy on several large datasets.