A Distributed Multi-GPU System for Fast Graph Processing
@article{Jia2017ADM, title={A Distributed Multi-GPU System for Fast Graph Processing}, author={Zhihao Jia and Yongkee Kwon and Galen M. Shipman and Patrick S. McCormick and Mattan Erez and Alexander Aiken}, journal={Proc. VLDB Endow.}, year={2017}, volume={11}, pages={297-310} }
We present Lux, a distributed multi-GPU system that achieves fast graph processing by exploiting the aggregate memory bandwidth of multiple GPUs and taking advantage of locality in the memory hierarchy of multi-GPU clusters. Lux provides two execution models that optimize algorithmic efficiency and enable important GPU optimizations, respectively. Lux also uses a novel dynamic load balancing strategy that is cheap and achieves good load balance across GPUs. In addition, we present a performance…
50 Citations
MG-Join: A Scalable Join for Massively Parallel Multi-GPU Architectures
- Computer Science
- SIGMOD Conference
- 2021
This paper proposes MG-Join, a scalable partitioned hash join implementation on multiple GPUs of a single machine that outperforms the state-of-the-art hash join implementations by up to 2.5x and helps improve the overall performance of TPC-H queries by up to 4.5x over the multi-GPU version of OmniSci, an open-source commercial GPU database.
A Study of Graph Analytics for Massive Datasets on Distributed Multi-GPUs
- Computer Science
- 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
- 2020
This paper presents the first detailed analysis of graph analytics applications for massive real-world datasets on a distributed multi-GPU platform and the first analysis of strong scaling of smaller real-world datasets.
SIMD-X: Programming and Processing of Graph Algorithms on GPUs
- Computer Science
- USENIX Annual Technical Conference
- 2019
SIMD-X utilizes just-in-time task management, which filters out inactive vertices at runtime and intelligently maps various tasks to different numbers of GPU cores in pursuit of workload balancing, and leverages push-pull based kernel fusion that reduces a large number of computation kernels to very few.
GPU-based Graph Traversal on Compressed Graphs
- Computer Science
- SIGMOD Conference
- 2019
This paper introduces GPU-based graph traversal on compressed graphs, designed for the GPU's SIMT architecture, and proposes two novel parallel scheduling strategies, Two-Phase Traversal and Task-Stealing, to handle thread divergence and workload imbalance issues when decoding the compressed graph.
Self-adaptive Graph Traversal on GPUs
- Computer Science
- SIGMOD Conference
- 2021
This paper introduces SAGE, a self-adaptive graph traversal on GPUs, which is free from preprocessing and operates on ubiquitous graph representations directly, and proposes Tiled Partitioning and Resident Tile Stealing to fully exploit the computing power of GPUs in a runtime and self-adaptive manner.
AsynGraph: Maximizing Data Parallelism for Efficient Iterative Graph Processing on GPUs
- Computer Science
- ACM Trans. Archit. Code Optim.
- 2020
This article develops a novel system, called AsynGraph, to maximize data parallelism in iterative graph processing on GPUs. It enables the state propagations of most vertices to be conducted concurrently on the GPUs and achieves a higher GPU utilization ratio by efficiently handling the paths between the important vertices.
Subway: minimizing data transfer during out-of-GPU-memory graph processing
- Computer Science
- EuroSys
- 2020
This work designs a fast subgraph generation algorithm with a simple yet efficient subgraph representation and a GPU-accelerated implementation, and brings asynchrony to the subgraph processing, delaying the synchronization between a subgraph in the GPU memory and the rest of the graph in the CPU memory.
DiGraph: An Efficient Path-based Iterative Directed Graph Processing System on Multiple GPUs
- Computer Science
- ASPLOS
- 2019
DiGraph is a novel and efficient iterative directed graph processing system for a machine with multiple GPUs. It takes advantage of the dependencies between vertices in three novel ways to help efficient vertex state propagation along the paths over GPUs, for faster convergence speed and a higher utilization ratio of the loaded data.
Excavating the Potential of GPU for Accelerating Graph Traversal
- Computer Science
- 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
- 2019
EtaGraph is a novel GPU graph traversal framework optimized for the GPU memory system and execution parallelism. It uses a frontier-like kernel execution model and features a lightweight graph transformation procedure, named Unified Degree Cut, to process skewed graphs efficiently without modifying raw data or introducing extra space overhead.
An Adaptive Load Balancer For Graph Analytical Applications on GPUs
- Computer Science
- ArXiv
- 2019
This scheme is implemented in the IrGL compiler to allow users to generate efficient load balanced code for a GPU from high-level sequential programs and can achieve an average speed-up of 2.2x on inputs that suffer from severe load imbalance problems when previous state-of-the-art load-balancing schemes are used.
References
GTS: A Fast and Scalable Graph Processing Method based on Streaming Topology to GPUs
- Computer Science
- SIGMOD Conference
- 2016
A fast and scalable graph processing method, GTS, is proposed that handles even RMAT32 (64 billion edges) very efficiently using only a single machine, and consistently and significantly outperforms the major distributed graph processing methods, GraphX, Giraph, and PowerGraph, and the state-of-the-art GPU-based method TOTEM.
CuSha: vertex-centric graph processing on GPUs
- Computer Science
- HPDC '14
- 2014
CuSha is a CUDA-based graph processing framework that overcomes the above obstacle via use of two novel graph representations: G-Shards and Concatenated Windows.
Scalable GPU graph traversal
- Computer Science
- PPoPP '12
- 2012
This work presents a BFS parallelization focused on fine-grained task management constructed from efficient prefix sum that achieves an asymptotically optimal O(|V|+|E|) work complexity.
MapGraph: A High Level API for Fast Development of High Performance Graph Analytics on GPUs
- Computer Science
- GRADES
- 2014
MapGraph is presented, a high performance parallel graph programming framework that delivers up to 3 billion Traversed Edges Per Second on a GPU and is comparable to state-of-the-art, manually optimized GPU implementations.
Gunrock: a high-performance graph processing library on the GPU
- Computer Science
- ArXiv
- 2015
"Gunrock," the high-level bulk-synchronous graph-processing system targeting the GPU, takes a new approach to abstracting GPU graph analytics: rather than designing an abstraction around computation, Gunrock implements a novel data-centric abstraction centered on operations on a vertex or edge frontier.
Medusa: Simplified Graph Processing on GPUs
- Computer Science
- IEEE Transactions on Parallel and Distributed Systems
- 2014
This work proposes a programming framework called Medusa which enables developers to leverage the capabilities of GPUs by writing sequential C/C++ code and develops a series of graph-centric optimizations based on the architecture features of GPUs for efficiency.
PGX.D: a fast distributed graph processing engine
- Computer Science
- SC15: International Conference for High Performance Computing, Networking, Storage and Analysis
- 2015
This paper presents a fast distributed graph processing system, namely PGX.D, as a low-overhead, bandwidth-efficient communication framework that supports remote data-pulling patterns and recommends the use of balanced beefy clusters where the sustained random DRAM-access bandwidth in aggregate is matched with the bandwidth of the underlying interconnection fabric.
Groute: An Asynchronous Multi-GPU Programming Model for Irregular Computations
- Computer Science
- PPoPP 2017
- 2017
It is demonstrated that this approach achieves state-of-the-art performance and exhibits strong scaling for a suite of irregular applications on 8-GPU and heterogeneous systems, yielding over 7x speedup for some algorithms.
MOCgraph: Scalable Distributed Graph Processing Using Message Online Computing
- Computer Science
- Proc. VLDB Endow.
- 2014
This paper proposes MOCgraph, a scalable distributed graph processing framework to reduce the memory footprint and improve the scalability, based on message online computing, and implements it on top of Apache Giraph, and tests it against several representative graph algorithms.
Graph Analytics Through Fine-Grained Parallelism
- Computer Science
- SIGMOD Conference
- 2016
The topological properties of the underlying graph are explored to design and implement a highly effective concurrency control scheme for efficient synchronous processing in an in-memory graph analytical engine. The results show that the proposed hybrid synchronous scheduler significantly outperforms other synchronous schedulers in existing graph analytical engines, as well as BSP and asynchronous schedulers.