Streaming Graph Partitioning: An Experimental Study

@article{Abbas2018StreamingGP,
  title={Streaming Graph Partitioning: An Experimental Study},
  author={Zainab Abbas and Vasiliki Kalavri and Paris Carbone and Vladimir Vlassov},
  journal={Proc. VLDB Endow.},
  year={2018},
  volume={11},
  pages={1590--1603}
}
Graph partitioning is an essential yet challenging task for massive graph analysis in distributed computing. Common graph partitioning methods scan the complete graph to obtain structural characteristics offline, before partitioning. However, the emerging need for low-latency, continuous graph analysis led to the development of online partitioning methods. Online methods ingest edges or vertices as a stream, making partitioning decisions on the fly based on partial knowledge of the graph. Prior… 
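
The online setting described in the abstract can be illustrated with a minimal sketch: edges arrive one at a time, and each is placed using only the state accumulated so far. The code below is an illustration under stated assumptions, not one of the algorithms evaluated in the paper. The function names, the modulo "hash", and the `slack` balance parameter are all hypothetical choices made for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[int, int]


def hash_partition(edges: Iterable[Edge], k: int) -> List[int]:
    """Stateless baseline: place each edge by hashing its source vertex.

    The modulo stands in for a real hash function (assumption).
    """
    return [u % k for u, _ in edges]


def greedy_partition(edges: Iterable[Edge], k: int, slack: int = 3) -> List[int]:
    """Stateful one-pass heuristic that only sees the edges streamed so far.

    `slack` (hypothetical parameter, must be >= 1) excludes partitions whose
    load is at least `slack` edges above the least-loaded partition.
    """
    replicas: Dict[int, Set[int]] = defaultdict(set)  # vertex -> partitions holding it
    load = [0] * k                                     # edges placed per partition
    assignment: List[int] = []
    for u, v in edges:
        lightest = min(load)
        allowed = {i for i in range(k) if load[i] - lightest < slack}
        # Prefer a partition that already holds both endpoints, then either one,
        # then any partition that is not overloaded.
        both = replicas[u] & replicas[v] & allowed
        either = (replicas[u] | replicas[v]) & allowed
        candidates = both or either or allowed
        # Break ties by load, then by partition id so the result is deterministic.
        p = min(candidates, key=lambda i: (load[i], i))
        replicas[u].add(p)
        replicas[v].add(p)
        load[p] += 1
        assignment.append(p)
    return assignment


def replication_factor(edges: List[Edge], assignment: List[int]) -> float:
    """Average number of partitions in which each vertex appears."""
    replicas: Dict[int, Set[int]] = defaultdict(set)
    for (u, v), p in zip(edges, assignment):
        replicas[u].add(p)
        replicas[v].add(p)
    return sum(len(s) for s in replicas.values()) / len(replicas)
```

For two disjoint triangles streamed one after the other with `k=2` and `slack=3`, this greedy sketch keeps each triangle inside one partition (replication factor 1.0), whereas the hash baseline replicates four of the six vertices. The result depends on the stream order, which is the kind of sensitivity that makes partial-knowledge partitioning hard.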

Experimental Analysis of Streaming Algorithms for Graph Partitioning

TLDR
The results show that no single partitioning algorithm performs best in all cases, and that the choice of graph partitioning algorithm depends on the type and degree distribution of the graph, the characteristics of the workloads, and specific application requirements.

Clustering-based Partitioning for Large Web Graphs

TLDR
This work explores the property of web graph clustering and proposes a novel restreaming algorithm for vertex-cut partitioning. It finds that, when the number of partitions is large, the runtime cost of this method can be an order of magnitude lower than that of one-pass streaming partitioning algorithms.

WSGP: A Window-based Streaming Graph Partitioning Approach

TLDR
This work presents WSGP, a novel window-based streaming graph partitioning algorithm that leverages a greedy heuristic to perform edge partitioning. WSGP consistently achieves a smaller replication factor than state-of-the-art algorithms, by up to 23%, at a limited cost in memory and overall running time.

A Study of Partitioning Policies for Graph Analytics on Large-scale Distributed Platforms

TLDR
An experimental study of partitioning strategies for work-efficient graph analytics applications on large KNL and Skylake clusters with up to 256 machines using the Gluon communication runtime which implements partitioning-specific communication optimizations.

Time-Efficient and High-Quality Graph Partitioning for Graph Dynamic Scaling

TLDR
The evaluation with the real-world billion-scale graphs demonstrates that the proposed approach significantly reduces the repartitioning time, while the partitioning quality it achieves is on par with that of the best existing static method.

Machine Learning-based Selection of Graph Partitioning Strategy Using the Characteristics of Graph Data and Algorithm

TLDR
This work proposes a machine learning-based approach to select the most appropriate partitioning strategy for a given graph and processing algorithm. The approach enumerates viable partitioning strategies, predicts the execution time of the target algorithm for each, and selects the partitioning strategy with the fastest estimated execution time.

Out-of-Core Edge Partitioning at Linear Run-Time

TLDR
2PS-L is proposed, a novel out-of-core edge partitioning algorithm that builds upon the stateful streaming model but achieves linear run-time, i.e., O(|E|).

Recursive Multi-Section on the Fly: Shared-Memory Streaming Algorithms for Hierarchical Graph Partitioning and Process Mapping

TLDR
This work presents a shared-memory streaming multi-recursive partitioning scheme that performs recursive multi-sections on the fly without knowing the overall input graph, and has a considerably lower running time complexity in comparison with state-of-the-art non-buffered one-pass partitioning algorithms.

OffStreamNG: Partial Stream Hybrid Graph Edge Partitioning Based on Neighborhood Expansion and Greedy Heuristic

TLDR
This study proposes OffStreamNG, a partial-stream hybrid graph edge partitioning method that leverages the advantages of both offline and streaming edge partitioning approaches by interconnecting them via a saved partition state layer.
...

References

Streaming graph partitioning for large distributed graphs

TLDR
This work proposes natural, simple heuristics for graph partitioning and compares their performance to hashing and METIS, a fast, offline heuristic, and shows on a large collection of graph datasets that they are a significant improvement.

FENNEL: streaming graph partitioning for massive scale graphs

TLDR
This work derives a novel one-pass, streaming graph partitioning algorithm and shows that it yields significant performance improvements over previous approaches using an extensive set of real-world and synthetic graphs.

Streaming Graph Partitioning in the Planted Partition Model

TLDR
This work contributes to the recent line of research on streaming graph partitioning, which computes an approximately balanced k-partitioning of the vertex set of a graph in a single pass over the graph stream, using degree-based criteria.

An Experimental Comparison of Partitioning Strategies in Distributed Graph Processing

TLDR
This paper evaluates and characterizes both the performance and resource usage of different partitioning strategies under various popular distributed graph processing systems, applications, input graphs, and execution environments, and presents rules of thumb to help users pick the best partitioning strategy for their particular use cases.

GraphX: Graph Processing in a Distributed Dataflow Framework

TLDR
This paper introduces GraphX, an embedded graph processing framework built on top of Apache Spark, a widely used distributed dataflow system. It demonstrates that GraphX achieves an order of magnitude performance gain over the base dataflow framework and matches the performance of specialized graph processing systems while enabling a wider range of computation.

GraphBuilder: scalable graph ETL framework

TLDR
The motivation for GraphBuilder, its architecture, MapReduce algorithms, and performance evaluation of the framework are described, and several graph partitioning methods are developed and evaluated.

Balanced graph edge partition

TLDR
This paper describes the expected costs of vertex and edge partitions with and without aggregation of messages, and obtains the first approximation algorithms for the balanced edge-partition problem, which for the case of no aggregation matches the best known approximation ratio.

Distributed Power-law Graph Computing: Theoretical and Empirical Analysis

TLDR
A novel vertex-cut method, called degree-based hashing (DBH), is proposed, which makes effective use of skewed degree distributions for graph partitioning (GP). The work theoretically proves that DBH can achieve lower communication cost than existing methods while simultaneously guaranteeing good workload balance.

One Trillion Edges: Graph Processing at Facebook-Scale

TLDR
The usability, performance, and scalability improvements made to Apache Giraph are described, along with several key extensions to the original Pregel model that make it possible to develop a broader range of production graph applications and workflows and that improve code reuse.
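
The degree-based hashing (DBH) idea summarized in the reference list above can be sketched in a few lines: each edge is assigned by hashing its lower-degree endpoint, so high-degree vertices are the ones that get cut. This is a minimal illustration under stated assumptions, not the authors' implementation. It uses the partial degrees observed so far in the stream rather than exact degrees, and the function name `dbh_partition` and the modulo "hash" are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def dbh_partition(edges: Iterable[Tuple[int, int]], k: int) -> List[int]:
    """Assign each edge to the partition of its lower-degree endpoint.

    Degrees are the partial counts seen so far in the stream (assumption);
    `vertex % k` stands in for a real hash function (assumption).
    """
    degree: Counter = Counter()
    assignment: List[int] = []
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        # Ties go to the first endpoint so the result is deterministic.
        low = u if degree[u] <= degree[v] else v
        assignment.append(low % k)
    return assignment
```

On a star graph with hub 0, every edge after the first is placed according to its leaf endpoint, so the hub is replicated across partitions while each leaf stays in exactly one.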