Corpus ID: 12525494

GraphX: Unifying Data-Parallel and Graph-Parallel Analytics

@article{Xin2014GraphXUD,
  title={GraphX: Unifying Data-Parallel and Graph-Parallel Analytics},
  author={Reynold Xin and Daniel Crankshaw and Ankur Dave and Joseph E. Gonzalez and Michael J. Franklin and Ion Stoica},
  journal={ArXiv},
  year={2014},
  volume={abs/1402.2394}
}
From social networks to language modeling, the growing scale and importance of graph data has driven the development of numerous new graph-parallel systems (e.g., Pregel, GraphLab). By restricting the computation that can be expressed and introducing new techniques to partition and distribute the graph, these systems can efficiently execute iterative graph algorithms orders of magnitude faster than more general data-parallel systems. However, the same restrictions that enable the performance…
Systems for Big-Graphs
TLDR
This tutorial discusses the design of the emerging systems for processing of big-graphs, key features of distributed graph algorithms, as well as graph partitioning and workload balancing techniques, and highlights the current challenges and some future research directions.
The Taxonomy of Distributed Graph Analytics
TLDR
This paper aims to provide the taxonomy of various distributed programming models, distributed graph processing frameworks and various kinds of graph analytics that are essential for the analysis of large-scale networks.
A communication-reduced and computation-balanced framework for fast graph computation
TLDR
Evaluation of LCC-Graph on a 32-node cluster, driven by real-world graph datasets, shows that it significantly outperforms existing distributed graph-processing frameworks in terms of runtime, particularly when the system is supported by a high-bandwidth network.
GraphU: A Unified Vertex-Centric Parallel Graph Processing Platform
TLDR
This work proposes a framework of complexity analysis for DFA-G automaton and shows that it can significantly facilitate complexity analysis on asynchronous programs, and develops a new prototype platform, GraphU, which entirely removes synchronization barriers and decouples remote communication from vertex computation.
On Improving Distributed Pregel-like Graph Processing Systems
The considerable interest in distributed systems that can execute algorithms to process large graphs has led to the creation of many graph processing systems. However, existing systems suffer from…
A Comparative Evaluation of Big Data Frameworks for Graph Processing
  • Marc Kaepke, O. Zukunft
  • Computer Science
  • 2018 4th International Conference on Big Data Innovations and Applications (Innovate-Data)
  • 2018
TLDR
This paper focuses on the scalability of GraphX and Gelly with respect to increasing data volumes and their ability to distribute work between multiple processing nodes in a cluster and shows that choosing between different computing models offered by the frameworks can significantly influence the performance of big data graph computations.
Distributed graph cube generation using Spark framework
TLDR
The GraphNaïve and GraphTDC algorithms are proposed, which sequentially compute graph cuboids for all dimensions in a graph, while the Generate Multi-Dimension Table method is proposed to efficiently create a multidimensional graph table to express the graph.
LCC-Graph: A high-performance graph-processing framework with low communication costs
TLDR
Evaluation of LCC-Graph on a 32-node cluster, driven by real-world graph datasets, shows that it significantly outperforms existing distributed graph-processing frameworks in terms of runtime, particularly when the system is supported by a high-bandwidth network.
Management and Analysis of Big Graph Data: Current Systems and Open Challenges
TLDR
This chapter surveys current system approaches for management and analysis of “big graph data”, and outlines a recent research framework called Gradoop that is built on the so-called Extended Property Graph Data Model with dedicated support for analyzing not only single graphs but also collections of graphs.
VENUS: Vertex-centric streamlined graph computation on a single PC
TLDR
VENUS is a disk-based graph computation system which is able to handle billion-scale problems efficiently on a commodity PC and adopts a novel computing architecture that features vertex-centric “streamlined” processing.

References

SHOWING 1-10 OF 26 REFERENCES
PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs
TLDR
This paper describes the challenges of computation on natural graphs in the context of existing graph-parallel abstractions and introduces the PowerGraph abstraction which exploits the internal structure of graph programs to address these challenges.
GraphBuilder: scalable graph ETL framework
TLDR
The motivation for GraphBuilder, its architecture, MapReduce algorithms, and performance evaluation of the framework are described, and several graph partitioning methods are developed and evaluated.
Pregel: a system for large-scale graph processing
TLDR
A model for processing large graphs that has been designed for efficient, scalable and fault-tolerant implementation on clusters of thousands of commodity computers, and its implied synchronicity makes reasoning about programs easier.
Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud
While high-level data parallel frameworks, like MapReduce, simplify the design and implementation of large-scale data processing systems, they do not naturally or efficiently support many important…
Signal/Collect: Graph Algorithms for the (Semantic) Web
TLDR
This paper presents the Signal/Collect programming model for synchronous and asynchronous graph algorithms and demonstrates that this abstraction can capture the essence of many algorithms on graphs in a concise and elegant way by giving Signal/Collect adaptations of various relevant algorithms.
X-Stream: edge-centric graph processing using streaming partitions
TLDR
X-Stream is novel in using an edge-centric rather than a vertex-centric implementation of this model, and streaming completely unordered edge lists rather than performing random access, and competes favorably with existing systems for graph processing.
The Combinatorial BLAS: design, implementation, and applications
TLDR
The parallel Combinatorial BLAS is described, which consists of a small but powerful set of linear algebra primitives specifically targeting graph and data mining applications, and an extensible library interface and some guiding principles for future development are provided.
Spinning Fast Iterative Data Flows
TLDR
This work proposes a method to integrate incremental iterations, a form of workset iterations, with parallel dataflows and presents an extension to the programming model for incremental iterations that compensates for the lack of mutable state in dataflows and allows for exploiting the sparse computational dependencies inherent in many iterative algorithms.
Layered label propagation: a multiresolution coordinate-free ordering for compressing social networks
TLDR
Experiments performed show that combining the order produced by the proposed algorithm with the WebGraph compression framework provides a major increase in compression with respect to all currently known techniques, both on web graphs and on social networks.
Naiad: a timely dataflow system
TLDR
It is shown that many powerful high-level programming models can be built on Naiad's low-level primitives, enabling such diverse tasks as streaming data analysis, iterative machine learning, and interactive graph mining.