Declarative and distributed graph analytics with GRADOOP

@article{Junghanns2018DeclarativeAD,
  title={Declarative and distributed graph analytics with GRADOOP},
  author={Martin Junghanns and Max Kie{\ss}ling and Niklas Teichmann and Kevin G{\'o}mez and Andr{\'e} Petermann and Erhard Rahm},
  journal={Proc. VLDB Endow.},
  year={2018},
  volume={11},
  pages={2006--2009}
}
We demonstrate Gradoop, an open-source framework that combines and extends features of graph database systems with the benefits of distributed graph processing. Using a rich graph data model and powerful graph operators, users can declaratively express graph analytical programs for distributed execution without needing advanced programming experience or a deeper understanding of the underlying system. Visitors to the demo can declare graph analytical programs using the Gradoop… 


Graph Sampling with Distributed In-Memory Dataflow Systems
TLDR
This work focuses on the implementation of distributed graph sampling for Big Data frameworks and in-memory dataflow systems such as Apache Spark or Apache Flink and evaluates the scalability of the new implementations.
BIGGR: Bringing Gradoop to Applications
TLDR
The BIGGR approach is introduced, providing a novel tool for the user-friendly and efficient analysis and visualization of Big Graph Data on top of the open-source software KNIME and Gradoop and the distributed processing framework Apache Flink.
Graph Data Transformations in Gradoop
TLDR
This work investigates transformation operations for property graphs managed by the distributed platform Gradoop to support ETL processes for graph data and provides initial results of a runtime evaluation of the proposed graph data transformations.
Exploration and Analysis of Temporal Property Graphs
We demonstrate the Temporal Graph Explorer, a distributed open-source framework that enables time-dependent graph exploration and analysis on large real-world networks using a rich temporal property
Analyzing Temporal Graphs with Gradoop
TLDR
This work extends the distributed graph analysis framework Gradoop for temporal graph analysis by adding time properties to vertices, edges and graphs and using them within graph operators, and outlines their use within analysis workflows.
Big graph analysis by visually created workflows
TLDR
This paper reports on an effort to improve the usability of the open-source system Gradoop for processing and analyzing large graphs by integrating Gradoop into the popular open-source software KNIME, so that graph analysis workflows can be created visually without the need for coding.
An analysis of the graph processing landscape
TLDR
An overview of different aspects of the graph processing landscape is provided, describing the different types of systems available, coordination and communication models in distributed graph processing, partitioning techniques, and different definitions related to whether a graph can be updated.
PatMat: A Distributed Pattern Matching Engine with Cypher
TLDR
This work leverages state-of-the-art join-based algorithms in distributed contexts and the Cypher query language, the most widely adopted declarative language for graph pattern matching, to bring together the academic efforts on performance and the industrial efforts on expressiveness.
Evolution Analysis of Large Graphs with Gradoop
TLDR
This paper contains an overview of the distributed graph analysis framework Gradoop and its graph data model and an example use case from the financial domain demonstrating the flexibility of the temporal graph model and its operators.
Distributed graphs: in search of fast, low-latency, resource-efficient, semantics-rich Big-Data processing
TLDR
The resulting analysis identifies the need for systems to be able to effectively extend the type of read-eval loop execution by maintaining a graph (or parts of it) in a cluster's memory for reuse, thereby avoiding the recurring I/O overhead present in all systems.

References

Showing 1-10 of 15 references
Cypher-based Graph Pattern Matching in Gradoop
TLDR
This work implements the declarative graph query language Cypher within the distributed graph analysis platform Gradoop and shows, using LDBC graph data, that the query engine is scalable for operational as well as analytical workloads.
Analyzing extended property graphs with Apache Flink
TLDR
The Extended Property Graph Model is proposed; it is semantically rich and schema-free, supports multiple distinct graphs, and provides declarative and combinable operators to analyze both single graphs and graph collections.
Distributed Grouping of Property Graphs with Gradoop
TLDR
This paper presents an algorithm for graph grouping with support for attribute aggregation and structural summarization by user-defined vertex and edge properties and demonstrates the scalability of the algorithm on real-world and synthetic social network data.
Management and Analysis of Big Graph Data: Current Systems and Open Challenges
TLDR
This chapter surveys current system approaches for management and analysis of “big graph data” and outlines a recent research framework called Gradoop that is built on the so-called Extended Property Graph Data Model, with dedicated support for analyzing not only single graphs but also collections of graphs.
GraphX: a resilient distributed graph system on Spark
TLDR
GraphX is introduced, which combines the advantages of both data-parallel and graph-parallel systems by efficiently expressing graph computation within the Spark data-parallel framework and provides powerful new operations to simplify graph construction and transformation.
A performance evaluation of open source graph databases
TLDR
A qualitative study and a performance comparison of 12 open source graph databases using four fundamental graph algorithms on networks containing up to 256 million edges are conducted.
Large scale graph processing systems: survey and an experimental evaluation
TLDR
A comprehensive survey of state-of-the-art large-scale graph processing platforms, namely GraphChi, Apache Giraph, GPS, GraphLab and GraphX, and an extensive experimental study of five popular systems in this domain.
Managing and mining large graphs: systems and implementations
TLDR
This tutorial highlights the challenges posed by the graph data, the constraints of architectural design, the different types of application needs, and the power of different programming models that support such needs.
Pregel: a system for large-scale graph processing
TLDR
A model for processing large graphs that has been designed for efficient, scalable and fault-tolerant implementation on clusters of thousands of commodity computers, and its implied synchronicity makes reasoning about programs easier.
DIMSpan: Transactional Frequent Subgraph Mining with Distributed In-Memory Dataflow Systems
TLDR
DIMSpan is introduced, an advanced approach to frequent subgraph mining that utilizes the features provided by distributed in-memory dataflow systems such as Apache Flink or Apache Spark and determines the complete set of frequent subgraphs from arbitrary string-labeled directed multigraphs as they occur in social, business and knowledge networks.