A General-Purpose Query-Centric Framework for Querying Big Graphs

  title={A General-Purpose Query-Centric Framework for Querying Big Graphs},
  author={Da Yan and James Cheng and M. Tamer {\"O}zsu and Fan Yang and Yi Lu and John C.S. Lui and Qizhen Zhang and Wilfred Ng},
  journal={Proc. VLDB Endow.},
Pioneered by Google's Pregel, many distributed systems have been developed for large-scale graph analytics. These systems employ a user-friendly "think like a vertex" programming model, and exhibit good scalability for tasks where the majority of graph vertices participate in computation. However, the design of these systems can seriously under-utilize the resources in a cluster for processing light-workload graph queries, where only a small fraction of vertices need to be accessed. In this… 
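A light-workload query can be seen in a minimal, single-machine sketch. It assumes an unweighted directed graph stored as a Python adjacency dict; the function name `p2p_bfs` and the toy chain graph are hypothetical and not taken from the paper. The query is a point-to-point BFS that stops as soon as the target is found. It accesses only a handful of vertices, whereas a whole-graph analytic would touch every vertex.

```python
from collections import deque

def p2p_bfs(adj, s, t):
    """Hop distance from s to t plus the number of vertices accessed.

    adj: {vertex: [out-neighbors]} (hypothetical in-memory representation).
    Returns (hops or None if unreachable, number_of_vertices_accessed).
    """
    if s == t:
        return 0, 1
    dist = {s: 0}            # also serves as the set of accessed vertices
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                if v == t:   # stop early: the query is answered
                    return dist[v], len(dist)
                queue.append(v)
    return None, len(dist)

# A 1000-vertex chain 0 -> 1 -> ... -> 999
adj = {i: [i + 1] for i in range(999)}
adj[999] = []
```

On this graph, `p2p_bfs(adj, 0, 2)` accesses only 3 of the 1000 vertices. A system that schedules work over all vertices in every iteration would leave most of its capacity idle for such a query.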

Quegel: A General-Purpose System for Querying Big Graphs

This demonstration introduces a general-purpose system for querying big graphs, called Quegel, which treats queries as first-class citizens in the design of its computing model, and adopts a novel superstep-sharing execution model to overcome the weaknesses of existing systems.
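The superstep-sharing idea can be illustrated with a minimal single-process sketch. It assumes unweighted BFS queries that all start together, and the name `shared_bfs` is hypothetical; this is not Quegel's API. Each global superstep advances every unfinished query by one hop, so a batch of queries pays for one barrier per superstep rather than one barrier per query per superstep.

```python
def shared_bfs(adj, queries):
    """Evaluate several (source, target) hop queries in shared supersteps.

    Returns (answers, supersteps). An answer is the hop count, or None if
    the target is unreachable. All queries start in superstep 0.
    """
    state = [{"visited": {s}, "frontier": {s}, "answer": 0 if s == t else None}
             for s, t in queries]
    supersteps = 0
    while True:
        # A query stays active until it is answered or its frontier dies out
        active = [i for i, st in enumerate(state)
                  if st["answer"] is None and st["frontier"]]
        if not active:
            break
        supersteps += 1  # one shared barrier for all active queries
        for i in active:
            st, target = state[i], queries[i][1]
            nxt = set()
            for u in st["frontier"]:
                for v in adj.get(u, ()):
                    if v not in st["visited"]:
                        st["visited"].add(v)
                        nxt.add(v)
            st["frontier"] = nxt
            if target in nxt:
                st["answer"] = supersteps
    return [st["answer"] for st in state], supersteps

adj = {0: [1], 1: [2], 2: [3], 3: [4], 4: []}
answers, steps = shared_bfs(adj, [(0, 2), (1, 4), (3, 0)])
```

Here the three queries finish in 3 shared supersteps. Running them one after another would take 2 + 3 + 2 = 7 supersteps.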

SimGQ: Simultaneously Evaluating Iterative Graph Queries

This paper develops SimGQ, a system that optimizes the simultaneous evaluation of a group of vertex queries originating at different source vertices. It delivers substantial speedups over a conventional framework that evaluates and responds to queries one by one.

Banyan: A Scoped Dataflow Engine for Graph Query Service

This paper presents Banyan, an engine for graph query service (GQS) based on the scoped dataflow model. It improves performance by up to three orders of magnitude over state-of-the-art graph query engines while providing performance isolation and load balancing.

G-thinker: a general distributed framework for finding qualified subgraphs in a big graph with load balancing

This work proposes G-thinker, the first truly CPU-bound distributed framework for subgraph-finding algorithms. It adopts a task-based computation model and provides a user-friendly, subgraph-centric, vertex-pulling API for writing distributed subgraph-finding algorithms that can be easily adapted from existing serial algorithms.
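A task-based, vertex-pulling computation can be sketched on a single machine, using triangle counting as the example. The task tuple and the explicit "pull" dictionary are hypothetical stand-ins and not G-thinker's API. In a distributed setting the pulled adjacency lists would come from remote machines. Each task is spawned from one vertex `v` and keeps only neighbors with larger IDs, so every triangle `v < u < w` is counted exactly once.

```python
from collections import deque

def count_triangles_task_based(adj):
    """Count triangles in an undirected graph given as {v: set(neighbors)}."""
    # Spawn one independent task per vertex: (seed vertex, candidate set)
    tasks = deque((v, {u for u in adj[v] if u > v}) for v in adj)
    total = 0
    while tasks:
        v, cand = tasks.popleft()
        # "Pull" step: fetch the adjacency lists of the candidate vertices
        pulled = {u: adj[u] for u in cand}
        for u in cand:
            # w closes the triangle (v, u, w); require w > u to avoid duplicates
            total += sum(1 for w in pulled[u] if w in cand and w > u)
    return total

k4 = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2}}
```

Because each task touches only its seed vertex and the data it pulls, tasks can be scheduled and load-balanced independently. That property motivates a task-based model for CPU-bound subgraph finding.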

WORQ: Workload-Driven RDF Query Processing

This paper studies the effect of several optimization techniques on RDF query processing. It reports an order-of-magnitude improvement in preprocessing, storage, and query performance compared to state-of-the-art solutions.

Systems for Big Graph Analytics

This talk starts with a brief review of Pregel, followed by an introduction to developing Pregel algorithms with performance guarantees for various graph problems. It then introduces a few novel ideas and designs for improving the basic Pregel model.

C-Graph: A Highly Efficient Concurrent Graph Reachability Query Framework

This paper presents C-Graph (Concurrent Graph), an edge-set-based graph traversal framework running on a distributed infrastructure that achieves both high concurrency and high efficiency for k-hop reachability queries. Experiments show that it outperforms several baseline methods.
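A k-hop reachability query asks whether `t` can be reached from `s` within at most `k` edges. A minimal sketch of the query semantics is a level-bounded BFS. It assumes an in-memory adjacency dict, and the function name is hypothetical. It does not model C-Graph's edge-set storage or its concurrent execution of many queries.

```python
def k_hop_reachable(adj, s, t, k):
    """True if t is reachable from s via a path of at most k edges."""
    if s == t:
        return True
    visited = {s}
    frontier = [s]
    for _ in range(k):                # expand at most k BFS levels
        nxt = []
        for u in frontier:
            for v in adj.get(u, ()):
                if v == t:
                    return True
                if v not in visited:
                    visited.add(v)
                    nxt.append(v)
        if not nxt:                   # nothing new to expand: t is unreachable
            return False
        frontier = nxt
    return False

adj = {0: [1], 1: [2], 2: [3], 3: []}
```

The hop bound caps how much of the graph a single query can touch. That cap is what makes running many such queries concurrently attractive.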


This paper develops the VRGQ framework, which accelerates the evaluation of a stream of queries via coarse-grained value reuse: the results of queries for a small set of source vertices are reused to speed up all future queries.
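Value reuse for shortest-path queries can be illustrated with a hub-seeded, label-correcting algorithm. This is a swapped-in landmark-style technique, not VRGQ's actual algorithm. The function names are hypothetical, and the sketch assumes an undirected graph with nonnegative weights, so that d(s, h) = d(h, s). Distances already computed from a hub `h` give valid initial upper bounds d(h, s) + d(h, v) for a new source `s`. Each bound is the length of a real walk s → h → v, so iterative relaxation still converges to exact distances.

```python
import heapq
from collections import deque

def dijkstra(adj, src):
    """Exact single-source distances; adj: {u: [(v, w), ...]}."""
    dist = {src: 0}
    pq = [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in adj.get(u, ()):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

def sssp_with_hub_reuse(adj, s, hub_dist):
    """Label-correcting SSSP from s, seeded with a stored hub query result."""
    inf = float("inf")
    dist = {}
    if s in hub_dist:
        for v, dhv in hub_dist.items():
            dist[v] = hub_dist[s] + dhv  # length of the walk s -> h -> v
    dist[s] = 0
    work = deque(dist)                   # every seeded label must be re-checked
    in_work = set(dist)
    while work:
        u = work.popleft()
        in_work.discard(u)
        for v, w in adj.get(u, ()):
            nd = dist[u] + w
            if nd < dist.get(v, inf):
                dist[v] = nd
                if v not in in_work:
                    work.append(v)
                    in_work.add(v)
    return dist

def undirected(edges):
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    return adj

adj = undirected([(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 5), (2, 3, 8), (3, 4, 3)])
hub_dist = dijkstra(adj, 0)  # result of an earlier query from hub vertex 0
```

The stored hub result is computed once and seeds every later query. The benefit depends on how tight the seeded bounds are.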


A data-locality-aware task scheduler for distributed social graph queries



Pregel: a system for large-scale graph processing

This paper presents a model for processing large graphs that is designed for efficient, scalable, and fault-tolerant implementation on clusters of thousands of commodity computers; its implied synchronicity makes reasoning about programs easier.
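The vertex-centric, synchronous model can be illustrated by simulating Pregel-style supersteps for single-source shortest paths in one process. The function name and the dict-based inbox/outbox are hypothetical, and the sketch assumes every vertex appears as a key in `adj`. In each superstep, only vertices that received messages run. A vertex keeps the smallest incoming distance, forwards improvements to its out-neighbors, and then votes to halt. Messages become visible only after the global barrier.

```python
import math

def pregel_sssp(adj, source):
    """adj: {u: [(v, w), ...]} containing every vertex as a key."""
    value = {u: math.inf for u in adj}
    inbox = {source: [0]}          # messages delivered in the current superstep
    supersteps = 0
    # Stop when all vertices have voted to halt and no messages are in flight
    while inbox:
        outbox = {}
        for u, msgs in inbox.items():   # a message reactivates a halted vertex
            best = min(msgs)
            if best < value[u]:
                value[u] = best
                for v, w in adj[u]:
                    outbox.setdefault(v, []).append(best + w)
            # the vertex votes to halt after processing its messages
        inbox = outbox                  # global barrier
        supersteps += 1
    return value, supersteps

adj = {0: [(1, 4), (2, 1)], 1: [(3, 1)], 2: [(1, 2), (3, 5)], 3: []}
dist, steps = pregel_sssp(adj, 0)
```

On this graph, the sketch converges to distances `{0: 0, 1: 3, 2: 1, 3: 4}` after 4 supersteps.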

Trinity: a distributed graph engine on a memory cloud

This paper introduces Trinity, a general-purpose graph engine over a distributed memory cloud. Trinity leverages graph access patterns in both online and offline computation to optimize memory and communication for best performance, and it supports both fast graph exploration and efficient parallel computing.

G-SPARQL: a hybrid engine for querying large attributed graphs

This paper describes an algebraic compilation mechanism for the proposed query language, G-SPARQL. The mechanism extends the relational algebra and is based on the triple pattern, the basic construct for building SPARQL queries.
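Triple-pattern evaluation can be illustrated with a minimal sketch of plain SPARQL basic-graph-pattern semantics. Each triple pattern is matched against a list of triples, and the resulting variable bindings are joined on shared variables with a nested-loop join. The function names and the `?`-prefix convention for variables are assumptions for illustration. The sketch does not reproduce G-SPARQL's algebra or its attribute and path extensions.

```python
def match_pattern(triples, pattern):
    """Yield variable bindings for one triple pattern ('?x' marks a variable)."""
    for triple in triples:
        binding, ok = {}, True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.get(term, value) != value:  # repeated variable must agree
                    ok = False
                    break
                binding[term] = value
            elif term != value:                         # constant must match exactly
                ok = False
                break
        if ok:
            yield binding

def evaluate_bgp(triples, patterns):
    """Join the bindings of all patterns on their shared variables."""
    results = [{}]
    for pattern in patterns:
        new_results = []
        for b1 in results:
            for b2 in match_pattern(triples, pattern):
                if all(b1.get(k, v) == v for k, v in b2.items()):  # compatible?
                    new_results.append({**b1, **b2})
        results = new_results
    return results

triples = [("alice", "knows", "bob"), ("bob", "knows", "carol"),
           ("alice", "age", "30"), ("bob", "age", "25")]
rows = evaluate_bgp(triples, [("?x", "knows", "?y"), ("?y", "age", "?a")])
```

A query compiler would replace this nested-loop join with relational operators chosen by an optimizer. The sketch fixes only the meaning of a set of triple patterns.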

An Experimental Comparison of Pregel-like Graph Processing Systems

This study experimentally compares Giraph, GPS, Mizan, and GraphLab on equal ground, considering graph- and algorithm-agnostic optimizations and using several metrics. It finds that the system optimizations present in Giraph and GraphLab allow them to perform well.

Systems for Big-Graphs

This tutorial discusses the design of emerging systems for processing big graphs, key features of distributed graph algorithms, and graph partitioning and workload balancing techniques. It also highlights current challenges and some future research directions.

GraphChi: Large-Scale Graph Computation on Just a PC

This work presents GraphChi, a disk-based system for computing efficiently on graphs with billions of edges. Building on Parallel Sliding Windows, it proposes a new data structure, Partitioned Adjacency Lists, which is used to design an online graph database, GraphChi-DB.

GraphX: Graph Processing in a Distributed Dataflow Framework

This paper introduces GraphX, an embedded graph processing framework built on top of Apache Spark, a widely used distributed dataflow system. It demonstrates that GraphX achieves an order-of-magnitude performance gain over the base dataflow framework and matches the performance of specialized graph processing systems, while enabling a wider range of computation.

Blogel: A Block-Centric Framework for Distributed Computation on Real-World Graphs

This paper proposes a block-centric framework, called Blogel, which naturally handles all three adverse characteristics of real-world graphs and achieves orders-of-magnitude performance improvements over state-of-the-art distributed graph computing systems.

Horton+: A Distributed System for Processing Declarative Reachability Queries over Partitioned Graphs

This work introduces three algebraic operators, select, traverse, and join; a query is compiled into an execution plan containing these operators. The evaluation shows the efficiency of the optimizer in reducing query execution time, the system's scalability with the size of the graph and with the number of servers, and the convenience of using declarative queries.

I/O cost minimization: reachability queries processing over massive graphs

This paper studies how to minimize the I/O cost of answering reachability queries on massive graphs that cannot reside entirely in memory. It proposes a new Yes-Label scheme, as a complement to the No-Label used in GRAIL, to reduce the number of intermediate results generated.
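The No-Label idea from GRAIL can be illustrated with a minimal in-memory sketch. It uses a single deterministic DFS traversal, whereas GRAIL itself combines several randomized traversals, and the function names are hypothetical. The sketch ignores I/O entirely and does not implement the paper's Yes-Label. On a DAG, each vertex `u` gets an interval `[low(u), rank(u)]`, where `rank` is the post-order number and `low` is the minimum rank among `u`'s descendants. If `u` reaches `v`, then `v`'s interval lies inside `u`'s. So a non-contained interval proves unreachability, and any remaining query falls back to a pruned DFS.

```python
def grail_labels(adj):
    """One post-order DFS over a DAG; returns {u: (low, rank)}."""
    labels, counter = {}, 0

    def visit(u):
        nonlocal counter
        low = None
        for c in adj.get(u, ()):
            if c not in labels:
                visit(c)
            low = labels[c][0] if low is None else min(low, labels[c][0])
        counter += 1                     # post-order rank of u
        labels[u] = (counter if low is None else min(low, counter), counter)

    for u in adj:
        if u not in labels:
            visit(u)
    return labels

def reachable(adj, labels, u, v):
    def contains(a, b):                  # is interval b inside interval a?
        return a[0] <= b[0] and b[1] <= a[1]
    if not contains(labels[u], labels[v]):
        return False                     # No-Label: certainly unreachable
    if u == v:
        return True
    stack, seen = [u], {u}
    while stack:                         # DFS, pruning children that cannot reach v
        x = stack.pop()
        for c in adj.get(x, ()):
            if c == v:
                return True
            if c not in seen and contains(labels[c], labels[v]):
                seen.add(c)
                stack.append(c)
    return False

dag = {0: [1, 2], 1: [3], 2: [3], 3: [], 4: [2]}
labels = grail_labels(dag)
```

A containment check answers many negative queries in constant time. Positive and false-positive cases still require a traversal, and that traversal is where intermediate results and I/O accumulate on disk-resident graphs.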