Generating Flexible Workloads for Graph Databases

@article{Bagan2016GeneratingFW,
  title={Generating Flexible Workloads for Graph Databases},
  author={Guillaume Bagan and Angela Bonifati and Radu Ciucanu and G. Fletcher and Aur{\'e}lien Lemay and Nicky Advokaat},
  journal={Proc. VLDB Endow.},
  year={2016},
  volume={9},
  pages={1457--1460}
}
Graph data management tools are nowadays evolving at a great pace. Key drivers of progress in the design and study of data intensive systems are solutions for synthetic generation of data and workloads, for use in empirical studies. Current graph generators, however, provide limited or no support for workload generation or are limited to fixed use-cases. Towards addressing these limitations, we demonstrate gMark, the first domain- and query language-independent framework for synthetic graph and… 

gMark: Schema-Driven Generation of Graphs and Queries
TLDR
The design and engineering principles of gMark, a domain- and query language-independent graph instance and query workload generator, are presented, and the framework’s capabilities in generating high-quality graphs and workloads and its ability to encode user-defined schemas across a variety of application domains are illustrated.
Graph Queries: Generation, Evaluation and Learning (Invited Talk)
TLDR
This talk will provide an overview of a comprehensive query-oriented graph benchmark that is designed and assessed, and present a learning framework for regular path queries and discuss its potential along with its practical feasibility.
Stability notions in synthetic graph generation: a preliminary study
TLDR
This work presents an initial study of stability in the context of schema-driven synthetic graph generation, develops a preliminary approach in the recently proposed open-source synthetic graph generator gMark, and demonstrates its viability in generating stable sequences of graphs.
Queries on Compressed Data
TLDR
Succinct is presented, a distributed data store that addresses low-latency, high-throughput systems for serving interactive queries using a fundamentally new approach --- executing a wide range of queries directly on a compressed representation of the input data --- thereby enabling efficient execution of queries on data sizes much larger than DRAM capacity.
ZipG: A Memory-efficient Graph Store for Interactive Queries
TLDR
On a single server with 244GB memory, ZipG executes tens of thousands of queries from these workloads for raw graph data over half a TB, which leads to an order of magnitude (sometimes as much as 23×) higher throughput than Neo4j and Titan.
Workload-Aware Subgraph Query Caching and Processing in Large Graphs
TLDR
This paper introduces a workload-aware subgraph querying framework, WaSQ, that leverages the query workload for subgraph query rewriting, search plan refinement, partial result reuse, and false positive filtering to facilitate the whole subgraph querying process.
VISUAL: Simulation of Visual Subgraph Query Formulation to Enable Automated Performance Benchmarking
TLDR
This work presents a novel synthetic visual subgraph query simulator called VISUAL, built on top of an HCI-inspired, extensible quantitative model of the visual query formulation process.
Optimization of Regular Path Queries in Graph Databases
TLDR
It is demonstrated that Waveguide properly subsumes existing techniques and that the new plans it adds are relevant, and the effective plan space which is enabled is analyzed.
A Schema-First Formalism for Labeled Property Graph Databases: Enabling Structured Data Loading and Analytics
TLDR
The proposed schema-driven formalism for graph databases provides several useful features, such as preventing both data corruption and long-term degradation of graph database structures.

References

gMark: Controlling Workload Diversity in Benchmarking Graph Databases
TLDR
The design and engineering principles of gMark, a domain- and query language-independent graph benchmark exhibiting flexible schema and workload chokepoints, are presented, along with its ability to encode user-defined schemas across a variety of application domains.
Query languages for graph databases
TLDR
A brief survey of many of the graph query languages that have been proposed, focussing on the core functionality provided in these languages and issues such as expressive power and the computational complexity of query evaluation.
SP2Bench: A SPARQL Performance Benchmark
TLDR
This paper presents SP^2Bench, a publicly available, language-specific SPARQL performance benchmark comprising both a data generator for creating arbitrarily large DBLP-like documents and a set of carefully designed benchmark queries.
Diversified Stress Testing of RDF Data Management Systems
TLDR
This work performs an in-depth experimental analysis showing that existing SPARQL benchmarks are not suitable for testing systems on diverse queries and varied workloads, provides stress testing tools for RDF data management systems, and uses the Waterloo SPARQL Diversity Test Suite (WatDiv) to address these shortcomings.
The LDBC Social Network Benchmark: Interactive Workload
TLDR
This paper describes the LDBC Social Network Benchmark (SNB), and presents database benchmarking innovation in terms of graph query functionality tested, correlated graph generation techniques, as well as a scalable benchmark driver on a workload with complex graph dependencies.
Workload Matters: Why RDF Databases Need a New Design
TLDR
This work proposes a vision for a workload-aware and adaptive RDF system, re-evaluates relevant existing physical design criteria for RDF, and addresses the resulting set of new challenges.
Principles Of Database And Knowledge-Base Systems
This book goes into the details of database conception and use; it covers relational databases from theory to the algorithms used in practice.
Principles of Database and Knowledge-Base Systems, Volume II
  • J. Ullman
  • Computer Science
    Principles of computer science series
  • 1988