An Experimental Comparison of RDF Data Management Approaches in a SPARQL Benchmark Scenario

@inproceedings{Schmidt2008AnEC,
  title={An Experimental Comparison of RDF Data Management Approaches in a SPARQL Benchmark Scenario},
  author={Michael Schmidt and Thomas Hornung and Norbert K{\"u}chlin and Georg Lausen and Christoph Pinkel},
  booktitle={SEMWEB},
  year={2008}
}
Efficient RDF data management is one of the cornerstones in realizing the Semantic Web vision. In the past, different RDF storage strategies have been proposed, ranging from simple triple stores to more advanced techniques like clustering or vertical partitioning on the predicates. We present an experimental comparison of existing storage strategies on top of the SP2Bench SPARQL performance benchmark suite and put the results into context by comparing them to a purely relational model of the… 
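To make two of the storage strategies named in the abstract concrete, the following is a minimal sketch that contrasts a single triple table with vertical partitioning on the predicates, using Python's standard `sqlite3` module. The sample triples, the `vp_` table prefix, and all function names are assumptions made for illustration; they are not the schemas, data, or queries used in the paper or in SP2Bench.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample triples (subject, predicate, object); not SP2Bench data.
TRIPLES = [
    ("article1", "title", "RDF Storage"),
    ("article1", "creator", "alice"),
    ("article2", "title", "SPARQL Basics"),
    ("article2", "creator", "bob"),
    ("article2", "creator", "alice"),
]


def build_triple_store(conn):
    """Triple store layout: one three-column table holding every triple."""
    conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", TRIPLES)


def build_vertical_partitioning(conn):
    """Vertically partitioned layout: one two-column (s, o) table per predicate."""
    by_predicate = defaultdict(list)
    for s, p, o in TRIPLES:
        by_predicate[p].append((s, o))
    for p, rows in by_predicate.items():
        # Predicate names come from the trusted sample data above, so they
        # are safe to interpolate into the table name in this sketch.
        conn.execute(f'CREATE TABLE "vp_{p}" (s TEXT, o TEXT)')
        conn.executemany(f'INSERT INTO "vp_{p}" VALUES (?, ?)', rows)


def titles_by_creator_triple_store(conn, creator):
    """Answer the query with a self-join on the single triples table."""
    sql = """SELECT t.o FROM triples AS t JOIN triples AS c ON t.s = c.s
             WHERE t.p = 'title' AND c.p = 'creator' AND c.o = ?
             ORDER BY t.o"""
    return [row[0] for row in conn.execute(sql, (creator,))]


def titles_by_creator_vp(conn, creator):
    """Answer the same query by joining two per-predicate tables."""
    sql = """SELECT t.o FROM vp_title AS t JOIN vp_creator AS c ON t.s = c.s
             WHERE c.o = ? ORDER BY t.o"""
    return [row[0] for row in conn.execute(sql, (creator,))]


conn = sqlite3.connect(":memory:")
build_triple_store(conn)
build_vertical_partitioning(conn)
```

Both query functions return the same titles for a given creator. The difference is that the triple table needs a self-join filtered by predicate, whereas the partitioned layout joins two smaller tables that contain only the relevant predicate.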
SW-Store: a vertically partitioned DBMS for Semantic Web data management
TLDR
The results show that a vertically partitioned schema achieves similar performance to the property table technique while being much simpler to design. The paper also describes the architecture of SW-Store, a new DBMS that implements these techniques to achieve high-performance RDF data management.
Benchmarking Spark-SQL under Alliterative RDF Relational Storage Backends
TLDR
A systematic comparison of three relevant RDF relational schemas queried using Apache Spark yields insights into the impact of the relational encoding scheme, storage backends, and storage formats on query execution performance.
An In-depth Investigation of Large-scale RDF Relational Schema Optimizations Using Spark-SQL
TLDR
Relational schema optimization, one of the most significant challenges of large-scale RDF data processing over Apache Spark, is discussed, and insights into these schemas’ relative strengths are provided by comparing three different partitioning techniques and four other storage formats.
Compressed vertical partitioning for efficient RDF management
TLDR
This article introduces a novel RDF indexing technique that supports efficient SPARQL resolution in compressed space, and enhances this model with two compact indexes that list the predicates related to each distinct subject and object in the dataset, in order to address the specific weaknesses of vertically partitioned representations.
Towards making sense of Spark-SQL performance for processing vast distributed RDF datasets
TLDR
A systematic evaluation is presented of the performance of the SparkSQL engine for processing SPARQL queries using three relevant RDF relational schemas and two different storage backends, namely Hive and HDFS.
FlexTable: Using a Dynamic Relation Model to Store RDF Data
TLDR
This paper proposes a system called FlexTable, in which all triples of an instance are coalesced into one tuple and all tuples are stored in relation schemas; a lattice structure is used to evolve the schemas automatically as new triples are inserted.
NoSQL Databases for RDF: An Empirical Evaluation
TLDR
This work is the first systematic attempt at characterizing and comparing NoSQL stores for RDF processing and compares their key characteristics when running standard RDF benchmarks on a popular cloud infrastructure using both single-machine and distributed deployments.
Scalable and Efficient Self-Join Processing technique in RDF data
TLDR
An alternative solution is proposed to improve the flexibility and efficiency of such queries and to reduce self-joins as much as possible; it is based on the idea of "Recursive Mapping of Twin Tables".
Compressed Vertical Partitioning for Efficient RDF Management
TLDR
A novel RDF indexing technique that supports efficient SPARQL resolution in compressed space; it not only achieves by far the most compressed representations but also the best overall performance for RDF retrieval in the authors' experimental setup.

References

Scalable Semantic Web Data Management Using Vertical Partitioning
TLDR
The results show that a vertically partitioned schema achieves similar performance to the property table technique while being much simpler to design, and if a column-oriented DBMS is used instead of a row-oriented database, another order of magnitude performance improvement is observed, with query times dropping from minutes to several seconds.
Column-store support for RDF data management: not all swans are white
TLDR
This paper reports on the results of an independent evaluation of the techniques presented in the VLDB 2007 paper "Scalable Semantic Web Data Management Using Vertical Partitioning", as well as a complementary analysis of state-of-the-art RDF storage solutions.
SP2Bench: A SPARQL Performance Benchmark
TLDR
SP^2Bench, a publicly available, language-specific SPARQL performance benchmark, which comprises both a data generator for creating arbitrarily large DBLP-like documents and a set of carefully designed benchmark queries.
Storing RDF as a graph
  • Valerie Bönström, A. Hinze, H. Schweppe
  • Computer Science
    Proceedings of the IEEE/LEOS 3rd International Conference on Numerical Simulation of Semiconductor Optoelectronic Devices (IEEE Cat. No.03EX726)
  • 2003
TLDR
This work presents a new approach to store RDF data as a graph in an object-oriented database, which avoids the costly rebuilding of the graph and efficiently queries the storage structure directly.
An Efficient SQL-based RDF Querying Scheme
TLDR
An experimental study is presented that characterizes the overhead eliminated by avoiding procedural code at runtime, characterizes performance under various input conditions, and demonstrates scalability using 80 million RDF triples from UniProt protein and annotation data.
The Berlin SPARQL Benchmark
TLDR
The Berlin SPARQL Benchmark (BSBM) is introduced, built around an e-commerce use case in which a set of products is offered by different vendors and consumers have posted reviews about products, and emulates the search and navigation pattern of a consumer looking for a product.
Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema
TLDR
This work presents an overview of Sesame, an architecture for efficient storage and expressive querying of large quantities of metadata in RDF and RDF Schema, and its implementation and the first experiences with this implementation.
Benchmarking Database Representations of RDF/S Stores
TLDR
The main conclusion drawn from the experiments is that the evaluation of taxonomic queries is most efficient over RDF/S stores utilizing the Hybrid and MatView representations.
Jena Property Table Implementation
TLDR
This paper describes a property table design and implementation for Jena, an RDF Semantic Web toolkit; a design goal is to make Jena property tables look like normal relational database tables.