Algebra of RDF Graphs for Querying Large-Scale Distributed Triple-Store

@inproceedings{Savnik2016AlgebraOR,
  title={Algebra of RDF Graphs for Querying Large-Scale Distributed Triple-Store},
  author={Iztok Savnik and Kiyoshi Nitta},
  booktitle={CD-ARES},
  year={2016}
}
Large-scale RDF graph databases stored in shared-nothing clusters require a query processing engine that can effectively exploit a highly parallel computation environment. We propose an algebra of RDF graphs and its physical counterpart, a physical algebra of RDF graphs, designed to implement queries as distributed dataflow programs that run on a cluster of servers. The operations of the algebra reflect the characteristic features of the RDF graph data model while they are tied to the technology provided by… 

Towards an Algebraic Cost Model for Graph Operators

TLDR
This work takes a first step towards defining a cost model for graph-based operators based on an algebra and its primitives, evaluates its accuracy over a state-of-the-art graph database, and discusses its advantages and shortcomings.

Method of Big-Graph Partitioning Using a Skeleton Graph

TLDR
A new method of graph partitioning for big graphs that include a conceptual schema, called a schema graph, is proposed, which maps the types of triple-patterns to k fragments such that k corresponds to the size of the portions of the triple-store addressed by the single fragments.

Statistics of Knowledge Graphs Based On The Conceptual Schema

TLDR
A schema graph is introduced that represents the main framework for the computation of the statistics of knowledge graphs and an algorithm that determines the sub-graph of the schema graph affected by the insertion of one triple into the triplestore is proposed.

References

A relational algebra for SPARQL

TLDR
A transformation from SPARQL queries into the relational algebra, an intermediate language for the expression and analysis of queries that is widely used in the database area, is described.

Foundations of SPARQL query optimization

TLDR
While studied in the context of a theoretically motivated set semantics, almost all results carry over to the official, bag-based semantics and therefore are of immediate practical relevance.

Massively Parallel Databases and MapReduce Systems

TLDR
This monograph covers the design principles and core features of systems for analyzing very large datasets using massively-parallel computation and storage techniques on large clusters of nodes.

Query evaluation techniques for large databases

TLDR
This survey describes a wide array of practical query evaluation techniques for both relational and postrelational database systems, including iterative execution of complex query evaluation plans, the duality of sort- and hash-based set-matching algorithms, types of parallel query execution and their implementation, and special operators for emerging database application domains.

Hadoop: The Definitive Guide

TLDR
This comprehensive resource demonstrates how to use Hadoop to build reliable, scalable, distributed systems: programmers will find details for analyzing large datasets, and administrators will learn how to set up and run Hadoop clusters.

The Expressive Power of SPARQL

TLDR
The main result is that SPARQL and non-recursive safe Datalog with negation have equivalent expressive power, and hence, by classical results, SPARQL is equivalent from an expressiveness point of view to Relational Algebra.

Storage management for objects in EXODUS

TLDR
In the 1970s, the relational data model was the focus of much of the research in the database area; a number of relational systems are now commercially available, and they support the majority of business applications relatively well.

Parallel Database Systems: The Future of High Performance Database Processing

TLDR
This paper reviews the techniques used by parallel database machine architectures, and surveys current commercial and research systems.

Parallel database systems: the future of high performance database systems

TLDR
Over the last decade Teradata, Tandem, and a host of startup companies have successfully developed and marketed highly parallel machines, refuting a 1983 paper that predicted the demise of database machines.

Knowledge vault: a web-scale approach to probabilistic knowledge fusion

TLDR
The Knowledge Vault is a Web-scale probabilistic knowledge base that combines extractions from Web content (obtained via analysis of text, tabular data, page structure, and human annotations) with prior knowledge derived from existing knowledge repositories, and computes calibrated probabilities of fact correctness.