The volume of RDF data has continued to grow over the past decade, and many well-known RDF datasets contain billions of triples. A grand challenge in managing such huge RDF datasets is accessing them efficiently. A popular approach to this problem is to build indexes for the full set of permutations of (S, P, O). Although this approach has been shown to …
The flexibility of the RDF data model has attracted an increasing number of organizations to store their data in RDF format. With the rapid growth of RDF datasets, we envision that deploying a cluster of computing nodes will become inevitable for processing large-scale RDF data with desirable query performance. In this paper, we address the …
The emerging need for complex analysis over big RDF datasets calls for scale-out solutions that harness a computing cluster. Queries over RDF data often involve complex self-joins, which can be very expensive to run if the data are not carefully partitioned across the cluster, and hence distributed joins over …
To exploit RDF data in decision-making, there is an increasing demand for online analytical processing of such data. Because the RDF data model is a highly flexible graph model, it poses challenges for analytical tasks. Although many systems have been developed to store RDF data, few of them can perform analytical tasks, for several reasons. …
RDF graphs are growing rapidly in scale, so managing huge RDF graphs in a distributed fashion is becoming increasingly important, and partitioning the RDF graph is a vital preprocessing step toward this goal. When graph partitioning algorithms developed over the past decades are applied to RDF graphs represented using well-known models such as Directed Labeled Graphs, Bipartite …
Existing parallel SPARQL query optimizers assume hash-based data partitioning and adopt plan enumeration algorithms with unnecessarily high complexity. Therefore, they cannot easily accommodate other partitioning methods and consider only an unnecessarily limited plan space. To address these problems, we first define a generic RDF data partitioning model to …
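The first abstract above refers to building indexes over the full set of (S, P, O) permutations. The sketch below illustrates only that general idea and does not reproduce the method of any paper listed here. It assumes an in-memory list of string triples, and `build_indexes` and `match` are hypothetical names. Each of the six orderings is kept as a sorted list, so any triple pattern whose bound positions form a prefix of some ordering becomes a contiguous range scan.

```python
from bisect import bisect_left
from itertools import permutations

# The six orderings of (s, p, o): "spo", "sop", "pso", "pos", "osp", "ops".
ORDERS = ["".join(p) for p in permutations("spo")]


def build_indexes(triples):
    """Hypothetical helper: build one sorted index per permutation of (S, P, O)."""
    indexes = {}
    for order in ORDERS:
        # Reorder each triple's components to follow this permutation.
        keyed = {tuple(t["spo".index(c)] for c in order) for t in triples}
        indexes[order] = sorted(keyed)
    return indexes


def match(indexes, s=None, p=None, o=None):
    """Hypothetical helper: return all (s, p, o) triples matching the bound positions."""
    bound = {"s": s, "p": p, "o": o}
    bound_keys = {k for k, v in bound.items() if v is not None}
    n = len(bound_keys)
    # Pick an ordering whose first n positions are exactly the bound ones,
    # so the matching triples form one contiguous run in that sorted index.
    order = next(cand for cand in ORDERS if set(cand[:n]) == bound_keys)
    prefix = tuple(bound[c] for c in order[:n])
    index = indexes[order]
    results = []
    i = bisect_left(index, prefix)  # first entry >= the bound prefix
    while i < len(index) and index[i][:n] == prefix:
        row = index[i]
        # Map the permuted row back to (s, p, o) order.
        results.append(tuple(row[order.index(c)] for c in "spo"))
        i += 1
    return results
```

For example, `match(build_indexes(triples), p="knows")` scans the `pso` index, and a pattern with subject and object bound scans the `sop` index. The cost of this design is storing every triple six times, in exchange for answering any single-pattern lookup with one binary search plus a range scan.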