• Corpus ID: 5581955

Scalable Semantic Web Data Management Using Vertical Partitioning

@inproceedings{Abadi2007ScalableSW,
  title={Scalable Semantic Web Data Management Using Vertical Partitioning},
  author={Daniel J. Abadi and Adam Marcus and Samuel Madden and Katherine J. Hollenbach},
  booktitle={VLDB},
  year={2007}
}
Efficient management of RDF data is an important factor in realizing the Semantic Web vision. Performance and scalability issues are becoming increasingly pressing as Semantic Web technology is applied to real-world applications. In this paper, we examine the reasons why current data management solutions for RDF data scale poorly, and explore the fundamental scalability limitations of these approaches. We review the state of the art for improving performance for RDF databases and consider a… 
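The abstract is truncated here, so the following is only a minimal sketch of the vertical partitioning idea named in the title, assuming the common scheme of one two-column (subject, object) table per predicate, kept sorted by subject. The function names and sample triples are hypothetical illustrations, not code or data from the paper.

```python
from collections import defaultdict


def vertically_partition(triples):
    """Split (subject, predicate, object) triples into one table per predicate.

    Each table is a list of (subject, object) pairs sorted by subject
    (an assumption of this sketch, chosen so subject-subject joins could be
    done as merge joins).
    """
    tables = defaultdict(list)
    for subject, predicate, obj in triples:
        tables[predicate].append((subject, obj))
    return {predicate: sorted(rows) for predicate, rows in tables.items()}


def subjects_with(tables, predicate, obj):
    """Return the subjects having the given object for the given predicate.

    Only the single table for `predicate` is scanned, not every triple.
    """
    return {s for s, o in tables.get(predicate, []) if o == obj}


# Hypothetical sample data for illustration only.
triples = [
    ("book1", "type", "Book"),
    ("book1", "title", "XYZ"),
    ("book2", "type", "Book"),
    ("cd1", "type", "CD"),
    ("book2", "author", "Fox"),
]
tables = vertically_partition(triples)
books = subjects_with(tables, "type", "Book")
```

A query that touches only a few predicates reads only those predicates' tables, which is the intuition behind the scheme.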

SPARTI: Scalable RDF Data Management Using Query-Centric Semantic Partitioning
TLDR
This paper investigates SPARTI, a scalable RDF data management system that combines a budgeting mechanism with a cost model to decide whether partitioning is worthwhile. It is shown to execute queries in around half the time across all query shapes while maintaining around an order of magnitude improvement in storage requirements.
Data Partitioning Scheme for Efficient Distributed RDF Querying Using Apache Spark
TLDR
This work proposes a relational partitioning scheme for RDF data called Subset Property Table (SPT), which further partitions the existing Property Table approach into subsets of tables to minimize query input and join operations, and combines it with another existing model, Vertical Partitioning (VP).
RDF-4X: a scalable solution for RDF quads store in the cloud
TLDR
This paper proposes a scalable solution for RDF data management that uses Apache Accumulo, introducing storage methods and indexing techniques that scale to billions of quads across multiple nodes, while providing fast and easy access to the data through conventional query mechanisms such as SPARQL.
RDF Data Storage Techniques for Efficient SPARQL Query Processing Using Distributed Computation Engines
  • Mahmudul Hassan, S. Bansal
  • Computer Science
    2018 IEEE International Conference on Information Reuse and Integration (IRI)
  • 2018
TLDR
This paper introduces distributed RDF data stores, namely VPExp and 3CStore, based on the existing vertical partitioning (VP) approach, and presents an evaluation of query performance of these systems built upon two popular distributed computation engines namely, Spark and Drill.
An Experimental Comparison of RDF Data Management Approaches in a SPARQL Benchmark Scenario
TLDR
An experimental comparison of existing storage strategies on top of the SP2Bench SPARQL performance benchmark suite is presented, and it is concluded that future research is necessary to further advance RDF data management.
Column-store support for RDF data management: not all swans are white
TLDR
This paper reports on the results of an independent evaluation of the techniques presented in the VLDB 2007 paper "Scalable Semantic Web Data Management Using Vertical Partitioning", as well as a complementary analysis of state-of-the-art RDF storage solutions.
Compressed vertical partitioning for efficient RDF management
TLDR
This article introduces a novel RDF indexing technique that supports efficient SPARQL resolution in compressed space, and enhances this model with two compact indexes listing the predicates related to each distinct subject and object in the dataset, in order to address the specific weaknesses of vertically partitioned representations.
Scalable and Efficient Self-Join Processing technique in RDF data
TLDR
An alternative solution is proposed to make such queries more flexible and efficient by reducing self-joins as much as possible; it is based on the idea of "Recursive Mapping of Twin Tables".
String-Based Semantic Web Data Management Using Ternary B-Trees
TLDR
This work proposes the ternary B-tree as a new string-based data structure for storing and accessing RDF that makes use of the intrinsic features of RDF.
Compressed Vertical Partitioning for Efficient RDF Management
TLDR
A novel RDF indexing technique that supports efficient SPARQL resolution in compressed space; it achieves by far the most compressed representations and also the best overall performance for RDF retrieval in the authors' experimental setup.

References

SHOWING 1-10 OF 42 REFERENCES
CSE 718 Seminar Report: Scalable Semantic Web Data Management Using Vertical Partitioning
TLDR
This paper first explores the scalability limitations of current data management solutions for RDF data, then proposes a solution to these limitations, property tables, and finally proposes an alternative solution: vertically partitioning the RDF data.
An Efficient SQL-based RDF Querying Scheme
TLDR
An experimental study is presented that characterizes the overhead eliminated by avoiding procedural code at runtime, characterizes performance under various input conditions, and demonstrates scalability using 80 million RDF triples from UniProt protein and annotation data.
Efficient RDF Storage and Retrieval in Jena2
TLDR
This paper describes the persistence subsystem of Jena2, which is intended to support large datasets, and identifies query optimization for RDF as a promising area for future research.
Storing RDF as a graph
  • Valerie Bönström, A. Hinze, H. Schweppe
  • Computer Science
    Proceedings of the First Latin American Web Congress (LA-WEB 2003)
  • 2003
TLDR
This work presents a new approach that stores RDF data as a graph in an object-oriented database, which avoids the costly rebuilding of the graph and efficiently queries the storage structure directly.
The ICS-FORTH RDFSuite: Managing Voluminous RDF Description Bases
TLDR
This paper advocates the use of database technology to support declarative access, as well as logical and physical independence, for voluminous RDF description bases, and presents RDFSuite, a suite of tools for RDF validation, storage and querying.
Relational Databases for Querying XML Documents: Limitations and Opportunities
TLDR
It turns out that the relational approach can handle most (but not all) of the semantics of semi-structured queries over XML data, but is likely to be effective only in some cases.
Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema
TLDR
This work presents an overview of Sesame, an architecture for efficient storage and expressive querying of large quantities of metadata in RDF and RDF Schema, and its implementation and the first experiences with this implementation.
Jena Property Table Implementation
TLDR
This paper describes a property table design and implementation for Jena, an RDF Semantic Web toolkit; one design goal is to make Jena property tables look like normal relational database tables.
Benchmarking Database Representations of RDF/S Stores
TLDR
The main conclusion drawn from the experiments is that the evaluation of taxonomic queries is most efficient over RDF/S stores utilizing the Hybrid and MatView representations.
Extending RDBMSs To Support Sparse Datasets Using An Interpreted Attribute Storage Format
TLDR
This paper argues that the proper way to handle sparse data is not to use a vertical schema, but rather to extend the RDBMS tuple storage format to allow the representation of sparse attributes as interpreted fields, and shows that the interpreted storage approach dominates in query efficiency and ease-of-use over the current horizontal storage and vertical schema approaches over a wide range of queries.