Compact Representation of Large RDF Data Sets for Publishing and Exchange
@inproceedings{Fernndez2010CompactRO,
  title     = {Compact Representation of Large RDF Data Sets for Publishing and Exchange},
  author    = {Javier D. Fern{\'a}ndez and Miguel A. Mart{\'i}nez-Prieto and Claudio Guti{\'e}rrez},
  booktitle = {SEMWEB},
  year      = {2010}
}
Increasingly large RDF data sets are being published on the Web. Currently, they use different RDF syntaxes, contain high levels of redundancy and have a plain, indivisible structure. All this leads to fuzzy publications, inefficient management, complex processing and lack of scalability. This paper presents a novel RDF representation (HDT) which takes advantage of the structural properties of RDF graphs to split RDF data into three components and represent each of them efficiently: Header…
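To make the general idea of replacing repeated RDF terms by integer IDs and grouping the resulting ID triples concrete, here is a minimal Python sketch. It is not HDT's binary format: it assumes a single shared ID space for subjects, predicates and objects, and the helper names `dictionary_encode` and `group_by_subject` are hypothetical. Each term string is stored once, and the ID triples are sorted and nested as subject → predicate → objects, so a subject or predicate shared by several triples is not repeated within its group.

```python
from collections import defaultdict


def dictionary_encode(triples):
    """Map every distinct RDF term to an integer ID (hypothetical helper).

    Simplification: subjects, predicates and objects share one ID space.
    """
    term_to_id = {}

    def get_id(term):
        if term not in term_to_id:
            term_to_id[term] = len(term_to_id) + 1  # IDs start at 1
        return term_to_id[term]

    encoded = [(get_id(s), get_id(p), get_id(o)) for s, p, o in triples]
    return term_to_id, encoded


def group_by_subject(encoded):
    """Sort unique ID triples and nest them as subject -> predicate -> [objects]."""
    tree = defaultdict(lambda: defaultdict(list))
    for s, p, o in sorted(set(encoded)):
        tree[s][p].append(o)
    return tree


triples = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:bob", "foaf:knows", "ex:alice"),
]
term_to_id, encoded = dictionary_encode(triples)
tree = group_by_subject(encoded)
print(encoded)  # [(1, 2, 3), (1, 4, 5), (3, 2, 1)]
```

Real systems typically use more elaborate dictionaries and bit-level encodings of the nested lists; the sketch only shows why the split removes redundancy.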
64 Citations
Binary RDF for scalable publishing, exchanging and consumption in the web of data
- Computer Science · WWW
- 2012
This article discusses an ongoing doctoral thesis addressing efficient formats for publication, exchange and consumption of RDF on a large scale, and proposes a binary serialization format for RDF, called HDT.
Exchange and Consumption of Huge RDF Data
- Computer Science · ESWC
- 2012
This paper shows how to enhance the exchanged HDT with additional structures to support some basic forms of SPARQL query resolution without having to "unpack" the data.
Lightweighting the Web of Data through Compact RDF/HDT
- Computer Science · CAEPIA
- 2011
This paper revisits the HDT format and exploits the latest findings in triple indexing for querying, exchanging and visualizing RDF information at large scale.
A RDF data compress model based on octree structure
- Computer Science · 2017 12th IEEE Conference on Industrial Electronics and Applications (ICIEA)
- 2017
A compact indexed RDF structure called oct-triples is proposed. Based on the octree structure, it stores the triple indexes, obtained by mapping RDF triples through the dictionary, as a three-dimensional matrix, and it shows a clear advantage in compressibility over traditional compression solutions.
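The general idea of compressing a sparse three-dimensional (subject, predicate, object) ID matrix with an octree can be sketched as follows. This is a generic octree encoding written for illustration, not the oct-triples layout from the paper. It assumes triples are already mapped to 0-based integer IDs and that the cube side is a power of two of at least 2, and the function names are hypothetical. Each expanded node emits one 8-bit mask of its non-empty octants, so large empty regions of the matrix cost nothing.

```python
def _child_origin(origin, half, idx):
    """Origin of octant `idx` (bit 2 selects x, bit 1 selects y, bit 0 selects z)."""
    x0, y0, z0 = origin
    return (x0 + half * ((idx >> 2) & 1),
            y0 + half * ((idx >> 1) & 1),
            z0 + half * (idx & 1))


def encode_octree(points, size):
    """Breadth-first octree encoding of (s, p, o) points in [0, size)^3.

    `size` must be a power of two >= 2. Returns one 8-bit mask per expanded
    node; bit i is set when octant i contains at least one point.
    """
    masks = []
    level = [((0, 0, 0), size, list(points))]
    while level and level[0][1] > 1:
        next_level = []
        for origin, side, pts in level:
            half = side // 2
            x0, y0, z0 = origin
            children = [[] for _ in range(8)]
            for x, y, z in pts:
                idx = ((x >= x0 + half) << 2) | ((y >= y0 + half) << 1) | (z >= z0 + half)
                children[idx].append((x, y, z))
            mask = 0
            for idx, child in enumerate(children):
                if child:  # only non-empty octants are expanded further
                    mask |= 1 << idx
                    next_level.append((_child_origin(origin, half, idx), half, child))
            masks.append(mask)
        level = next_level
    return masks


def decode_octree(masks, size):
    """Rebuild the sorted list of points from the masks produced above."""
    level, pos = [((0, 0, 0), size)], 0
    while level and level[0][1] > 1:
        next_level = []
        for origin, side in level:
            half = side // 2
            mask = masks[pos]
            pos += 1
            for idx in range(8):
                if (mask >> idx) & 1:
                    next_level.append((_child_origin(origin, half, idx), half))
        level = next_level
    return sorted(origin for origin, _ in level)


masks = encode_octree([(0, 0, 0), (3, 3, 3)], size=4)
print(masks)  # [129, 1, 128]
```

Two points at opposite corners of a 4×4×4 cube need only three bytes of masks here, which is the kind of saving on sparse matrices the octree approach relies on.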
Compressed Indexes for Fast Search of Semantic Data
- Computer Science · IEEE Transactions on Knowledge and Data Engineering
- 2021
The extensive experimental analysis reveals that the best space/time trade-off configuration substantially outperforms state-of-the-art solutions, taking 30–60% less space and speeding up query execution by a factor of 2–81×.
OFR: An Efficient Representation of RDF Datasets
- Computer Science · SLATE
- 2015
This work proposes an RDF compression algorithm that produces a succinct representation of RDF datasets and achieves compression ratios significantly better than those of the RDF compressors known from the literature.
CPOI: A Compact Method to Archive Versioned RDF Triple-Sets
- Computer Science · ArXiv
- 2019
This paper proposes a novel storage index for archiving versions of RDF datasets, called CPOI (compact partial order index), which exploits the fact that an RDF knowledge base is a graph and therefore, unlike text, has no unique serialization.
Horn-rule based compression technique for RDF data
- Computer Science · SAC
- 2015
This work utilizes the various semantic associations that can be learned from RDF graphs to compress them, and shows that greater compression can be achieved compared to the existing technique.
Graph Pattern Based RDF Data Compression
- Computer Science · JIST
- 2014
Evaluation on real-world datasets shows that the proposed graph-pattern-based techniques can substantially reduce the size of RDF documents by complementing existing approaches, and the evaluation results on rule mining operations show the potential of the proposed serialisation format to support efficient data access.
References
Scalable join processing on very large RDF graphs
- Computer Science · SIGMOD Conference
- 2009
Very lightweight methods for sideways information passing between separate joins at query run-time are developed to provide highly effective filters on the input streams of joins in very large RDF graphs.
RDF-3X: a RISC-style engine for RDF
- Computer Science · Proc. VLDB Endow.
- 2008
The salient points of RDF-3X are a generic solution for storing and indexing RDF triples that completely eliminates the need for physical-design tuning, a powerful yet simple query processor that leverages fast merge joins to the largest possible extent, and a query optimizer for choosing optimal join orders using a cost model based on statistical synopses for entire join paths.
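RDF-3X avoids physical-design tuning by indexing every triple in all six orderings of subject, predicate and object, so any triple pattern with constants in some positions becomes one range scan over a sorted index. The sketch below illustrates only that idea: it keeps plain sorted Python lists where RDF-3X uses compressed clustered B+-trees, assumes triples are already dictionary-encoded as integer IDs, and its function names are hypothetical.

```python
from bisect import bisect_left
from itertools import permutations

# All six orderings of subject/predicate/object: spo, sop, pso, pos, osp, ops.
ORDERS = ["".join(order) for order in permutations("spo")]


def build_indexes(triples):
    """Store every ID triple once per ordering, each list sorted."""
    indexes = {}
    for order in ORDERS:
        positions = ["spo".index(c) for c in order]
        indexes[order] = sorted(tuple(t[i] for i in positions) for t in triples)
    return indexes


def match(indexes, s=None, p=None, o=None):
    """Answer a triple pattern (None marks a variable) with one range scan."""
    bound = {"s": s, "p": p, "o": o}
    bound_keys = {c for c in "spo" if bound[c] is not None}
    k = len(bound_keys)
    # Pick an ordering whose first k positions are exactly the bound ones.
    order = next(cand for cand in ORDERS if set(cand[:k]) == bound_keys)
    prefix = tuple(bound[c] for c in order[:k])
    rows = indexes[order]
    results = []
    for i in range(bisect_left(rows, prefix), len(rows)):
        row = rows[i]
        if row[:k] != prefix:
            break  # left the contiguous range that shares the prefix
        named = dict(zip(order, row))
        results.append((named["s"], named["p"], named["o"]))
    return results


idx = build_indexes([(1, 2, 3), (1, 4, 5), (3, 2, 1)])
print(match(idx, p=2))  # [(1, 2, 3), (3, 2, 1)]
```

Because each index is sorted, scan results also come out sorted on the remaining positions, which is what makes merge joins between pattern results cheap.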
RDF compression: basic approaches
- Computer Science · WWW '10
- 2010
It is shown that big RDF data sets are highly compressible due to the structure of RDF graphs (power law), organization of URIs and RDF syntax verbosity.
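One general way to exploit the regular organization of URIs is front coding of the sorted term list: each term stores only the length of the prefix it shares with its predecessor, plus the remaining suffix. This minimal sketch illustrates that general technique and is not presented as the encoding evaluated in the paper; the function names are hypothetical.

```python
def front_code(sorted_terms):
    """Encode sorted strings as (shared prefix length with predecessor, suffix)."""
    encoded, prev = [], ""
    for term in sorted_terms:
        shared, limit = 0, min(len(prev), len(term))
        while shared < limit and prev[shared] == term[shared]:
            shared += 1
        encoded.append((shared, term[shared:]))
        prev = term
    return encoded


def front_decode(encoded):
    """Inverse of front_code: rebuild each term from its predecessor."""
    terms, prev = [], ""
    for shared, suffix in encoded:
        term = prev[:shared] + suffix
        terms.append(term)
        prev = term
    return terms


terms = sorted(["http://ex.org/bobby", "http://ex.org/alice", "http://ex.org/bob"])
encoded = front_code(terms)
print(encoded)  # [(0, 'http://ex.org/alice'), (14, 'bob'), (17, 'by')]
```

Sorting first is what makes the shared prefixes long, since URIs from the same namespace end up adjacent.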
Matrix "Bit" loaded: a scalable lightweight join query processor for RDF data
- Computer Science · WWW '10
- 2010
BitMat, a compressed bit-matrix structure for storing huge RDF graphs, is introduced together with a novel, lightweight SPARQL join query processing method that applies an initial pruning technique followed by a variable-binding-matching algorithm on BitMats to produce the final results.
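The bit-matrix idea can be illustrated with a small sketch under a simplified layout, which is an assumption rather than BitMat's actual structure: for one predicate, each subject gets a row of bits indexed by object ID, rows are run-length encoded, and a join on the subject reduces to a bitwise AND. Python integers serve as bitsets, IDs are assumed to be 0-based integers, and all function names are hypothetical.

```python
def so_rows(triples, predicate):
    """Bit rows for one predicate: bit o of rows[s] is set iff (s, predicate, o) holds."""
    rows = {}
    for s, p, o in triples:
        if p == predicate:
            rows[s] = rows.get(s, 0) | (1 << o)
    return rows


def run_length_encode(bits, width):
    """Encode the first `width` bits (width >= 1) as (first bit, run lengths)."""
    values = [(bits >> i) & 1 for i in range(width)]
    runs, current, count = [], values[0], 0
    for v in values:
        if v == current:
            count += 1
        else:
            runs.append(count)
            current, count = v, 1
    runs.append(count)
    return values[0], runs


def run_length_decode(first, runs):
    """Inverse of run_length_encode: runs alternate between 0s and 1s."""
    bits, pos, value = 0, 0, first
    for length in runs:
        if value:
            bits |= ((1 << length) - 1) << pos
        pos += length
        value ^= 1
    return bits


def subject_mask(rows):
    """Bitset with bit s set for every subject that has a non-empty row."""
    mask = 0
    for s, row in rows.items():
        if row:
            mask |= 1 << s
    return mask


# (subject, predicate, object) as 0-based integer IDs.
triples = [(0, 0, 1), (0, 0, 3), (1, 0, 3), (0, 1, 2), (2, 1, 0)]
rows_p0, rows_p1 = so_rows(triples, 0), so_rows(triples, 1)
print(run_length_encode(rows_p0[0], 4))  # (0, [1, 1, 1, 1])
# Subjects that have both predicate 0 and predicate 1: a single AND.
both = subject_mask(rows_p0) & subject_mask(rows_p1)
print([s for s in range(3) if (both >> s) & 1])  # [0]
```

Pruning candidates with bitwise operations before producing any variable bindings is the general flavor of the pruning step described above; the real system's details may differ.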
An Efficient SQL-based RDF Querying Scheme
- Computer Science · VLDB
- 2005
An experimental study is presented that characterizes the overhead eliminated by avoiding procedural code at runtime, characterizes performance under various input conditions, and demonstrates scalability using 80 million RDF triples from UniProt protein and annotation data.
Semantics and Complexity of SPARQL
- Computer Science · SEMWEB
- 2006
This paper systematically addresses the formal study of SPARQL, concentrating on its graph pattern facility: it provides a compositional semantics, proves the existence of normal forms, and shows, among other results, that the evaluation of SPARQL patterns is PSPACE-complete.
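In the compositional semantics of SPARQL graph patterns, the conjunction of two patterns evaluates to the join of their solution mappings: the union of every pair of mappings that agree on their shared variables. The sketch below illustrates only that join step for basic triple patterns. It uses Python lists rather than sets, marks variables with a leading `?`, and its function names are hypothetical.

```python
def compatible(m1, m2):
    """Two mappings are compatible if they agree on every shared variable."""
    return all(m2[var] == val for var, val in m1.items() if var in m2)


def join(omega1, omega2):
    """Join of two collections of mappings: unions of all compatible pairs."""
    return [{**m1, **m2} for m1 in omega1 for m2 in omega2 if compatible(m1, m2)]


def eval_triple_pattern(graph, pattern):
    """All mappings that send `pattern` (variables start with '?') onto a triple of `graph`."""
    results = []
    for triple in graph:
        mapping = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if mapping.setdefault(term, value) != value:
                    break  # same variable would need two different values
            elif term != value:
                break  # constant term does not match
        else:
            results.append(mapping)
    return results


graph = [("a", "knows", "b"), ("b", "knows", "c"), ("a", "name", "A")]
p1 = eval_triple_pattern(graph, ("?x", "knows", "?y"))
p2 = eval_triple_pattern(graph, ("?y", "knows", "?z"))
print(join(p1, p2))  # [{'?x': 'a', '?y': 'b', '?z': 'c'}]
```

Because the result of each pattern is defined independently and then combined, the semantics is compositional, which is what makes normal forms and complexity results tractable to study.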
Sindice.com: a document-oriented lookup index for open linked data
- Computer Science · Int. J. Metadata Semant. Ontologies
- 2008
Sindice, a lookup index over Semantic Web resources, allows applications to automatically locate documents containing information about a given resource, and extends the sitemap protocol to efficiently index large datasets with minimal impact on data providers.
RDF Vocabulary Description Language 1.0: RDF Schema
- Computer Science
- 2004
This specification describes how to use RDF to describe RDF vocabularies and defines other built-in RDF vocabulary initially specified in the RDF Model and Syntax Specification.
Characterizing the Semantic Web on the Web
- Computer Science · SEMWEB
- 2006
A collection of Semantic Web documents, drawn from an estimated ten million available on the Web, is harvested and analyzed, and a number of metrics, properties and usage patterns found to follow a power-law distribution are described.