Compact Representation of Large RDF Data Sets for Publishing and Exchange

@inproceedings{Fernndez2010CompactRO,
  title={Compact Representation of Large RDF Data Sets for Publishing and Exchange},
  author={Javier D. Fern{\'a}ndez and Miguel A. Mart{\'i}nez-Prieto and Claudio Guti{\'e}rrez},
  booktitle={SEMWEB},
  year={2010}
}
Increasingly huge RDF data sets are being published on the Web. Currently, they use different syntaxes of RDF, contain high levels of redundancy and have a plain indivisible structure. All this leads to fuzzy publications, inefficient management, complex processing and lack of scalability. This paper presents a novel RDF representation (HDT) which takes advantage of the structural properties of RDF graphs for splitting and representing, efficiently, three components of RDF data: Header… 
Binary RDF for scalable publishing, exchanging and consumption in the web of data
TLDR
This article discusses an ongoing doctoral thesis addressing efficient formats for publication, exchange and consumption of RDF on a large scale, and proposes a binary serialization format for RDF, called HDT.
Exchange and Consumption of Huge RDF Data
TLDR
This paper shows how to enhance the exchanged HDT with additional structures to support some basic forms of SPARQL query resolution without the need of "unpacking" the data.
Lightweighting the Web of Data through Compact RDF/HDT
TLDR
This paper revisits the HDT format and exploits the latest findings in triples indexing for querying, exchanging and visualizing RDF information at large scale.
A RDF data compress model based on octree structure
TLDR
A compact indexed RDF structure called oct-triples is proposed. Based on the octree structure, it stores the indexes, produced by mapping the RDF triples through the dictionary, as a three-dimensional matrix, and shows a clear compressibility advantage over traditional compression solutions.
Compressed Indexes for Fast Search of Semantic Data
TLDR
The extensive experimental analysis reveals that the best space/time trade-off configuration substantially outperforms existing state-of-the-art solutions, taking 30–60 percent less space and speeding up query execution by a factor of 2–81×.
OFR: An Efficient Representation of RDF Datasets
TLDR
This work proposes an RDF compression algorithm that produces a succinct representation of RDF datasets that achieves compression ratios significantly better than the RDF compressors known from the literature.
CPOI: A Compact Method to Archive Versioned RDF Triple-Sets
TLDR
This paper proposes a novel storage index for archiving versions of such datasets, called CPOI (compact partial order index), that exploits the fact that an RDF Knowledge Base is a graph and thus, unlike text, does not have a unique serialization.
Horn-rule based compression technique for RDF data
TLDR
This work utilizes the various semantic associations that can be learned from RDF graphs to compress them, and shows that greater compression can be achieved compared to the existing technique.
Graph Pattern Based RDF Data Compression
TLDR
Evaluation on real-world datasets shows that the proposed graph-pattern-based technologies can substantially reduce the size of RDF documents by complementing the abilities of existing approaches, and the evaluation results on rule mining operations show the potential of the proposed serialisation format for supporting efficient data access.

References

Scalable join processing on very large RDF graphs
TLDR
Very light-weight methods for sideways information passing between separate joins at query run-time are developed, to provide highly effective filters on the input streams of joins in very large RDF graphs.
RDF-3X: a RISC-style engine for RDF
TLDR
The salient points of RDF-3X are a generic solution for storing and indexing RDF triples that completely eliminates the need for physical-design tuning, a powerful yet simple query processor that leverages fast merge joins to the largest possible extent, and a query optimizer for choosing optimal join orders using a cost model based on statistical synopses for entire join paths.
RDF compression: basic approaches
TLDR
It is shown that big RDF data sets are highly compressible due to the structure of RDF graphs (power law), organization of URIs and RDF syntax verbosity.
Matrix "Bit" loaded: a scalable lightweight join query processor for RDF data
TLDR
BitMat - a compressed bit-matrix structure for storing huge RDF graphs, and a novel, light-weight SPARQL join query processing method that employs an initial pruning technique, followed by a variable-binding-matching algorithm on BitMats to produce the final results are introduced.
An Efficient SQL-based RDF Querying Scheme
TLDR
An experimental study is presented that characterizes the overhead eliminated by avoiding procedural code at runtime, characterizes performance under various input conditions, and demonstrates scalability using 80 million RDF triples from UniProt protein and annotation data.
Sindice.com: a document-oriented lookup index for open linked data
TLDR
Sindice, a lookup index over Semantic Web resources, allows applications to automatically locate documents containing information about a given resource, and extends the sitemap protocol to efficiently index large datasets with minimal impact on data providers.
RDF Vocabulary Description Language 1.0: RDF Schema
TLDR
This specification describes how to use RDF to describe RDF vocabularies and defines other built-in RDF vocabulary initially specified in the RDF Model and Syntax Specification.
Characterizing the Semantic Web on the Web
TLDR
A collection of Semantic Web documents from an estimated ten million available on the Web is harvested and analyzed, and a number of metrics, properties and usage patterns found to follow a power law distribution are described.
On Graph Features of Semantic Web Schemas
TLDR
The main finding is that the majority of SW schemas with a significant number of properties approximate a power law for total-degree distribution and some emerging conceptual modeling practices of SW schema developers are revealed.
The webgraph framework I: compression techniques
TLDR
This paper presents the compression techniques used in WebGraph, which are centred around referentiation and intervalisation (which in turn are dual to each other).