Columnar Storage and List-based Processing for Graph Database Management Systems

@article{Gupta2021ColumnarSA,
  title={Columnar Storage and List-based Processing for Graph Database Management Systems},
  author={Pranjal Gupta and Amine Mhedhbi and Semih Salihoglu},
  journal={Proc. VLDB Endow.},
  year={2021},
  volume={14},
  pages={2491--2504}
}
We revisit column-oriented storage and query processing techniques in the context of contemporary graph database management systems (GDBMSs). Like column-oriented RDBMSs, GDBMSs support read-heavy analytical workloads, but these workloads have fundamentally different data access patterns from traditional analytical workloads. We first derive a set of desiderata for optimizing the storage and query processors of GDBMSs based on their access patterns. We then present the design of columnar storage…
GRainDB: A Relational-core Graph-Relational DBMS
TLDR
GRainDB is a new system that extends the DuckDB RDBMS to provide graph modeling, querying, and visualization capabilities and modifies the internals of DuckDB to provide a set of fast join capabilities, such as predefined pointer-based joins that use system-level record IDs and adjacency list-like RID indices.

References

Integrating compression and execution in column-oriented database systems
TLDR
This paper shows how compression schemes not traditionally used in row-oriented DBMSs can be applied to column-oriented systems, evaluates a set of compression schemes, and shows that the best scheme depends not only on the properties of the data but also on the nature of the query workload.
LiveGraph: A Transactional Graph Storage System with Purely Sequential Adjacency List Scans
TLDR
LiveGraph is presented, a graph storage system that outperforms both the best graph transactional systems and the best systems for real-time graph analytics on fresh data by ensuring that adjacency list scans, a key operation in graph workloads, are purely sequential.
Fast In-Memory SQL Analytics on Typed Graphs
TLDR
A GQ-Fast database is proposed, which is an indexed database that roughly corresponds to efficient encoding of annotated adjacency lists that combines salient features of column-based organization, indexing and compression.
Positional update handling in column stores
TLDR
A new data structure for maintaining positional updates to columnar databases, called the Positional Delta Tree (PDT), is described, along with detailed algorithms for PDT/column merging, updating PDTs, and using PDTs in transaction management.
Monet: A Next-Generation DBMS Kernel For Query-Intensive Applications
TLDR
This thesis is a reference to the Monet system in all its detail, and outlines an SQL front-end that uses Monet as a back-end, for constructing a full-fledged SQL compliant RDBMS including ACID properties.
Extending RDBMSs To Support Sparse Datasets Using An Interpreted Attribute Storage Format
TLDR
This paper argues that the proper way to handle sparse data is not to use a vertical schema, but rather to extend the RDBMS tuple storage format to allow the representation of sparse attributes as interpreted fields, and shows that the interpreted storage approach dominates in query efficiency and ease-of-use over the current horizontal storage and vertical schema approaches over a wide range of queries.
The implementation and performance of compressed databases
TLDR
This paper describes how the storage manager, the query execution engine, and the query optimizer of a database system can be extended to deal with compressed data and shows how compression can be integrated into a relational database system.
Compressing relations and indexes
We propose a new compression algorithm that is tailored to database applications. It can be applied to a collection of records, and is especially effective for records with many low to medium
Super-Scalar RAM-CPU Cache Compression
TLDR
This work proposes three new versatile compression schemes (PDICT, PFOR, and PFOR-DELTA) that are specifically designed to extract maximum IPC from modern CPUs and compares these algorithms with compression techniques used in (commercial) database and information retrieval systems.
Column-stores vs. row-stores: how different are they really?
TLDR
It is concluded that while it is not impossible for a row-store to achieve some of the performance advantages of a column-store, changes must be made to both the storage layer and the query executor to fully obtain the benefits of a column-oriented approach.