Integrating compression and execution in column-oriented database systems

@inproceedings{Abadi2006IntegratingCA,
  title={Integrating compression and execution in column-oriented database systems},
  author={Daniel J. Abadi and Samuel Madden and Miguel Ferreira},
  booktitle={Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data},
  year={2006}
}
Column-oriented database system architectures invite a re-evaluation of how and when data in databases is compressed. Storing data in a column-oriented fashion greatly increases the similarity of adjacent records on disk and thus opportunities for compression. The ability to compress many adjacent tuples at once lowers the per-tuple cost of compression, both in terms of CPU and space overheads. In this paper, we discuss how we extended C-Store (a column-oriented DBMS) with a compression sub…
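The abstract's argument is that storing a column contiguously places similar values next to each other, so one compressed unit can stand for many tuples. Run-length encoding illustrates this. The sketch below is a minimal Python illustration under assumed conditions: an in-memory list of integers, sorted so that equal values are adjacent. The names `rle_encode` and `rle_decode` are hypothetical and are not C-Store's interfaces or on-disk format.

```python
from itertools import groupby
from typing import List, Tuple

# (value, run_length): one pair stands for run_length adjacent tuples
Run = Tuple[int, int]

def rle_encode(column: List[int]) -> List[Run]:
    """Collapse each maximal run of equal adjacent values into a single pair."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(column)]

def rle_decode(runs: List[Run]) -> List[int]:
    """Expand the pairs back into the original column."""
    out: List[int] = []
    for value, length in runs:
        out.extend([value] * length)
    return out

# Hypothetical sorted, low-cardinality column: neighbours are mostly identical.
column = [1] * 4 + [2] * 3 + [5] * 5
runs = rle_encode(column)
print(runs)                    # [(1, 4), (2, 3), (5, 5)]
print(len(column), len(runs))  # 12 3
assert rle_decode(runs) == column  # lossless round trip
```

Twelve stored values collapse to three (value, run length) pairs, so any per-run work is amortized over every tuple in the run. On unsorted or high-cardinality columns the runs shrink toward length 1 and the scheme stops paying off.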
A Short Survey of Data Compression Techniques for Column Oriented Databases
TLDR
This paper surveys data compression techniques for column-oriented databases and finds that specialized algorithms, chosen according to the type of data stored in each column, yield large improvements in compression ratio.
Column-oriented query processing for row stores
TLDR
This paper shows that column-oriented query processing can significantly improve the performance of row-oriented DBMSs. It introduces new operators that take into account the unique characteristics of data obtained from indexes, and exploits new technologies such as flash SSDs and multi-core processors to boost performance.
TICC: Transparent Inter-Column Compression for Column-Oriented Database Systems
TLDR
The experimental results demonstrate that TICC can significantly reduce the storage overhead and process a variety of queries over large-scale data with up to 20% performance improvement over the original Hive.
Impact of Data Compression on the Performance of Column-oriented Data Stores
Compression of data in traditional relational database management systems significantly improves system performance by decreasing the size of the data, which reduces data transfer time
Optimizations and Heuristics to improve Compression in Columnar Database Systems
TLDR
This paper presents two novel optimizations of compression techniques, Block Size Optimized Cluster Encoding and Block Size Optimized Indirect Encoding, which perform better than their predecessors, and proposes heuristics for choosing the best encoding among common compression schemes.
A Comparative Study of Database Systems
TLDR
This paper sheds light on the basic differences between column-oriented databases and traditional row-oriented databases and describes how column-oriented DBMSs are better than traditional row-oriented DBMSs.
Columnar Storage and List-based Processing for Graph Database Management Systems
TLDR
This work revisits column-oriented storage and query processing techniques in the context of contemporary graph database management systems (GDBMS) and proposes novel ones that are optimized for GDBMSs, including a novel list-based query processor, a new data structure the authors call single-indexed edge property pages and an accompanying edge ID scheme.
VParC: a compression scheme for numeric data in column-oriented databases
TLDR
This paper develops a compression scheme called Vertically Partitioning Compression (VParC) that is suitable for columns with different data distributions, in some cases even irregular ones, and whose compressed data can be operated on directly without first being decompressed.
A Cost-Aware Strategy for Merging Differential Stores in Column-Oriented In-Memory DBMS
TLDR
This paper describes a new merge algorithm that chooses between full and partial merge operations based on their costs and on the resulting improvement in read performance, and shows by simulation that this algorithm significantly reduces merge costs for workloads found in enterprise applications while also improving read performance.
Integrating Column-Oriented Storage and Query Processing Techniques Into Graph Database Management Systems
TLDR
This work revisits column-oriented storage and query processing techniques in the context of contemporary graph database management systems (GDBMS) and proposes novel ones that are optimized for GDBMSs, including a novel list-based query processor.

References

Showing 1-10 of 38 references
C-Store: A Column-oriented DBMS
TLDR
Preliminary performance data on a subset of TPC-H are presented, showing that C-Store, the system the team is building, is substantially faster than popular commercial products.
Data Compression Support in Databases
TLDR
Various design issues arise in the use of data compression in a DBMS: the choice of algorithm, statistics collection, hardware- versus software-based compression, the location of the compression function in the overall computer system architecture, the unit of compression, update in place, and the application of the log to compressed data.
Query optimization in compressed database systems
TLDR
This paper proposes a Hierarchical Dictionary Encoding strategy that intelligently selects the most effective compression method for string-valued attributes and proposes one provably optimal and two fast heuristic algorithms for selecting a query plan for relational schemas with compressed attributes.
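The entry above concerns dictionary encoding of string attributes. As a generic illustration only, and not the paper's hierarchical strategy or its plan-selection algorithms, the sketch below assumes a plain order-preserving dictionary in which each code is the rank of its value among the sorted distinct values. Because code order then matches value order, a range predicate can be translated once into a code bound and evaluated on integers without decoding any rows. `dict_encode` and `rows_less_than` are hypothetical names.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple

def dict_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Order-preserving dictionary: each code is the rank of its value
    among the sorted distinct values, so code order matches string order."""
    dictionary = sorted(set(values))
    code_of: Dict[str, int] = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [code_of[v] for v in values]

def rows_less_than(dictionary: List[str], codes: List[int], bound: str) -> List[int]:
    """Row ids where value < bound, evaluated on integer codes only.

    bisect_left gives the number of distinct values strictly below `bound`,
    so `code < bound_code` holds exactly when the decoded value < bound,
    even if `bound` itself is not in the dictionary.
    """
    bound_code = bisect_left(dictionary, bound)
    return [row for row, code in enumerate(codes) if code < bound_code]

# Hypothetical string column (e.g. a state attribute).
states = ["NY", "CA", "NY", "TX", "CA"]
dictionary, codes = dict_encode(states)
print(dictionary, codes)                        # ['CA', 'NY', 'TX'] [1, 0, 1, 2, 0]
print(rows_less_than(dictionary, codes, "OH"))  # [0, 1, 2, 4]
```

The string comparison happens once, against the dictionary; the per-row work is a comparison of small integers. An encoding that is not order-preserving (e.g. codes assigned in arrival order) would still support equality predicates this way, but not range predicates.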
Database Compression: A Performance Enhancement Tool
TLDR
The paper argues that database compression is attractive from a query processing viewpoint as well, and should therefore be implemented even when disk storage is plentiful. It presents a modified attribute-level compression algorithm, based on non-adaptive arithmetic compression, called COLA, which simultaneously provides good query processing and reasonable compression ratios.
The implementation and performance of compressed databases
TLDR
This paper describes how the storage manager, the query execution engine, and the query optimizer of a database system can be extended to deal with compressed data and shows how compression can be integrated into a relational database system.
Super-Scalar RAM-CPU Cache Compression
TLDR
This work proposes three new versatile compression schemes (PDICT, PFOR, and PFOR-DELTA) that are specifically designed to extract maximum IPC from modern CPUs and compares these algorithms with compression techniques used in (commercial) database and information retrieval systems.
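The core idea behind PFOR, frame-of-reference encoding with "patched" exceptions, can be sketched as follows. This is a simplified Python illustration that assumes a non-empty list of integers, a caller-chosen bit width, and the minimum value as the base; it omits the bit-packing and the exception layout of the original scheme and keeps exceptions in a separate list. `pfor_encode` and `pfor_decode` are hypothetical names.

```python
from typing import List, Tuple

def pfor_encode(values: List[int], bits: int) -> Tuple[int, List[int], List[Tuple[int, int]]]:
    """Simplified patched frame-of-reference encoding (assumes `values` is non-empty).

    Each value becomes an offset from `base` (here simply the minimum). Offsets that
    fit in `bits` bits go in the dense array; outliers are recorded separately as
    (position, value) exceptions, leaving a 0 placeholder in the dense array.
    """
    base = min(values)
    limit = 1 << bits  # offsets must be < 2**bits to fit
    offsets: List[int] = []
    exceptions: List[Tuple[int, int]] = []
    for pos, v in enumerate(values):
        delta = v - base
        if delta < limit:
            offsets.append(delta)
        else:
            offsets.append(0)
            exceptions.append((pos, v))
    return base, offsets, exceptions

def pfor_decode(base: int, offsets: List[int], exceptions: List[Tuple[int, int]]) -> List[int]:
    """Add the base back, then patch the exceptions into their positions."""
    out = [base + d for d in offsets]
    for pos, v in exceptions:
        out[pos] = v
    return out

# One outlier (5000) would otherwise force wide offsets for every value.
values = [100, 103, 101, 5000, 102, 107]
base, offsets, exceptions = pfor_encode(values, bits=3)
print(base, offsets, exceptions)  # 100 [0, 3, 1, 0, 2, 7] [(3, 5000)]
```

With 3-bit offsets, five of the six values fit and only the outlier is stored in full; a plain frame-of-reference encoding would need 13 bits per value to cover the offset 4900.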
Compressing relations and indexes
We propose a new compression algorithm that is tailored to database applications. It can be applied to a collection of records, and is especially effective for records with many low to medium
Weaving Relations for Cache Performance
TLDR
This paper proposes a new data organization model called PAX (Partition Attributes Across), that significantly improves cache performance by grouping together all values of each attribute within each page, and demonstrates that in-page data placement is the key to high cache performance.
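PAX keeps the same records on a page that a row-oriented page would hold, but regroups them by attribute into per-attribute "minipages". The toy Python class below models only that logical regrouping, assuming a fixed schema and rows given as tuples. Python lists do not reproduce the byte-level, cache-line layout that the paper's performance argument depends on. `PaxPage`, `scan`, and `record` are hypothetical names.

```python
from typing import Any, Dict, List, Sequence, Tuple

class PaxPage:
    """Toy model of one PAX page: the same records a row-oriented page would hold,
    regrouped so that all values of an attribute sit together in one minipage."""

    def __init__(self, schema: Sequence[str], rows: Sequence[Tuple[Any, ...]]) -> None:
        self.schema = list(schema)
        # One list per attribute; index i in every minipage belongs to record i.
        self.minipages: Dict[str, List[Any]] = {
            name: [row[i] for row in rows] for i, name in enumerate(self.schema)
        }

    def scan(self, attr: str) -> List[Any]:
        """A single-attribute scan reads only that attribute's minipage."""
        return self.minipages[attr]

    def record(self, i: int) -> Tuple[Any, ...]:
        """Reassembling a full record never leaves the page."""
        return tuple(self.minipages[name][i] for name in self.schema)

page = PaxPage(["id", "price"], [(1, 9.5), (2, 3.0), (3, 7.25)])
print(page.scan("price"))  # [9.5, 3.0, 7.25]
print(page.record(1))      # (2, 3.0)
```

The design choice this mirrors is that a scan over one attribute touches only that attribute's values, while full-record reconstruction still costs no extra page accesses because all attributes of a record stay on the same page.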
Database compression
TLDR
This work addresses several aspects of reversible data compression and compression techniques, including general concepts of data compression; a number of compression techniques; a comparison of the effects of compression on common data types; advantages and disadvantages; and future research needs.
Data compression and database performance
TLDR
It is shown that many query processing algorithms can manipulate compressed data just as well as decompressed data, and that processing compressed data can speed query processing by a factor much larger than the compression factor.
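To illustrate processing compressed data without decompressing it, consider a GROUP BY aggregate over a run-length encoded column: it can touch each run once instead of each row. The sketch below is a generic Python illustration, not an algorithm from the cited paper. It assumes integer values, and `rle_encode` and `group_count_sum` are hypothetical names.

```python
from collections import defaultdict
from itertools import groupby
from typing import DefaultDict, Dict, List, Tuple

Run = Tuple[int, int]  # (value, run_length)

def rle_encode(column: List[int]) -> List[Run]:
    """Run-length encode adjacent equal values."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(column)]

def group_count_sum(runs: List[Run]) -> Dict[int, Tuple[int, int]]:
    """SELECT col, COUNT(*), SUM(col) ... GROUP BY col, computed once per run
    rather than once per row, without expanding the runs."""
    acc: DefaultDict[int, List[int]] = defaultdict(lambda: [0, 0])
    for value, length in runs:
        acc[value][0] += length          # COUNT(*) contribution of the run
        acc[value][1] += value * length  # SUM(col) contribution of the run
    return {value: (count, total) for value, (count, total) in acc.items()}

runs = rle_encode([3, 3, 3, 7, 7, 3, 3])
print(runs)                   # [(3, 3), (7, 2), (3, 2)]
print(group_count_sum(runs))  # {3: (5, 15), 7: (2, 14)}
```

The loop runs three times for seven rows. The saving grows with the average run length and disappears when runs are short.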