# Reordering columns for smaller indexes

@article{Lemire2011ReorderingCF, title={Reordering columns for smaller indexes}, author={D. Lemire and Owen Kaser}, journal={ArXiv}, year={2011}, volume={abs/0909.1346} }

Column-oriented indexes, such as projection or bitmap indexes, are compressed by run-length encoding to reduce storage and increase speed. Sorting the tables improves compression. On realistic data sets, permuting the columns in the right order before sorting can reduce the number of runs by a factor of two or more. Unfortunately, determining the best column order is NP-hard. For many cases, we prove that the number of runs in table columns is minimized if we sort columns by increasing…
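
Here is a minimal Python sketch of what "number of runs" means. It assumes a table is a list of equal-length tuples and that each column is run-length encoded after a lexicographic sort. The helper names `count_runs` and `best_order` are hypothetical. The exhaustive search is only a baseline for tiny tables, not the method proposed in the paper.

```python
from itertools import permutations

def count_runs(rows, order):
    """Sum, over all columns, of the number of runs (maximal blocks of equal
    consecutive values) after permuting columns into `order` and sorting the
    rows lexicographically."""
    permuted = sorted(tuple(row[c] for c in order) for row in rows)
    runs = 0
    for col in range(len(order)):
        previous = object()  # sentinel: never equal to a table value
        for row in permuted:
            if row[col] != previous:
                runs += 1
                previous = row[col]
    return runs

def best_order(rows):
    """Brute force over every column order. This is feasible only for a
    handful of columns, since finding the best order is NP-hard in general."""
    return min(permutations(range(len(rows[0]))),
               key=lambda order: count_runs(rows, order))
```

Take the table `[(i, i % 2) for i in range(6)]`. The order `(0, 1)` yields 12 runs. The order `(1, 0)` puts the two-valued column first and yields 8, so `best_order` returns `(1, 0)`.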

#### 29 Citations

Minimizing Index Size by Reordering Rows and Columns

- Computer Science
- SSDBM
- 2012

This paper develops accurate statistical formulas that compute approximate solutions for reordering the rows and columns of a data table. It also confirms that the heuristic of sorting low-cardinality columns first is effective at reducing index sizes.
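
The low-cardinality-first heuristic mentioned above takes only a few lines of Python. This illustrative sketch assumes rows are equal-length tuples. The names `cardinality_order` and `count_runs` are hypothetical, and the sketch does not reproduce the statistical formulas developed in the paper.

```python
from itertools import groupby

def cardinality_order(rows):
    """Column indices sorted by increasing number of distinct values
    (ties keep their original relative order)."""
    return sorted(range(len(rows[0])), key=lambda c: len({row[c] for row in rows}))

def count_runs(rows, order):
    """Total runs over all columns after reordering columns and sorting rows."""
    permuted = sorted(tuple(row[c] for c in order) for row in rows)
    # groupby yields one group per maximal block of equal consecutive values.
    return sum(sum(1 for _ in groupby(col)) for col in zip(*permuted))
```

On `[(i, i % 4, i % 2) for i in range(8)]`, the heuristic picks the order `[2, 1, 0]`. That order gives 14 runs, compared with 24 for the original order `[0, 1, 2]`.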

Column Partition and Permutation for Run Length Encoding in Columnar Databases

- Computer Science
- SIGMOD Conference
- 2020

This paper proposes an incremental heuristic that identifies which columns to compress and which row order to use for a better compression ratio. On test data, it improves compression by up to 25% compared with compressing all columns of a table.

Reordering rows for better compression: Beyond the lexicographic order

- Computer Science
- TODS
- 2012

The authors prove that the new row reordering minimizes the runs of identical values within columns in a few cases. They also find that run-length encoding can improve by up to a factor of 3 and prefix coding by up to 80%. These gains come on top of those obtained by lexicographically sorting the table.
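
This sketch assumes prefix coding stores each row as two parts: the number of leading values it shares with the previous row, and the values that differ. The Python below shows why row order matters for this scheme. The names `prefix_encode`, `prefix_decode`, and `stored_values` are hypothetical. The paper's new row order is not implemented.

```python
def prefix_encode(rows):
    """Encode each row as (shared, suffix). `shared` is the number of leading
    values equal to the previous row; `suffix` holds the remaining values."""
    encoded, prev = [], ()
    for row in rows:
        shared = 0
        while shared < min(len(row), len(prev)) and row[shared] == prev[shared]:
            shared += 1
        encoded.append((shared, tuple(row[shared:])))
        prev = tuple(row)
    return encoded

def prefix_decode(encoded):
    """Rebuild each row from the previous row's prefix plus the stored suffix."""
    rows, prev = [], ()
    for shared, suffix in encoded:
        row = prev[:shared] + suffix
        rows.append(row)
        prev = row
    return rows

def stored_values(encoded):
    """Number of column values actually stored (shared prefixes are free)."""
    return sum(len(suffix) for _, suffix in encoded)
```

The sorted rows `(0,0,0), (0,0,1), (0,1,0), (1,0,0)` store 9 values instead of 12. The same rows in the order `(0,1,0), (1,0,0), (0,0,1), (0,0,0)` store 10.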

Variable Length Compression for Bitmap Indices

- Computer Science
- DEXA
- 2011

The paper presents an algorithm that efficiently processes queries when encoding lengths share a common integer factor. Its empirical study shows that, in the best case and on real data sets, the approach out-compresses BBC by 30% and WAH by 70%.

A meta-heuristic approach for RLE compression in a column store table

- Computer Science
- Soft Comput.
- 2019

This paper analyzes and compares common, well-known meta-heuristics for columnar run minimization, using standard implementations on real data sets. It also provides implementations of heuristic RLE compression approaches based on common optimization methods.
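
For comparison with such meta-heuristics, here is a plain hill-climbing sketch in Python over column orders. It is a generic local search, not one of the specific methods evaluated in the paper. The names `local_search` and `count_runs`, and the `iterations` and `seed` parameters, are hypothetical. The table is assumed to have at least two columns.

```python
import random
from itertools import groupby

def count_runs(rows, order):
    """Total runs over all columns after reordering columns and sorting rows."""
    permuted = sorted(tuple(row[c] for c in order) for row in rows)
    return sum(sum(1 for _ in groupby(col)) for col in zip(*permuted))

def local_search(rows, iterations=200, seed=0):
    """Hill climbing over column orders: swap two random positions and keep
    the swap only if the run count strictly decreases (needs >= 2 columns)."""
    rng = random.Random(seed)
    order = list(range(len(rows[0])))
    rng.shuffle(order)
    best = count_runs(rows, order)
    for _ in range(iterations):
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]
        runs = count_runs(rows, order)
        if runs < best:
            best = runs
        else:
            order[i], order[j] = order[j], order[i]  # undo a non-improving swap
    return order, best
```

This search accepts only strictly improving swaps, so it can stop at a local optimum. On `[(i, i % 4, i % 2) for i in range(8)]` it may settle on the order `[1, 0, 2]` with 16 runs instead of the optimal 14. Meta-heuristics such as simulated annealing or genetic algorithms add ways to escape such traps.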

Compressed bitmap indexes: beyond unions and intersections

- Computer Science
- Softw. Pract. Exp.
- 2016

This work shows that bitmap indexes are more broadly applicable than is commonly believed and introduces new algorithms that are sometimes three orders of magnitude faster than a naïve approach.

A Genetic Algorithm Approach for Minimizing the Number of Columnar Runs in a Column Store Table

- Mathematics, Computer Science
- ICANNGA
- 2013

This paper presents a genetic algorithm for finding an optimal column sorting order: one that minimizes the number of columnar runs in a column-store table and therefore maximizes RLE-based table compression.

Threshold and Symmetric Functions over Bitmaps

- Computer Science
- ArXiv
- 2014

This work considers symmetric Boolean queries, and finds that the best of the bitmap-based algorithms are competitive with the state-of-the-art algorithms for important special cases (e.g., MergeOpt, MergeSkip, DivideSkip, ScanCount).
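
ScanCount, one of the baselines named above, answers a threshold query by counting occurrences. The query returns the ids present in at least T of N lists or bitmaps. The Python sketch below is a simplified variant that uses a dictionary-based `Counter` instead of an array of counters indexed by id. The name `scan_count` is hypothetical, and each input list is assumed to hold distinct ids.

```python
from collections import Counter

def scan_count(lists, threshold):
    """Return, in sorted order, the ids that occur in at least `threshold`
    of the input lists (each list is assumed to hold distinct ids)."""
    counts = Counter()
    for ids in lists:
        counts.update(ids)  # one increment per occurrence
    return sorted(i for i, c in counts.items() if c >= threshold)
```

A threshold of 1 gives the union, and a threshold equal to the number of lists gives the intersection. With `[[1, 2, 3], [2, 3, 4], [3, 4, 5]]` and T = 2, the result is `[2, 3, 4]`.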

Performance evaluation of fast integer compression techniques over tables

- Computer Science
- 2013

This study aims to quantify the trade-offs of fast integer compression schemes with respect to compression ratio and speed of compression and decompression, and finds that sorting can significantly improve the performance of compression.

#### References

Showing 10 of 99 references.

Sorting improves word-aligned bitmap indexes

- Computer Science
- Data Knowl. Eng.
- 2010

This work accelerates logical operations (AND, OR, XOR) over bitmaps using techniques based on run-length encoding (RLE), such as Word-Aligned Hybrid (WAH) compression. It also investigates row-reordering heuristics.

Compressing table data with column dependency

- Computer Science
- Theor. Comput. Sci.
- 2007

This paper formalizes the notion of column dependency as a way to capture information redundancy across columns. It also discusses how to compute column dependency automatically and use it to substantially improve table compression.

Dictionary-based order-preserving string compression for main memory column stores

- Computer Science
- SIGMOD Conference
- 2009

This paper proposes new data structures that efficiently support order-preserving dictionary compression for variable-length string attributes. These attributes have a large domain that is likely to change over time. The paper also introduces a novel indexing approach that provides efficient access paths to such a dictionary while compressing the index data.
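
In an order-preserving dictionary, codes compare the same way as the strings they replace, so range predicates can run on integer codes. Here is a minimal Python sketch, assuming a static dictionary stored as a sorted list. It does not reproduce the paper's data structures, which handle large and changing domains. The names `build_dictionary`, `encode`, and `decode` are hypothetical.

```python
from bisect import bisect_left

def build_dictionary(values):
    """Sorted list of distinct values. A value's code is its index, so codes
    preserve the order of the strings."""
    return sorted(set(values))

def encode(dictionary, values):
    """Replace each value by its code via binary search; every value is
    assumed to be present in the dictionary."""
    return [bisect_left(dictionary, v) for v in values]

def decode(dictionary, codes):
    """Map codes back to the original strings."""
    return [dictionary[c] for c in codes]
```

With the dictionary `["apple", "fig", "pear"]`, the predicate `"b" <= s < "g"` becomes `1 <= code < 2`. The bounds come from `bisect_left(dictionary, "b")` and `bisect_left(dictionary, "g")`.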

Compression of inverted indexes for fast query evaluation

- Computer Science
- SIGIR '02
- 2002

This paper proposes several simple optimisations to well-known integer compression schemes, and shows experimentally that these lead to significant reductions in time, and concludes that fast byte-aligned codes should be used to store integers in inverted lists.
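
Variable-byte coding is the classic byte-aligned integer code for inverted lists. It is usually applied to the gaps between consecutive document ids. The Python sketch below assumes one common convention: 7 data bits per byte, low-order bits first, and the high bit set on every byte except an integer's last. The exact layout in the paper may differ. The names `vbyte_encode`, `vbyte_decode`, and `gaps` are hypothetical.

```python
def vbyte_encode(numbers):
    """Variable-byte code: 7 data bits per byte, low-order bits first; the high
    bit is set on every byte of an integer except its last."""
    out = bytearray()
    for n in numbers:
        if n < 0:
            raise ValueError("variable-byte coding expects non-negative integers")
        while n >= 0x80:
            out.append((n & 0x7F) | 0x80)
            n >>= 7
        out.append(n)
    return bytes(out)

def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    numbers, n, shift = [], 0, 0
    for b in data:
        n |= (b & 0x7F) << shift
        if b & 0x80:
            shift += 7          # more bytes follow for this integer
        else:
            numbers.append(n)   # last byte of this integer
            n, shift = 0, 0
    return numbers

def gaps(sorted_ids):
    """Differences between consecutive ids (the first id is kept as-is)."""
    return [b - a for a, b in zip([0] + sorted_ids, sorted_ids)]
```

The postings `[3, 7, 300]` become the gaps `[3, 4, 293]`, which take 4 bytes. The value 300 alone encodes as the two bytes `0xAC 0x02`.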

Integrating compression and execution in column-oriented database systems

- Computer Science
- SIGMOD Conference
- 2006

This paper shows how compression schemes not traditionally used in row-oriented DBMSs can be applied to column-oriented systems. It evaluates a set of compression schemes and shows that the best scheme depends not only on the properties of the data but also on the nature of the query workload.
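
In a column store, run-length encoding can be represented as (value, start, length) triples. Some predicates can then be evaluated on the triples without decompressing. Here is a minimal Python sketch under that assumption. The names `rle_encode`, `rle_decode`, and `count_equal` are hypothetical and are not the API of any system discussed in the paper.

```python
from itertools import groupby

def rle_encode(column):
    """Run-length encode a column as (value, start_position, run_length) triples."""
    triples, start = [], 0
    for value, group in groupby(column):
        length = sum(1 for _ in group)
        triples.append((value, start, length))
        start += length
    return triples

def rle_decode(triples):
    """Expand the triples back into the original column."""
    column = []
    for value, _start, length in triples:
        column.extend([value] * length)
    return column

def count_equal(triples, target):
    """Evaluate COUNT(*) WHERE col = target on the compressed form: one step per run."""
    return sum(length for value, _start, length in triples if value == target)
```

The column `["a", "a", "b", "b", "b", "a"]` becomes `[("a", 0, 2), ("b", 2, 3), ("a", 5, 1)]`. Counting the rows equal to `"a"` touches three triples instead of six values.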

Index compression is good, especially for random access

- Computer Science
- CIKM '07
- 2007

It is demonstrated that, in some cases, random access into a term's postings list can be performed more efficiently when the list is stored compressed rather than uncompressed. This holds whether the index resides on disk or in main memory.

Binary Interpolative Coding for Effective Index Compression

- Computer Science
- Information Retrieval
- 2004

A new method for compressing inverted indexes is introduced that yields excellent compression and fast decoding. It exploits clustering, the tendency for words to appear relatively frequently in some parts of the collection and infrequently in others.
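
Binary interpolative coding encodes the middle element of a sorted list within the tightest range implied by its neighbors, then recurses on each half. Dense clusters therefore cost few or zero bits. The Python sketch below is simplified: it writes plain fixed-width binary offsets rather than the minimal binary codes the method normally uses, so it is slightly less compact. The names `bic_encode` and `bic_decode` are hypothetical. The decoder must know the list length and the range `[lo, hi]`.

```python
def _write(bits, value, width):
    """Append value as a width-bit binary number, most significant bit first."""
    for k in reversed(range(width)):
        bits.append((value >> k) & 1)

def bic_encode(values, lo, hi, bits=None):
    """Interpolative coding of sorted, distinct integers known to lie in [lo, hi]."""
    if bits is None:
        bits = []
    n = len(values)
    if n == 0:
        return bits
    mid = n // 2
    x = values[mid]
    low = lo + mid              # mid distinct values must fit below x
    high = hi - (n - 1 - mid)   # n - 1 - mid distinct values must fit above x
    _write(bits, x - low, (high - low).bit_length())  # 0 bits when low == high
    bic_encode(values[:mid], lo, x - 1, bits)
    bic_encode(values[mid + 1:], x + 1, hi, bits)
    return bits

def bic_decode(bits, n, lo, hi, pos=0):
    """Decode n values in [lo, hi]; returns (values, next bit position)."""
    if n == 0:
        return [], pos
    mid = n // 2
    low = lo + mid
    high = hi - (n - 1 - mid)
    offset = 0
    for _ in range((high - low).bit_length()):
        offset = (offset << 1) | bits[pos]
        pos += 1
    x = low + offset
    left, pos = bic_decode(bits, mid, lo, x - 1, pos)
    right, pos = bic_decode(bits, n - 1 - mid, x + 1, hi, pos)
    return left + [x] + right, pos
```

A fully dense list such as `list(range(10, 20))` within `[10, 19]` costs zero bits. The list `[1, 5, 6, 12]` within `[0, 15]` costs 14 bits.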

Optimizing bitmap indices with efficient compression

- Computer Science
- TODS
- 2006

This article presents a new compression scheme called Word-Aligned Hybrid (WAH) code that makes compressed bitmap indices efficient even for high-cardinality attributes. It also proves that the new compressed bitmap index, like the best variants of the B-tree index, is optimal for one-dimensional range queries.
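
WAH packs a bitmap into 32-bit words. A literal word carries 31 bitmap bits. A fill word encodes a run of 31-bit groups that are all 0s or all 1s, using one bit for the fill value and the remaining bits for the group count. The Python sketch below follows that general layout but is simplified. Three choices are assumptions: the bit order inside literals, treating every uniform group as a fill, and omitting logical operations. The names `wah_encode` and `wah_decode` are hypothetical.

```python
GROUP = 31                 # bitmap bits per literal word
FILL_FLAG = 1 << 31        # MSB set marks a fill word
MAX_COUNT = (1 << 30) - 1  # low 30 bits of a fill word count 31-bit groups

def wah_encode(bits):
    """Encode a list of 0/1 values into a list of 32-bit WAH-style words."""
    words = []
    for start in range(0, len(bits), GROUP):
        group = list(bits[start:start + GROUP])
        group += [0] * (GROUP - len(group))  # zero-pad the final group
        if all(b == group[0] for b in group):
            fill = group[0]
            last = words[-1] if words else None
            if (last is not None and (last & FILL_FLAG)
                    and ((last >> 30) & 1) == fill
                    and (last & MAX_COUNT) < MAX_COUNT):
                words[-1] = last + 1  # extend the previous fill word by one group
            else:
                words.append(FILL_FLAG | (fill << 30) | 1)
        else:
            literal = 0
            for b in group:  # first bit of the group becomes bit 30
                literal = (literal << 1) | b
            words.append(literal)  # MSB stays 0: literal word
    return words

def wah_decode(words, nbits):
    """Expand WAH-style words back to nbits 0/1 values (drops the padding)."""
    bits = []
    for w in words:
        if w & FILL_FLAG:
            fill = (w >> 30) & 1
            bits.extend([fill] * (GROUP * (w & MAX_COUNT)))
        else:
            bits.extend((w >> k) & 1 for k in reversed(range(GROUP)))
    return bits[:nbits]
```

Sixty-two zeros followed by a mixed 31-bit group encode as two words: one fill word counting 2 zero groups, and one literal word.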

Read-optimized databases, in depth

- Computer Science
- Proc. VLDB Endow.
- 2008

This study examines five tables with various characteristics and different query workloads in order to obtain a greater understanding and quantification of the relative performance of column stores and row stores.

C-Store: A Column-oriented DBMS

- Computer Science
- VLDB
- 2005

Preliminary performance data on a subset of TPC-H are presented. They show that C-Store, the column-oriented system the team is building, is substantially faster than popular commercial products.