MILC: Inverted List Compression in Memory

@article{Wang2017MILCIL,
  title={MILC: Inverted List Compression in Memory},
  author={Jianguo Wang and Chunbin Lin and Ruining He and Moojin Chae and Yannis Papakonstantinou and Steven Swanson},
  journal={Proc. VLDB Endow.},
  year={2017},
  volume={10},
  pages={853-864}
}
Inverted list compression is a topic that has been studied for 50 years due to its fundamental importance in numerous applications including information retrieval, databases, and graph analytics. Typically, an inverted list compression algorithm is evaluated on its space overhead and query processing time. Earlier list compression designs mainly focused on minimizing the space overhead to reduce expensive disk I/O time in disk-oriented systems. But the recent trend has shifted towards reducing…
IIU: Specialized Architecture for Inverted Index Search
TLDR
IIU, a novel inverted index processing unit, is presented to optimize query performance while maintaining a low memory overhead for index storage; it co-designs the indexing scheme and hardware accelerator so that the accelerator can process a highly compressed inverted index at high throughput.
MorphStore: Analytical Query Engine with a Holistic Compression-Enabled Processing Model
TLDR
MorphStore, an open-source in-memory columnar analytical query engine with a novel holistic compression-enabled processing model, is presented, showing that continuous usage of compression for all base data and all intermediates is very beneficial to reduce the overall memory footprint as well as to improve the query performance.
Highly Efficient String Similarity Search and Join over Compressed Indexes
TLDR
A flexible framework CSS is proposed to reduce the index size and keep high query performance for string search and join applications and gives improved solutions for offline inverted lists construction to better support string similarity search.
BOSS: Bandwidth-Optimized Search Accelerator for Storage-Class Memory
TLDR
BOSS is proposed, the first near-data processing (NDP) architecture for inverted index search on SCM-based pooled memory, which maintains high throughput of query processing in this bandwidth-constrained environment.
Griffin: Uniting CPU and GPU in Search Engines for Intra-Query Parallelism
TLDR
Griffin is a search engine that dynamically combines GPU- and CPU-based algorithms to process individual queries according to their characteristics, and achieves the best available GPU search engine performance by leveraging a new compression scheme and exploiting an advanced merge-based intersection algorithm.
Index Compression Using Byte-Aligned ANS Coding and Two-Dimensional Contexts
TLDR
Improvements in block-based inverted index compression, such as the OptPFOR mechanism, yield superior compression for index data, outperforming the reference point set by the Interp mechanism and hence representing a significant step forward.
K-ary search tree revisited: improving construction and intersection efficiency
TLDR
By aligning the node size with the buffer size of the faster cache, the data is expected to be better utilized before being evicted, and fewer cache misses are triggered as well.
Compact inverted index storage using general‐purpose compression libraries
TLDR
Experiments show that standard compression libraries can provide compression effectiveness as good as or better than previous methods, with decoding rates only moderately slower than reference implementations of those tailored approaches.
Fast Dictionary-Based Compression for Inverted Indexes
TLDR
This work applies dictionary-based techniques to the compression of inverted lists, showing that the high degree of regularity that these integer sequences exhibit is a good match for certain types of dictionary methods, and that an important new trade-off balance between compression effectiveness and compression efficiency can be achieved.

References

Efficient Index Compression in DB2 LUW
TLDR
The design of index compression in DB2 LUW is detailed, the challenges encountered in meeting the design goals are discussed, and its effectiveness is demonstrated by showing performance results on typical customer scenarios.
Performance of compressed inverted list caching in search engines
TLDR
The overall goal of this paper is to provide an updated discussion and evaluation of these two techniques, and to show how to select the best set of approaches and settings depending on parameters such as disk speed and main memory cache size.
Data Compression for Analytics over Large-scale In-memory Column Databases (Summary Paper)
TLDR
This work presents an updated discussion about whether it is valuable to use data compression techniques in memory databases and, if so, how memory databases should apply data compression schemes to maximize performance.
Leveraging Context-Free Grammar for Efficient Inverted Index Compression
TLDR
This paper proposes a new grammar-based inverted index compression scheme, which can improve the performance of both index compression and query processing, and shows that it can be combined with common docID reassignment methods and encoding techniques and yields about 14% to 27% higher throughput for AND queries by utilizing multiple threads.
Super-Scalar RAM-CPU Cache Compression
TLDR
This work proposes three new versatile compression schemes (PDICT, PFOR, and PFOR-DELTA) that are specifically designed to extract maximum IPC from modern CPUs and compares these algorithms with compression techniques used in (commercial) database and information retrieval systems.
Partitioned Elias-Fano indexes
TLDR
This paper describes a new representation of monotone sequences based on partitioning the list into chunks and encoding both the chunks and their endpoints with Elias-Fano, hence forming a two-level data structure that offers significantly better compression and improves compression ratio/query time trade-off.
Compressing relations and indexes
We propose a new compression algorithm that is tailored to database applications. It can be applied to a collection of records, and is especially effective for records with many low to medium
VSEncoding: efficient coding and fast decoding of integer lists via dynamic programming
TLDR
Experiments show that this class of encoders outperforms all existing methods in the literature by more than 10% (with the exception of Binary Interpolative Coding, with which they roughly tie) while still retaining a very fast decompression algorithm.
Cache Conscious Indexing for Decision-Support in Main Memory
TLDR
A new indexing technique called "Cache-Sensitive Search Trees" (CSS-trees) is proposed, to provide faster lookup times than binary search by paying attention to reference locality and cache behavior, without using substantial extra space.
FAST: fast architecture sensitive tree search on modern CPUs and GPUs
TLDR
FAST is an extremely fast architecture sensitive layout of the index tree logically organized to optimize for architecture features like page size, cache line size, and SIMD width of the underlying hardware, and achieves a 6X performance improvement over uncompressed index search for large keys on CPUs.