Corpus ID: 15268949

Vectorized VByte Decoding

Jeff Plaisance, Nathan Kurz, Daniel Lemire
We consider the ubiquitous technique of VByte compression, which represents each integer as a variable length sequence of bytes. The low 7 bits of each byte encode a portion of the integer, and the high bit of each byte is reserved as a continuation flag. This flag is set to 1 for all bytes except the last, and the decoding of each integer is complete when a byte with a high bit of 0 is encountered. VByte decoding can be a performance bottleneck especially when the unpredictable lengths of the… 
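To make the format concrete, here is a minimal scalar (non-vectorized) encoder/decoder sketch in Python following the description above: each byte carries 7 payload bits, and the high bit is a continuation flag set to 1 on every byte except the last. The least-significant-bits-first byte order is the common VByte convention and is assumed here; the function names are illustrative.

```python
def vbyte_encode(values):
    """Encode non-negative integers as a VByte stream.

    Each byte stores 7 payload bits (least-significant first); the high
    bit is a continuation flag, 1 on every byte except the last.
    """
    out = bytearray()
    for v in values:
        while v >= 0x80:
            out.append((v & 0x7F) | 0x80)  # continuation byte: flag set
            v >>= 7
        out.append(v)  # final byte: high bit 0 terminates the integer
    return bytes(out)


def vbyte_decode(data):
    """Decode a VByte stream back into a list of integers."""
    values, current, shift = [], 0, 0
    for b in data:
        current |= (b & 0x7F) << shift  # accumulate 7 payload bits
        if b & 0x80:
            shift += 7           # continuation flag set: more bytes follow
        else:
            values.append(current)  # high bit 0: integer is complete
            current, shift = 0, 0
    return values
```

The branch on the continuation flag in the decode loop is exactly the unpredictable control flow the paper identifies as the bottleneck that vectorization aims to remove.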


GPU-Accelerated Decoding of Integer Lists

Two encoding schemes for index decompression on GPU architectures are described and implemented, adapted from existing CPU-based compression methods to exploit the execution model and memory hierarchy offered by GPUs.

Make Larger Vector Register Sizes New Challenges?: Lessons Learned from the Area of Vectorized Lightweight Compression Algorithms

This paper systematically investigates the impact of SIMD instruction set extensions with wider vector registers on the behavior of straightforwardly transformed implementations, describes the evaluation methodology, and presents selected results from an exhaustive evaluation.

On Optimally Partitioning Variable-Byte Codes

This paper introduces an optimal partitioning algorithm whose linear-time complexity leaves indexing time unaffected, and shows, through extensive experimental analysis and comparison with several state-of-the-art encoders, that the query processing speed of Variable-Byte is preserved.

Upscaledb: Efficient integer-key compression in a key-value store using SIMD instructions

From a Comprehensive Experimental Survey to a Cost-based Selection Strategy for Lightweight Integer Compression Algorithms

This article conducted an exhaustive experimental survey by evaluating several state-of-the-art lightweight integer compression algorithms as well as cascades of basic techniques, finding that there is no single-best algorithm.

Lightweight Data Compression Algorithms: An Experimental Survey (Experiments and Analyses)

This work conducted an exhaustive experimental survey by evaluating several state-of-the-art compression algorithms as well as cascades of basic techniques, finding that there is no single-best algorithm.

MILC: Inverted List Compression in Memory

This work proposes a new compression scheme, namely, MILC (memory inverted list compression), which relies on a series of techniques including offset-oriented fixed-bit encoding, dynamic partitioning, in-block compression, cache-aware optimization, and SIMD acceleration and experimentally shows that MILC improves the query performance and reduces the space overhead.

To share or not to share vector registers?

This work investigates the opportunity of sharing vector registers for concurrently running queries in analytical scenarios and demonstrates the feasibility of a new work sharing strategy, which can open up a wide spectrum of future research opportunities.

Reordering Based Lossless Compression Scheme for Term-Document Matrices

A novel variable byte encoding technique to compress inverted indexes and a bipolar permutation scheme based on hill climbing to reduce the bandwidth of the term-document matrix are presented.

SIMD-based decoding of posting lists

This paper starts by exploring variable-length integer encoding formats used to represent postings, and defines a taxonomy that classifies encodings along three dimensions, capturing how data bits are stored and how additional bits are used to describe the data.

Decoding billions of integers per second through vectorization

A novel vectorized scheme called SIMD-BP128⋆ is introduced that improves over previously proposed vectorized approaches and is nearly twice as fast as the previously fastest schemes on desktop processors (varint-G8IU and PFOR).

Compressing Integers for Fast File Access

It is shown experimentally that, for large or small collections, storing integers in a compressed format reduces the time required for either sequential stream access or random access.

Efficient Index Compression in DB2 LUW

The design of index compression in DB2 LUW is detailed and the challenges that were encountered in meeting the design goals are discussed and its effectiveness is demonstrated by showing performance results on typical customer scenarios.

Challenges in building large-scale information retrieval systems: invited talk

  J. Dean, WSDM '09, 2009
This talk will discuss the evolution of Google's hardware infrastructure and information retrieval systems and some of the design challenges that arise from ever-increasing demands in all of these dimensions.