• Corpus ID: 230770217

Split block Bloom filters

@article{Apple2021SplitBB,
  title={Split block Bloom filters},
  author={Jim Apple},
  journal={ArXiv},
  year={2021},
  volume={abs/2101.01719}
}
  • Jim Apple
  • Published 4 January 2021
  • Computer Science
  • ArXiv
This short note describes a Bloom filter variant that takes advantage of modern SIMD instructions to increase speed by 30%-450%. This filter, the split block Bloom filter, is used by Apache Impala, Apache Kudu, Apache Parquet, and Apache Arrow. 
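The abstract does not spell out the structure, so here is a minimal scalar sketch of a split block Bloom filter under stated assumptions. Each 256-bit block is stored as eight 32-bit words. An insert picks one block from the upper 32 bits of a 64-bit hash. It then sets exactly one bit in every word of that block, chosen by multiplying the low 32 bits by a per-word odd "salt" and keeping the top five bits of the product. The salt constants and the multiply-and-shift block selection are assumed to follow the Apache Parquet Bloom filter specification. `hash64`, `block_mask`, and `SplitBlockBloomFilter` are hypothetical names. BLAKE2b stands in for whatever 64-bit hash a real system uses. Real implementations perform the eight multiplies, shifts, and word tests with a handful of SIMD instructions, which is the point of the design.

```python
import hashlib

# Eight odd 32-bit multipliers ("salts"), one per word of a block.
# Assumed to match the Apache Parquet spec; any eight odd constants illustrate the idea.
SALT = (0x47B6137B, 0x44974D91, 0x8824AD5B, 0xA2B7289D,
        0x705495C7, 0x2DF1424B, 0x9EFC4947, 0x5C6BFB31)
MASK32 = 0xFFFFFFFF


def hash64(data: bytes) -> int:
    """Hypothetical stand-in 64-bit hash (BLAKE2b truncated to 8 bytes)."""
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")


def block_mask(key: int) -> list[int]:
    """Return an 8-word mask for a 32-bit key with exactly one bit set per word."""
    mask = []
    for salt in SALT:
        # Keep the top 5 bits of the 32-bit product: a bit index in 0..31.
        bit = ((key * salt) & MASK32) >> 27
        mask.append(1 << bit)
    return mask


class SplitBlockBloomFilter:
    """Scalar sketch; a SIMD version handles all eight words at once."""

    def __init__(self, num_blocks: int) -> None:
        # Each block is 256 bits, held as eight 32-bit words.
        self.blocks = [[0] * 8 for _ in range(num_blocks)]

    def _block_index(self, h: int) -> int:
        # Map the upper 32 bits of the hash onto [0, num_blocks)
        # with a multiply and shift instead of a modulo.
        return ((h >> 32) * len(self.blocks)) >> 32

    def insert(self, h: int) -> None:
        block = self.blocks[self._block_index(h)]
        for i, m in enumerate(block_mask(h & MASK32)):
            block[i] |= m

    def might_contain(self, h: int) -> bool:
        # True only if every word of the block has its selected bit set.
        block = self.blocks[self._block_index(h)]
        return all(block[i] & m for i, m in enumerate(block_mask(h & MASK32)))


if __name__ == "__main__":
    f = SplitBlockBloomFilter(num_blocks=64)
    for name in (b"impala", b"kudu", b"parquet", b"arrow"):
        f.insert(hash64(name))
    print(f.might_contain(hash64(b"parquet")))  # True: no false negatives
```

Because every key touches one block that fits in a single 256-bit register, a lookup costs one memory access plus a fixed set of vector operations, independent of the number of blocks.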

References


Ribbon filter: practically smaller than Bloom and Xor

TLDR
The Ribbon filter is introduced: a new filter for static sets with a broad range of configurable space overheads and false positive rates, with competitive speed over that range, especially for larger false positive rates f ≥ 2⁻⁷.

Xor Filters: Faster and Smaller Than Bloom and Cuckoo Filters

TLDR
Xor filters can be faster than Bloom and cuckoo filters while using less memory, and a more compact version of xor filters (xor+) can use even less space than highly compact alternatives (e.g., Golomb-compressed sequences) while providing speeds competitive with Bloom filters.

Performance-Optimal Filtering: Bloom overtakes Cuckoo at High-Throughput

TLDR
In high-throughput situations, the lower lookup cost of blocked Bloom filters allows them to overtake Cuckoo filters, and new filter variants are introduced, namely the register-blocked and cache-sectorized Bloom filters.

Ultra-Fast Bloom Filters using SIMD techniques

  • Jianyuan Lu, Ying Wan, B. Liu
  • Computer Science
  • 2017 IEEE/ACM 25th International Symposium on Quality of Service (IWQoS)
  • 2017
TLDR
This paper proposes a new Bloom filter variant called Ultra-Fast Bloom Filters (UFBF), which leverages SIMD techniques, and makes three improvements to the UFBF to accelerate membership checks.

Cuckoo Filter: Practically Better Than Bloom

TLDR
Cuckoo filters support adding and removing items dynamically while achieving even higher performance than Bloom filters, and have lower space overhead than space-optimized Bloom filters.

Network Applications of Bloom Filters: A Survey

TLDR
This paper surveys the ways in which Bloom filters have been used and modified in a variety of network problems, with the aim of providing a unified mathematical and practical framework for understanding them and stimulating their use in future applications.

Don't Thrash: How to Cache Your Hash on Flash

TLDR
Two data structures are given, the buffered quotient filter and the cascade filter, which serve as SSD-optimized alternatives to the Bloom filter and significantly outperform recently proposed SSD-optimized Bloom filter variants.

A Reliable Randomized Algorithm for the Closest-Pair Problem

TLDR
In the course of solving the duplicate-grouping problem, a new universal class of hash functions of independent interest is described, and it is shown that both of the foregoing problems can be solved by randomized algorithms that use O(n) space and finish in O(n) time with probability tending to 1 as n grows to infinity.

Cache-, hash-, and space-efficient Bloom filters

TLDR
This work proposes several new variants of Bloom filters, and replacements with similar functionality, that have better cache efficiency and need fewer hash bits than regular Bloom filters; some use SIMD functionality, while others provide even better space efficiency.
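The closest-pair reference above is commonly cited as the source of the multiply-shift family of universal hash functions. For w-bit keys and l-bit outputs, h_a(x) = (a·x mod 2^w) >> (w − l), with a a random odd w-bit integer. For any two distinct keys, the collision probability over the choice of a is at most 2/2^l. Split block Bloom filters use a hash of this shape, with w = 32 and l = 5, to choose one of the 32 bits in each word of a block (as in the Apache Parquet specification, for example). The following is a minimal sketch assuming w = 64; `make_multiply_shift` is a hypothetical helper name.

```python
import random

W = 64  # key/word width in bits (assumption for this sketch)


def make_multiply_shift(l: int, rng: random.Random):
    """Draw h_a(x) = ((a * x) mod 2^W) >> (W - l) with a random odd multiplier a."""
    a = rng.getrandbits(W) | 1  # force the multiplier to be odd
    mask = (1 << W) - 1

    def h(x: int) -> int:
        # Multiply, truncate to W bits, keep the top l bits.
        return ((a * x) & mask) >> (W - l)

    return h


if __name__ == "__main__":
    h = make_multiply_shift(l=8, rng=random.Random(42))
    print([h(x) for x in (1, 2, 3)])  # three bucket indices in [0, 256)
```

The appeal is cost: one multiplication and one shift, with no division or modulo, which is also why the operation vectorizes well.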