Binary Fuse Filters: Fast and Smaller Than Xor Filters

Thomas Mueller Graf and Daniel Lemire
ACM J. Exp. Algorithmics
Bloom and cuckoo filters provide fast approximate set membership while using little memory. Engineers use them to avoid expensive disk and network accesses. The recently introduced xor filters can be faster and smaller than Bloom and cuckoo filters. The xor filters are within 23% of the theoretical lower bound in storage, as opposed to 44% for Bloom filters. Inspired by Dietzfelbinger and Walzer, we build probabilistic filters, called binary fuse filters, that are within 13% of the storage…
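To make the percentages in the abstract concrete, the following sketch converts them into bits per key. It assumes the usual information-theoretic lower bound of log2(1/ε) bits per key for a false-positive probability ε, and it assumes the truncated 13% figure refers to that same bound. The names `OVERHEAD`, `bits_per_key`, and `table` are illustrative and do not come from the paper; real implementations round to whole fingerprint bits, so actual sizes will differ slightly.

```python
import math

# Overhead factors relative to the lower bound, taken from the abstract:
# Bloom +44%, xor +23%, binary fuse +13% (the last is assumed to refer
# to the same lower bound, since the abstract is truncated).
OVERHEAD = {
    "lower bound": 1.00,
    "Bloom": 1.44,
    "xor": 1.23,
    "binary fuse": 1.13,
}


def bits_per_key(fpp: float, overhead: float) -> float:
    """Idealized bits per key for a target false-positive probability."""
    return overhead * math.log2(1.0 / fpp)


def table(fpp: float) -> dict:
    """Bits per key for each filter family at the given false-positive probability."""
    return {name: bits_per_key(fpp, factor) for name, factor in OVERHEAD.items()}


if __name__ == "__main__":
    # Example: a false-positive probability of 2^-8 (about 0.4%).
    for name, bits in table(2 ** -8).items():
        print(f"{name:12s} {bits:5.2f} bits/key")
```

Under these assumptions, a false-positive probability of 2^-8 costs 8 bits per key at the lower bound, roughly 11.5 for a Bloom filter, 9.8 for an xor filter, and 9.0 for a binary fuse filter.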

Fast Succinct Retrieval and Approximate Membership using Ribbon

Bumped ribbon retrieval (BuRR) is presented, which achieves space overheads well below 1% while being faster than most previously used retrieval data structures and faster than classical Bloom filters (with space overhead ≥ 44%).



Xor Filters: Faster and Smaller Than Bloom and Cuckoo Filters

Xor filters can be faster than Bloom and cuckoo filters while using less memory. A more compact version of xor filters (xor+) can use even less space than highly compact alternatives (e.g., Golomb-compressed sequences) while providing speeds competitive with Bloom filters.

Cuckoo Filter: Practically Better Than Bloom

Cuckoo filters support adding and removing items dynamically while achieving even higher performance than Bloom filters, and have lower space overhead than space-optimized Bloom filters.

Ribbon filter: practically smaller than Bloom and Xor

The Ribbon filter is introduced: a new filter for static sets with a broad range of configurable space overheads and false positive rates with competitive speed over that range, especially for larger f ≥ 2^{-7}.

Cache-, hash-, and space-efficient Bloom filters

This work proposes several new variants of Bloom filters and replacements with similar functionality that have better cache efficiency and need fewer hash bits than regular Bloom filters. Some use SIMD functionality, while the others provide even better space efficiency.

Morton Filters: Faster, Space-Efficient Cuckoo Filters via Biasing, Compression, and Decoupled Logical Sparsity

This work introduces the Morton filter (MF), a novel AS-MDS that makes several key improvements to cuckoo filters (CFs) and typically uses comparable to slightly less space than CFs for the same ε.

Vector Quotient Filters: Overcoming the Time/Space Trade-Off in Filter Design

The vector quotient filter is presented, which is based on Robin Hood hashing, but uses power-of-two-choices hashing to reduce the variance of runs, and thus offers consistent, high throughput across load factors.

Performance-Optimal Filtering: Bloom overtakes Cuckoo at High-Throughput

In high-throughput situations, the lower lookup cost of blocked Bloom filters allows them to overtake Cuckoo filters, and new filter variants are introduced, namely the register-blocked and cache-sectorized Bloom filters.

A General-Purpose Counting Filter: Making Every Bit Count

A new general-purpose AMQ, the counting quotient filter (CQF), is presented. It is small and fast, has good locality of reference, scales out of RAM to SSD, and supports deletions, counting, resizing, merging, and highly concurrent access.

The Bloomier filter: an efficient data structure for static support lookup tables

The Bloomier filter is introduced, a data structure for compactly encoding a function with static support in order to support approximate evaluation queries. Lower bounds are provided to prove the (near) optimality of the constructions.

A Lower Bound for Dynamic Approximate Membership Data Structures

Shachar Lovett, E. Porat
2010 IEEE 51st Annual Symposium on Foundations of Computer Science, 2010
A new lower bound on the memory requirements of any dynamic approximate membership data structure is proved, showing that the entropy lower bound cannot be achieved by dynamic data structures for any constant error rate.