A Case for Partitioned Bloom Filters

@article{Almeida2020ACF,
  title={A Case for Partitioned Bloom Filters},
  author={Paulo S{\'e}rgio Almeida},
  journal={ArXiv},
  year={2020},
  volume={abs/2009.11789}
}
In a partitioned Bloom filter, the $m$-bit vector is split into $k$ disjoint parts of $m/k$ bits each, one per hash function. In contrast to hardware designs, where partitioned filters prevail, software implementations mostly adopt standard Bloom filters, regarding partitioned filters as slightly worse because of their slightly larger false positive rate (FPR). In this paper, by performing an in-depth analysis, first we show that the FPR advantage of standard Bloom filters is smaller than thought; more importantly, by…
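
The partitioning described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the class name `PartitionedBloomFilter`, the use of salted BLAKE2b as the $k$ hash functions, and the requirement that $m$ be a multiple of $k$ are all assumptions made for illustration. The point it shows is that hash function $i$ only ever sets or tests a bit inside part $i$, so every element touches exactly $k$ distinct bits.

```python
import hashlib


class PartitionedBloomFilter:
    """Illustrative sketch: an m-bit vector split into k disjoint parts of m/k bits."""

    def __init__(self, m: int, k: int) -> None:
        # Assumption for simplicity: m is a positive multiple of k.
        if m <= 0 or k <= 0 or m % k != 0:
            raise ValueError("m must be a positive multiple of k")
        self.m = m
        self.k = k
        self.part_size = m // k
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, item: bytes):
        """Yield one bit position per part; part i is owned by hash function i."""
        for i in range(self.k):
            # Hypothetical hash family: BLAKE2b salted with the part index.
            digest = hashlib.blake2b(
                item, digest_size=8, salt=i.to_bytes(8, "little")
            ).digest()
            offset = int.from_bytes(digest, "little") % self.part_size
            yield i * self.part_size + offset

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        # True if all k bits are set (possibly a false positive); False is definite.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


if __name__ == "__main__":
    pbf = PartitionedBloomFilter(m=1024, k=4)
    pbf.add(b"alpha")
    print(b"alpha" in pbf)  # True: an inserted element is never reported absent
```

In a standard Bloom filter, by contrast, all $k$ hash functions index the whole $m$-bit vector, so two of them may collide on the same bit for a single element.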

References

Age-Partitioned Bloom Filters

Age-Partitioned Bloom Filters are presented: a BF-based approach for duplicate detection in sliding windows that is not only competitive in time complexity but also has better space usage than current dictionary-based approaches (e.g., SWAMP), at the cost of some moderate slack.

Compressed bloom filters

A Bloom filter is a simple space-efficient randomized data structure for representing a set in order to support membership queries. Although Bloom filters allow false positives, for many applications

Cache-, hash-, and space-efficient bloom filters

This work proposes several new variants of Bloom filters and replacements with similar functionality that have a better cache-efficiency and need less hash bits than regular Bloom filters, and some use SIMD functionality, while the others provide an even better space efficiency.

Xor Filters

Xor filters can be faster than Bloom and cuckoo filters while using less memory and it is found that a more compact version of xor filters (xor+) can use even less space than highly compact alternatives (e.g., Golomb-compressed sequences) while providing speeds competitive with Bloom filters.

Ribbon filter: practically smaller than Bloom and Xor

The Ribbon filter is introduced: a new filter for static sets with a broad range of configurable space overheads and false positive rates with competitive speed over that range, especially for larger $f \geq 2^{-7}$.

Binary Fuse Filters: Fast and Smaller Than Xor Filters

This work builds probabilistic filters—called binary fuse filters—that are within 13% of the storage lower bound—without sacrificing query speed, and compares the performance against a wide range of competitive alternatives.

On the analysis of Bloom filters

  • F. Grandi
  • Computer Science, Mathematics
  • Inf. Process. Lett.
  • 2018

Understanding bloom filter intersection for lazy address-set disambiguation

It is demonstrated that intersecting Bloom filters requires substantially larger bit-arrays to provide the same probability of false set-overlap as querying into the bit-array, and it is proved that partitioned Bloom filters require less space than unpartitioned.

Cardinality estimation and dynamic length adaptation for Bloom filters

This work presents a new approach to encode a Bloom filter such that its length can be adapted to the cardinality of the set it represents, with negligible overhead with respect to computation and false positive probability.

Cuckoo Filter: Practically Better Than Bloom

Cuckoo filters support adding and removing items dynamically while achieving even higher performance than Bloom filters, and have lower space overhead than space-optimized Bloom filters.