Cache-, hash-, and space-efficient bloom filters

@article{Putze2010CacheHA,
  title={Cache-, hash-, and space-efficient bloom filters},
  author={Felix Putze and Peter Sanders and Johannes Singler},
  journal={ACM J. Exp. Algorithmics},
  year={2010},
  volume={14}
}
A Bloom filter is a very compact data structure that supports approximate membership queries on a set, allowing false positives. We propose several new variants of Bloom filters and replacements with similar functionality. All of them have better cache efficiency and need fewer hash bits than regular Bloom filters. Some use SIMD functionality, while the others provide even better space efficiency. As a consequence, we get a more flexible trade-off between false-positive rate, space… 
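
To make the cache-efficiency idea concrete, here is a minimal sketch of a blocked Bloom filter, in which every key sets and tests bits inside a single block only. It is an illustration under stated assumptions, not the authors' implementation: the class name `BlockedBloomFilter`, the 512-bit block size (chosen to match a typical 64-byte cache line), the use of SHA-256 as the hash source, and the parameter names are all hypothetical choices made for this example.

```python
import hashlib


class BlockedBloomFilter:
    """Sketch of a blocked Bloom filter: each key touches exactly one block."""

    def __init__(self, num_blocks: int, block_bits: int = 512, num_hashes: int = 4):
        self.num_blocks = num_blocks
        self.block_bits = block_bits  # assumption: 512 bits = one 64-byte cache line
        self.num_hashes = num_hashes
        # Each block is one Python int used as a bit set of `block_bits` bits.
        self.blocks = [0] * num_blocks

    def _block_and_mask(self, key: str):
        # A single 256-bit digest supplies all hash bits (a choice made for
        # this sketch; it is enough for small num_blocks and num_hashes).
        digest = int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest(), "big")
        block_index = digest % self.num_blocks
        digest //= self.num_blocks
        mask = 0
        for _ in range(self.num_hashes):
            # Consume a fresh slice of the digest for each bit position.
            mask |= 1 << (digest % self.block_bits)
            digest //= self.block_bits
        return block_index, mask

    def add(self, key: str) -> None:
        index, mask = self._block_and_mask(key)
        self.blocks[index] |= mask

    def __contains__(self, key: str) -> bool:
        index, mask = self._block_and_mask(key)
        # Present only if every bit of the key's mask is set in its block.
        return (self.blocks[index] & mask) == mask


bf = BlockedBloomFilter(num_blocks=64)
for word in ["alpha", "beta", "gamma"]:
    bf.add(word)
print("alpha" in bf)  # inserted keys are always reported as present
print("delta" in bf)  # usually absent, but a false positive is possible
```

Because a query reads one block instead of several scattered bit positions, it incurs at most one cache miss; the price is a somewhat higher false-positive rate than a regular Bloom filter of the same size, since some blocks receive more keys than others.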

The Power of 1 + α for Memory-Efficient Bloom Filters
TLDR
A cache-aware Bloom-filter algorithm with improved cache behavior and lower false-positive rates compared to prior work is presented, which relies on the power-of-two choice principle to provide a better distribution of set elements in a blocked Bloom filter.
Cache Efficient Bloom Filters for Shared Memory Machines
TLDR
This paper implements a cache-efficient blocked Bloom filter that performs insertions and lookups while only accessing a small block of memory, and improves upon the implementation described by [4] by adapting dynamically to an unbalanced assignment of elements to memory blocks.
D-Ary Cuckoo Filter: A Space Efficient Data Structure for Set Membership Lookup
TLDR
D-ary Cuckoo filters can save up to one bit per element at some cost in lookup and insertion performance, and they use base-d digitwise xor operations as the foundation for computing the d candidate buckets of each element in a cyclic fashion.
Prefix Filter: Practically and Theoretically Better Than Bloom
TLDR
The prefix filter is proposed, an incremental filter that addresses the above challenge: its space (in bits) is similar to that of state-of-the-art dynamic filters, its query throughput is high and comparable to that of the cuckoo filter, and its overall build times are faster than those of the vector quotient filter and the cuckoo filter.
Low Computational Cost Bloom Filters
TLDR
This paper introduces a low computational cost Bloom filter named One-Hashing Bloom filter (OHBF), which requires only one base hash function plus a few simple modulo operations to implement a Bloom filter and significantly reduces the computational overhead of the hash functions.
TinySet - An Access Efficient Self Adjusting Bloom Filter Construction
  • Gil Einziger, R. Friedman
  • Computer Science
    2015 24th International Conference on Computer Communication and Networks (ICCCN)
  • 2015
TLDR
TinySet is presented, an alternative Bloom filter construction that is more space efficient than Bloom filters for false positive rates smaller than 2.8%, accesses only a single memory word and partially supports removals.
TinySet—An Access Efficient Self Adjusting Bloom Filter Construction
TLDR
TinySet is presented, an alternative Bloom filter construction that is more space efficient than Bloom filters for false positive rates smaller than 2.8%, accesses only a single memory word and partially supports removals.
Access-efficient Balanced Bloom Filters
Vacuum Filters: More Space-Efficient and Faster Replacement for Bloom and Cuckoo Filters
TLDR
This work presents vacuum filters, a type of data structure that supports approximate membership queries, and proposes a new update framework to resolve two difficult problems for AMQ structures under dynamics, namely duplicate insertions and set resizing.
Cuckoo Filter: Practically Better Than Bloom
TLDR
Cuckoo filters support adding and removing items dynamically while achieving even higher performance than Bloom filters, and have lower space overhead than space-optimized Bloom filters.

References

Compressed bloom filters
A Bloom filter is a simple space-efficient randomized data structure for representing a set in order to support membership queries. Although Bloom filters allow false positives, for many applications
Network Applications of Bloom Filters: A Survey
TLDR
The aim of this paper is to survey the ways in which Bloom filters have been used and modified in a variety of network problems, with the aim of providing a unified mathematical and practical framework for understanding them and stimulating their use in future applications.
Bloom Filters in Probabilistic Verification
TLDR
This paper shows how to obtain Bloom filters that are simultaneously fast, accurate, memory-efficient, scalable, and flexible, and presents a mathematical analysis of Bloom filters in verification in unprecedented detail, which enables a fresh comparison between hash compaction and Bloom filters.
Using the Power of Two Choices to Improve Bloom Filters
TLDR
It is shown via simulations that, in comparison with a standard Bloom filter, using the power of two choices can yield modest reductions in the false positive probability using the same amount of space and more hashing.
Summary cache: a scalable wide-area web cache sharing protocol
TLDR
This paper demonstrates the benefits of cache sharing, measures the overhead of the existing protocols, and proposes a new protocol called "summary cache", which reduces the number of intercache protocol messages, reduces the bandwidth consumption, and eliminates 30% to 95% of the protocol CPU overhead, all while maintaining almost the same cache hit ratios as ICP.
An optimal Bloom filter replacement
TLDR
A new RAM data structure is considered for storing an approximation S′ of S such that S ⊆ S′ and any element not in S belongs to S′ with probability at most ε, and the space usage is within a lower order term of the lower bound.
Space/time trade-offs in hash coding with allowable errors
TLDR
Analysis of the paradigm problem demonstrates that allowing a small number of test messages to be falsely identified as members of the given set will permit a much smaller hash area to be used without increasing reject time.
Fast and Accurate Bitstate Verification for SPIN
TLDR
This work presents efficient ways of computing multiple hash values that, despite sacrificing independence, give virtually the same accuracy and even yield a speed improvement in the two hash function case when compared to the current SPIN implementation.
Intersection in Integer Inverted Indices
TLDR
Previous theoretical approaches, methods used in practice, and one new algorithm that exploits the fact that the intersection uses small integer keys are compared; the new algorithm is the only one that performs well over the entire spectrum of relative list length ratios.
An Algorithm for Approximate Membership checking with Application to Password Security