Corpus ID: 5507317

An optimal Bloom filter replacement

@inproceedings{Pagh2005AnOB,
  title={An optimal Bloom filter replacement},
  author={Anna Pagh and R. Pagh and S. Srinivasa Rao},
  booktitle={SODA '05},
  year={2005}
}
This paper considers space-efficient data structures for storing an approximation S' to a set S such that S ⊆ S' and any element not in S belongs to S' with probability at most ε. The Bloom filter data structure, solving this problem, has found widespread use. Our main result is a new RAM data structure that improves Bloom filters in several ways:
• The time for looking up an element in S' is O(1), independent of ε.
• The space…
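For orientation, the classic Bloom filter that the abstract compares against can be sketched as follows. This is a minimal illustration, not the data structure from the paper. The class name `BloomFilter`, the SHA-256 double-hashing scheme, and the string-only keys are assumptions made for this sketch. The sizing uses the standard textbook formulas (about n·log2(e)·log2(1/ε) bits and about log2(1/ε) hash functions), so a lookup probes k = O(log(1/ε)) positions, which is the dependence on ε that the abstract says the new structure avoids.

```python
import hashlib
import math


class BloomFilter:
    """Classic Bloom filter sketch: no false negatives, false positives at a rate near eps."""

    def __init__(self, n: int, eps: float) -> None:
        # Textbook sizing: m = n * log2(e) * log2(1/eps) bits, k = log2(1/eps) hashes.
        self.m = max(1, math.ceil(n * math.log2(math.e) * math.log2(1 / eps)))
        self.k = max(1, round(math.log2(1 / eps)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str) -> list[int]:
        # Assumption of this sketch: derive k indices from two 64-bit halves
        # of one SHA-256 digest (double hashing), rather than k separate hashes.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        # Set the k bits chosen for this item.
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        # A member always has all k bits set; a non-member may too (false positive).
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))
```

With `bf = BloomFilter(n=1000, eps=0.01)`, every key passed to `bf.add` is reported as present by `key in bf`, while a key that was never added is reported as present only with small probability.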

Tight Bounds for Sliding Bloom Filters
TLDR
This work considers a Sliding Bloom Filter: a data structure that, given a stream of elements, supports membership queries of the set of the last n elements (a sliding window), while allowing a small error probability and a slackness parameter.
MINIMAL PERFECT HASHING AND BLOOM FILTERS MADE PRACTICAL
TLDR
A practical implementation of a theoretical result that provides the same functionality of a Bloom filter for static sets and uses a near-optimal space data structure based on recent results on perfect hashing by Botelho et al. (2007).
An Optimal Bloom Filter Replacement Based on Matrix Solving
TLDR
This work suggests a method for holding a dictionary data structure, which maps keys to values, in the spirit of Bloom filters, and suggests a data structure that requires only nk bits of space, has O(n) preprocessing time, and has O(log n) query time.
Support Optimality and Adaptive Cuckoo Filters
TLDR
A new Adaptive Cuckoo Filter is designed, and it is shown to be support optimal over any n queries when storing a set of size n, and to be the first practical data structure that is support optimal, and the first support optimal filter that does not require additional space beyond a normal cuckoo filter.
A Dynamic Space-Efficient Filter with Constant Time Operations
TLDR
This work presents the first space-efficient dynamic filter with constant time operations in the worst case and employs the classic reduction of Carter et al. (STOC 1978) on a new type of dictionary construction that supports random multisets.
A Lower Bound for Dynamic Approximate Membership Data Structures
  • Shachar Lovett, E. Porat
  • Computer Science
    2010 IEEE 51st Annual Symposium on Foundations of Computer Science
  • 2010
TLDR
A new lower bound for the memory requirements of any dynamic approximate membership data structure is shown, which shows that the entropy lower bound cannot be achieved by dynamic data structures for any constant error rate.
Ribbon filter: practically smaller than Bloom and Xor
TLDR
The Ribbon filter is introduced: a new filter for static sets with a broad range of configurable space overheads and false positive rates with competitive speed over that range, especially for larger f ≥ 2^(−7).
Persistent Bloom Filter: Membership Testing for the Entire History
TLDR
The Persistent Bloom Filter is designed, a novel data structure for temporal membership testing with compact space, and it is shown that this is fairly expensive.
Fast Bloom Filters and Their Generalization
TLDR
Bloom-1, a data structure that performs membership check in one memory access, which compares favorably with the k memory accesses of a standard Bloom filter, is studied, allowing performance tradeoff between membership query overhead and false positive ratio.
Bloom maps for big data
TLDR
A lower bound on the space required per key is given in terms of the entropy of the distribution over values and the error rate and a generalization of the Bloom filter, the Bloom map, is presented that achieves the lower bound up to a small constant factor.

References

SHOWING 1-10 OF 23 REFERENCES
Succinct indexable dictionaries with applications to encoding k-ary trees and multisets
TLDR
A structure that supports both operations in O(1) time on the RAM model and an information-theoretically optimal representation for cardinal trees and multisets where (appropriate generalisations of) the select and rank operations can be supported in O(1) time.
Spectral bloom filters
TLDR
The Spectral Bloom Filter is introduced, an extension of the original Bloom Filter to multi-sets, allowing the filtering of elements whose multiplicities are below a threshold given at query time.
Compressed bloom filters
A Bloom filter is a simple space-efficient randomized data structure for representing a set in order to support membership queries. Although Bloom filters allow false positives, for many applications
Succinct Dynamic Dictionaries and Trees
TLDR
It is shown that a binary tree on n nodes, where each node has b = O(lg n)-bit data stored at it, can be maintained under node insertions while supporting navigation in O(1) time and updates in O((lg lg n)^(1+ε)) amortised time, for any constant ε > 0.
The Bloomier filter: an efficient data structure for static support lookup tables
TLDR
The Bloomier filter is introduced, a data structure for compactly encoding a function with static support in order to support approximate evaluation queries and lower bounds are provided to prove the (near) optimality of the constructions.
Network Applications of Bloom Filters: A Survey
TLDR
The aim of this paper is to survey the ways in which Bloom filters have been used and modified in a variety of network problems, with the aim of providing a unified mathematical and practical framework for understanding them and stimulating their use in future applications.
On dynamic range reporting in one dimension
TLDR
This work considers the problem of maintaining a dynamic set of integers and answering queries of the form: report a point (equivalently, all points) in a given interval and develops the first scheme for dynamic perfect hashing requiring sublinear space.
Lossy Dictionaries
TLDR
This paper considers lossy dictionaries that are also allowed to have "false negatives", and aims to maximize the weight of included keys within a given space constraint, making almost optimal use of memory.
Membership in Constant Time and Almost-Minimum Space
TLDR
A data structure is introduced to represent a subset of elements of $\mathcal{M}$ in a number of bits close to the information-theoretic minimum, $B = \left\lceil \lg {M\choose N} \right\rceil$, and the structure is used to answer membership queries in constant time.
Space/time trade-offs in hash coding with allowable errors
TLDR
Analysis of the paradigm problem demonstrates that allowing a small number of test messages to be falsely identified as members of the given set will permit a much smaller hash area to be used without increasing reject time.