• Corpus ID: 2159505

The Bloomier filter: an efficient data structure for static support lookup tables

@inproceedings{Chazelle2004TheBF,
  title={The Bloomier filter: an efficient data structure for static support lookup tables},
  author={Bernard Chazelle and Joe Kilian and Ronitt Rubinfeld and Ayellet Tal},
  booktitle={SODA '04},
  year={2004}
}
We introduce the Bloomier filter, a data structure for compactly encoding a function with static support in order to support approximate evaluation queries. Our construction generalizes the classical Bloom filter, an ingenious hashing scheme heavily used in networks and databases, whose main attribute (space efficiency) is achieved at the expense of a tiny false-positive rate. Whereas Bloom filters can handle only set membership queries, our Bloomier filters can deal with arbitrary functions…
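
For orientation, the classical Bloom filter that the abstract takes as its starting point can be sketched in a few lines of Python. This is a minimal illustration of set membership with one-sided error, not the Bloomier construction from the paper; the class name, the parameters `num_bits` and `num_hashes`, and the use of salted SHA-256 as the hash family are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Classical Bloom filter sketch: may report false positives, never false negatives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit position, chosen for readability rather than compactness.
        self.bits = bytearray(num_bits)

    def _positions(self, key: str):
        # Derive num_hashes positions by salting a single hash with the index i.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def __contains__(self, key: str) -> bool:
        # True only if every position for this key is set.
        return all(self.bits[pos] for pos in self._positions(key))


bf = BloomFilter(num_bits=1024, num_hashes=4)
for word in ("alpha", "beta", "gamma"):
    bf.add(word)
```

Every inserted key is reported as present. A key that was never inserted is usually reported as absent, but can occasionally be reported as present; that is the false-positive rate the abstract refers to. A structure of this kind answers only "is the key in the set?", whereas the abstract describes the Bloomier filter as associating a function value with each key in the support.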

A Shifting Bloom Filter Framework for Set Queries
TLDR
This paper proposes a Shifting Bloom Filter framework for representing and querying sets, and demonstrates the effectiveness of ShBF using three types of popular set queries: membership, association, and multiplicity queries.
Bloomier Filters: A second look
TLDR
This article gives a simple construction of a Bloomier filter, a space efficient structure for storing static sets, where the space efficiency is gained at the expense of a small probability of false-positives.
Counting with TinyTable: Every bit counts!
  • Gil Einziger, R. Friedman
  • Computer Science
    2015 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS)
  • 2015
TLDR
TinyTable is presented, an efficient hash table based construction that supports membership queries, multiplicity queries (statistics) and removals, and is more space efficient than existing alternatives, both those derived from Bloom filters and other hashtable based schemes.
Counting With Tinytable: Every Bit Counts!
TLDR
TinyTable is presented that supports set membership, removals, and multiplicity queries, and is more compact than Bloom filters as long as the false positive ratio is less than 1%.
Incremental Bloom Filters
TLDR
This work considers the problem of minimizing the memory requirements in cases where the number of elements in the set is not known in advance but the distribution or moment information of the number of elements is known, and shows how to exploit such information to minimize the expected amount of memory required for the filter.
Optimizing data popularity conscious bloom filters
TLDR
This paper studies the problem of minimizing the false-positive probability of a Bloom filter by adapting the number of hashes used for each data object to its popularity in sets and membership queries and proposes two polynomial-time solutions with bounded approximation ratios.
Rank-indexed hashing: A compact construction of Bloom filters and variants
TLDR
A new fingerprint hash table construction called Rank-Indexed Hashing is proposed that can achieve very compact representations, with space savings of a factor of three or more even for a false positive probability of just 1%.
Fast Bloom Filters and Their Generalization
TLDR
Bloom-1, a data structure that performs membership check in one memory access, which compares favorably with the k memory accesses of a standard Bloom filter, is studied, allowing performance tradeoff between membership query overhead and false positive ratio.
Bloom Maps
TLDR
A novel data structure, the Bloom map, is introduced, generalising the Bloom filter to the problem of succinctly encoding a static map to support approximate queries, and upper and lower bounds on the space requirements are derived in terms of the error rate and the entropy of the distribution of values over keys.
Achieving Perfect Hashing through an Improved Construction of Bloom Filters
TLDR
This paper proposes a scheme to extend BFs with "indexing" features so that when an element is queried, a univocal index of that element is returned, which in turn can be used as an address for a table, just as in a perfect hashing scheme.

References

Compressed bloom filters
A Bloom filter is a simple space-efficient randomized data structure for representing a set in order to support membership queries. Although Bloom filters allow false positives, for many applications
Spectral bloom filters
TLDR
The Spectral Bloom Filter is introduced, an extension of the original Bloom Filter to multi-sets, allowing the filtering of elements whose multiplicities are below a threshold given at query time.
Network Applications of Bloom Filters: A Survey
TLDR
The aim of this paper is to survey the ways in which Bloom filters have been used and modified in a variety of network problems, with the aim of providing a unified mathematical and practical framework for understanding them and stimulating their use in future applications.
PERF join: an alternative to two-way semijoin and bloomjoin
TLDR
This paper presents "Positionally Encoded Record Filters" (PERFs), describes their use in a distributed query processing technique called PERF join, and demonstrates through analytical studies that PERF join performs significantly better than two-way Bloomjoin and two-way semijoin variants under a wide range of relevant cost parameter values.
Computing Iceberg Queries Efficiently
TLDR
This work proposes efficient algorithms to evaluate iceberg queries using very little memory and significantly fewer passes over data, as compared to current techniques that use sorting or hashing.
Designing a Bloom filter for differential file access
TLDR
The design process for a Bloom filter for an on-line student database is described, and it is shown that a very effective filter can be constructed with a modest expenditure of system resources.
Optimal Semijoins for Distributed Database Systems
A Bloom-filter-based semijoin algorithm for distributed database systems is presented. This algorithm reduces communications costs to process a distributed natural join as much as possible with a
Cache Digests
Storing a sparse table with O(1) worst case access time
TLDR
A data structure for representing a set of n items from a universe of m items, which uses space n+o(n) and accommodates membership queries in constant time and is easy to implement.
Probabilistic location and routing
  • Sean C. Rhea, J. Kubiatowicz
  • Computer Science
    Proceedings.Twenty-First Annual Joint Conference of the IEEE Computer and Communications Societies
  • 2002
We propose probabilistic location to enhance the performance of existing peer-to-peer location mechanisms in the case where a replica for the queried data item exists close to the query source. We