Spectral bloom filters

@inproceedings{Cohen2003SpectralBF,
  title={Spectral bloom filters},
  author={Saar Cohen and Y. Matias},
  booktitle={SIGMOD '03},
  year={2003}
}
A Bloom Filter is a space-efficient randomized data structure allowing membership queries over sets with certain allowable errors. It is widely used in many applications which take advantage of its ability to compactly represent a set, and filter out effectively any element that does not belong to the set, with small error probability. This paper introduces the Spectral Bloom Filter (SBF), an extension of the original Bloom Filter to multi-sets, allowing the filtering of elements whose… 
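
The abstract's idea of extending a Bloom filter from sets to multi-sets can be illustrated with a small sketch. This is not the paper's implementation: it assumes the bit vector is replaced by an array of counters and that an item's multiplicity is estimated as the minimum of its counters. The class name `CountingSketch`, the parameters `m` and `k`, and the salted SHA-256 hashing are hypothetical choices made for illustration only.

```python
import hashlib


class CountingSketch:
    """Illustrative counter-array Bloom filter variant for multi-sets."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m                # number of counters (assumed parameter)
        self.k = k                # number of hash functions (assumed parameter)
        self.counters = [0] * m

    def _indexes(self, item: str) -> list:
        # Derive k positions by salting SHA-256 with the hash number.
        # This hashing scheme is an assumption, not taken from the paper.
        return [
            int.from_bytes(
                hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big"
            ) % self.m
            for i in range(self.k)
        ]

    def add(self, item: str, count: int = 1) -> None:
        # Increment every counter the item hashes to.
        for idx in self._indexes(item):
            self.counters[idx] += count

    def estimate(self, item: str) -> int:
        # The smallest counter is an upper bound on the true multiplicity:
        # collisions can only inflate counters, never reduce them.
        return min(self.counters[idx] for idx in self._indexes(item))

    def passes(self, item: str, threshold: int) -> bool:
        # Filter on estimated multiplicity instead of plain membership.
        return self.estimate(item) >= threshold
```

With `threshold=1`, `passes` behaves like an ordinary Bloom filter membership test; larger thresholds filter on estimated frequency, with errors only in the direction of overestimation.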

A Scalable Bloom Filter for Membership Queries
TLDR
A new design of a scalable Bloom filter (SBF) for an expanding data set that keeps a low false positive rate by adding Bloom filter vectors with double length when necessary and outperforms other current scalable Bloom filters significantly.
Adaptive Bloom filter
TLDR
The traditional Bloom filter is generalized to the Adaptive Bloom Filter, which incorporates information on the query frequencies and the membership likelihood of the elements into its optimal design, and it is shown that the adapted Bloom filter always outperforms the traditional Bloom filter.
Weighted Bloom filter
TLDR
The traditional Bloom filter is generalized to the weighted Bloom filter, which incorporates information on the query frequencies and the membership likelihood of the elements into its optimal design, and it is shown that the adapted Bloom filter always outperforms the traditional Bloom filter.
Cardinality Computing: A New Step Towards Fully Representing Multi-sets by Bloom Filters
TLDR
Two novel algorithms for computing cardinalities of multi-sets represented by Bloom Filters are introduced, which extend the functionality of the Bloom Filter and thus make it usable in a variety of new applications.
Theory and Network Applications of Dynamic Bloom Filters
TLDR
This paper proves that DBF can control the false positive probability at a low level by adjusting the number of standard bloom filters used according to the actual size of current dynamic set, and presents multidimension dynamic bloom filters (MDDBF) to support concise representation and approximate membership queries of dynamic sets in multiple attribute dimensions.
Optimizing data popularity conscious bloom filters
TLDR
This paper studies the problem of minimizing the false-positive probability of a Bloom filter by adapting the number of hashes used for each data object to its popularity in sets and membership queries and proposes two polynomial-time solutions with bounded approximation ratios.
i-DBF: an Improved Bloom Filter Representation Method on Dynamic Set
TLDR
It has been proved that DBF not only possesses the advantages of the standard Bloom filter but also has better features when dealing with dynamic sets, and the improved dynamic Bloom filter i-DBF performs better in both storage space and false positive probability.
The Dynamic Bloom Filters
TLDR
This work proposes dynamic Bloom filters to represent dynamic sets as well as static sets, and designs the necessary item insertion, membership query, item deletion, and filter union algorithms.
Suitability of a new Bloom filter for numerical vectors with high dimensions
TLDR
A new uniform Prime-HD-BKDERhash family and a new Bloom filter (P-HDBF) are proposed to retrieve the membership of a big data set of high-dimensional numerical vectors, providing an efficient alternative for implementing membership search with respect to space and time overheads.
Multiple Set Matching and Pre-Filtering with Bloom Multifilters
TLDR
This article proposes two efficient Bloom Multifilters called Bloom Matrix and Bloom Vector which are space efficient and answer queries with a set of identifiers for multiple set matching problems and shows that the space efficiency can be optimized further according to the distribution of labels among multiple sets: Uniform and Zipf.

References

Compressed bloom filters
A Bloom filter is a simple space-efficient randomized data structure for representing a set in order to support membership queries. Although Bloom filters allow false positives, for many applications
Network Applications of Bloom Filters: A Survey
TLDR
The aim of this paper is to survey the ways in which Bloom filters have been used and modified in a variety of network problems, with the aim of providing a unified mathematical and practical framework for understanding them and stimulating their use in future applications.
Maintaining Stream Statistics over Sliding Windows
TLDR
The problem of maintaining aggregates and statistics over data streams, with respect to the last N data elements seen so far, is considered, and it is shown that, using $O(\frac{1}{\epsilon} \log^2 N)$ bits of memory, the number of 1's can be estimated to within a factor of $1 + \epsilon$.
Succinct Dynamic Data Structures
TLDR
Succinct data structures are developed to represent a sequence of values that supports partial sum and select queries and updates, and a dynamic array that supports insertion, deletion, and access of an element at any given index.
Space/time trade-offs in hash coding with allowable errors
TLDR
Analysis of the paradigm problem demonstrates that allowing a small number of test messages to be falsely identified as members of the given set will permit a much smaller hash area to be used without increasing reject time.
Bifocal sampling for skew-resistant join size estimation
TLDR
The estimate obtained by the bifocal sampling algorithm is proven to lie with high probability within a small constant factor of the actual join size, regardless of the skew, as long as the join size is Ω(n lg n), for relations consisting of n tuples.
Computing Iceberg Queries Efficiently
TLDR
This work proposes efficient algorithms to evaluate iceberg queries using very little memory and significantly fewer passes over data, as compared to current techniques that use sorting or hashing.
Designing a Bloom filter for differential file access
TLDR
The design process for a Bloom filter for an on-line student database is described, and it is shown that a very effective filter can be constructed with a modest expenditure of system resources.
Fixed-precision estimation of join selectivity
TLDR
A partial ordering that compares the variability of the estimators for the different procedures after an arbitrary fixed number of sampling steps leads to a new algorithm for fixed-precision estimation of the selectivity of an equijoin that appears to be the best available when there are no indices on the join key.
New sampling-based summary statistics for improving approximate query answers
TLDR
This paper introduces two new sampling-based summary statistics, concise samples and counting samples, and presents new techniques for their fast incremental maintenance regardless of the data distribution, and considers their application to providing fast approximate answers to hot list queries.