RAMBO: Repeated And Merged Bloom Filter for Multiple Set Membership Testing (MSMT) in Sub-linear time
@article{Gupta2019RAMBORA, title={RAMBO: Repeated And Merged Bloom Filter for Multiple Set Membership Testing (MSMT) in Sub-linear time}, author={Gaurav Gupta and Benjamin Coleman and Tharun Medini and Vijai Mohan and Anshumali Shrivastava}, journal={ArXiv}, year={2019}, volume={abs/1910.02611} }
Approximate set membership is a common problem with wide applications in databases, networking, and search. Given a set S and a query q, the task is to determine whether q ∈ S. The Bloom Filter (BF) is a popular data structure for approximate membership testing due to its simplicity. In particular, a BF consists of a bit array that can be incrementally updated. A related problem, and the one this paper addresses, is the Multiple Set Membership Testing (MSMT) problem. Here we are given K different sets, and…
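The Bloom filter described in the abstract can be illustrated with a minimal Python sketch. The abstract is truncated before it defines MSMT, so the sketch assumes the task is to report which of the K sets contain the query, and it shows only the straightforward baseline of probing one filter per set. The class and function names, the array size `m`, the hash count `k`, and the SHA-256-based hashing are all illustrative assumptions, not details from the paper.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: an m-slot bit array probed by k hash functions."""

    def __init__(self, m=1024, k=3):
        self.m = m
        self.k = k
        self.bits = bytearray(m)  # one byte per bit, for readability

    def _positions(self, item):
        # Derive k positions by salting a SHA-256 hash with the seed index.
        for seed in range(self.k):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        # Incremental update: set the k bits for this item.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # True if all k bits are set; false positives are possible,
        # false negatives are not.
        return all(self.bits[pos] for pos in self._positions(item))


def msmt_linear_scan(filters, query):
    """Assumed MSMT baseline: probe every set's filter, i.e. K probes per query."""
    return {name for name, bf in filters.items() if query in bf}


# Usage: build one filter per set, then ask which sets may contain "y".
sets = {"A": ["x", "y"], "B": ["y", "z"], "C": ["w"]}
filters = {}
for name, items in sets.items():
    bf = BloomFilter()
    for item in items:
        bf.add(item)
    filters[name] = bf

matches = msmt_linear_scan(filters, "y")  # always includes "A" and "B"
```

The cost of this baseline grows linearly with the number of sets K, which is the scaling that the "sub-linear time" in the paper's title refers to improving.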
4 Citations
Building Fast and Compact Sketches for Approximately Multi-Set Multi-Membership Querying
- Computer Science
- SIGMOD Conference
- 2021
A novel Circular Shift and Coalesce (CSC) framework is proposed to efficiently achieve approximate MS-MMQ. It encodes all n sets into a compact sketch and retrieves only a few bytes of the sketch per query, which yields high memory efficiency and makes queries several times faster.
Raptor: A fast and space-efficient pre-filter for querying very large collections of nucleotide sequences
- Computer Science
- bioRxiv
- 2020
To Petabytes and beyond: recent advances in probabilistic and signal processing algorithms and their application to metagenomics
- Computer Science
- Nucleic Acids Research
- 2020
The fundamentals of the most impactful probabilistic and signal processing algorithms are reviewed and more recent advances are highlighted to augment previous reviews in these areas that have taken a broader approach.
Sub-linear Sequence Search via a Repeated And Merged Bloom Filter (RAMBO)
- Computer Science
- 2019
RAMBO (Repeated and Merged Bloom Filter) is proposed. For the same false-positive rate, its number of Bloom filter probes scales sub-linearly and is significantly lower than that of BIGSI, which yields a significant improvement in query time over BIGSI on real genome datasets.
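One way to read "repeated and merged" is sketched below in Python. It is a simplified illustration under stated assumptions, not the authors' implementation: in each of R repetitions the K sets are randomly assigned to B buckets, all sets in a bucket share one merged Bloom filter, and a query intersects the candidate sets across repetitions. The names `RamboSketch`, `num_buckets`, and `repetitions`, the random assignment scheme, and the filter parameters are hypothetical.

```python
import hashlib
import random


class BloomFilter:
    """Minimal Bloom filter (illustrative; parameters are assumptions)."""

    def __init__(self, m=4096, k=3):
        self.m, self.k = m, k
        self.bits = bytearray(m)

    def _positions(self, item):
        for seed in range(self.k):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


class RamboSketch:
    """Hypothetical RAMBO-style index: R repetitions of B merged filters."""

    def __init__(self, set_names, num_buckets, repetitions, seed=0):
        rng = random.Random(seed)
        self.tables = []
        for _ in range(repetitions):
            # Randomly assign every set to one bucket in this repetition.
            assignment = {n: rng.randrange(num_buckets) for n in set_names}
            buckets = [BloomFilter() for _ in range(num_buckets)]
            members = [set() for _ in range(num_buckets)]
            for name, b in assignment.items():
                members[b].add(name)
            self.tables.append((assignment, buckets, members))

    def insert(self, set_name, item):
        # Merge: the item goes into the shared filter of the set's bucket.
        for assignment, buckets, _ in self.tables:
            buckets[assignment[set_name]].add(item)

    def query(self, item):
        # Probe B filters per repetition, then intersect the candidates.
        candidates = None
        for _, buckets, members in self.tables:
            hit = set()
            for bf, names in zip(buckets, members):
                if item in bf:
                    hit |= names
            candidates = hit if candidates is None else candidates & hit
        return candidates if candidates is not None else set()


# Usage: 20 sets indexed with 3 repetitions of 5 buckets (15 probes per query).
sets = {f"set{i}": [f"item{i}", "shared"] for i in range(20)}
index = RamboSketch(list(sets), num_buckets=5, repetitions=3)
for name, items in sets.items():
    for item in items:
        index.insert(name, item)

result = index.query("item7")  # always contains "set7"; may contain extras
```

In this sketch a query costs B × R filter probes rather than K. The result never misses a set that contains the item, but it can include extra sets that happen to share a bucket with a true match in every repetition.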
References
Ultra-fast search of all deposited bacterial and viral genomic data
- Computer Science, Biology
- Nature Biotechnology
- 2019
This work indexed the entire global corpus of 447,833 bacterial and viral whole-genome sequence datasets using four orders of magnitude less storage than previous methods and produced a searchable data structure named BItsliced Genomic Signature Index (BIGSI).
Extreme Classification in Log Memory
- Computer Science
- ArXiv
- 2018
MACH is a generic K-classification algorithm with provable theoretical guarantees. It requires O(log K) memory without any assumption on the relationship between classes and provides a theoretical quantification of the discriminability-memory tradeoff.
OMASS: One Memory Access Set Separation
- Computer Science
- IEEE Transactions on Knowledge and Data Engineering
- 2016
The One Memory Access Set Separation (OMASS) scheme is designed so that for a given element x, the corresponding Bloom filter bits for each set map to different positions in the memory word, which ensures that the false positive rates for the Bloom filters for element x under other sets are not affected.
One Sketch to Rule Them All: Rethinking Network Flow Monitoring with UnivMon
- Computer Science
- SIGCOMM
- 2016
UnivMon is a framework for flow monitoring that leverages recent theoretical advances and demonstrates that it is possible to achieve both generality and high accuracy. Trace-driven evaluations show that it offers comparable (and sometimes better) accuracy relative to custom sketching solutions.
Exact and approximate membership testers
- Mathematics
- STOC
- 1978
The question of how much space is needed to represent a set is considered, given a finite universe U and some subset V and a procedure that for each element s in U determines if s is in V.
Beating CountSketch for heavy hitters in insertion streams
- Computer Science
- STOC
- 2016
One can achieve $O(\log n \log\log n)$ bits of space for the problem of returning all $\ell_2$-heavy hitters, i.e., those items $j$ for which $f_j \ge \epsilon \sqrt{F_2}$, where $f_j$ is the number of occurrences of item $j$ in the stream, and $F_2 = \sum_{i \in [n]} f_i^2$.
Finding Frequent Items in Data Streams
- Computer Science
- ICALP
- 2002
This work presents a 1-pass algorithm for estimating the most frequent items in a data stream using limited storage space, which achieves better space bounds than the previously known best algorithms for this problem for several natural distributions on the item frequencies.
Compressed bloom filters
- Computer Science
- PODC '01
- 2001
A Bloom filter is a simple space-efficient randomized data structure for representing a set in order to support membership queries. Although Bloom filters allow false positives, for many applications…
Managing Gigabytes: Compressing and Indexing Documents and Images
- Computer Science
- 1994
A guide to the MG system and its applications is provided, along with a comparison to the NZDL reference index.