Xor Filters

@article{Graf2020XorF,
  title={Xor Filters},
  author={Thomas Mueller Graf and Daniel Lemire},
  journal={Journal of Experimental Algorithmics (JEA)},
  year={2020},
  volume={25},
  pages={1--16}
}
The Bloom filter provides fast approximate set membership while using little memory. Engineers often use these filters to avoid slow operations such as disk or network accesses. As an alternative, a cuckoo filter may need less space than a Bloom filter and it is faster. Chazelle et al. proposed a generalization of the Bloom filter called the Bloomier filter. Dietzfelbinger and Pagh described a variation on the Bloomier filter that can answer approximate membership queries over immutable sets… 
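
As a rough illustration of the approximate-membership idea the abstract starts from, the following is a minimal Bloom filter sketch in Python. It is not the xor filter construction from the paper, and nothing in it is taken from the authors' code: the class name `BloomFilter`, the helper `lookup`, the SHA-256 double-hashing scheme, and the chosen sizes are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter; sizing and hashing scheme are assumptions."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Derive all bit positions from one SHA-256 digest via double hashing.
        # A production filter would typically use a faster non-cryptographic hash.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos >> 3] & (1 << (pos & 7))
            for pos in self._positions(key)
        )


def lookup(key: str, bloom: BloomFilter, slow_store: dict):
    """Consult the filter first; touch the slow store only on a possible hit."""
    if not bloom.might_contain(key):
        return None  # skipped the expensive access entirely
    return slow_store.get(key)  # a false positive still yields None here
```

In this sketch, `slow_store` stands in for the disk or network access mentioned in the abstract. A negative answer from the filter is always correct, so only possible hits pay for the slow lookup.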

Optimizing Cuckoo Filter for high burst tolerance, low latency, and high throughput
TLDR
This paper presents an implementation of a cuckoo filter for membership testing, optimized for distributed data stores operating under high workloads, that gives better amortized search times with fewer false positives.
Cuckoo index
TLDR
Cuckoo Index is introduced, an approximate secondary index structure that represents the many-to-many relationship between keys and data partitions in a highly space-efficient way. It targets equality predicates in a read-only (immutable) setting and optimizes for space efficiency under the premise of practical build and lookup performance.
BionetBF: A Novel Bloom Filter for Faster Membership Identification of Paired Biological Network Data
TLDR
A novel Bloom Filter for biological networks, called BionetBF, is proposed to provide fast membership identification of biological network edges or paired biological data. It is capable of executing millions of operations within a second on datasets having millions of paired biological data while occupying a tiny amount of main memory.
PTHash: Revisiting FCH Minimal Perfect Hashing
TLDR
An improved algorithm is presented that scales well to large sets and reduces space consumption without compromising lookup time; it finds functions that are competitive in space with state-of-the-art techniques and provide 2-4x better lookup time.
Peeling Close to the Orientability Threshold: Spatial Coupling in Hashing-Based Data Structures
TLDR
A new family of $k$-uniform fully random hypergraphs with i.i.d. random hyperedges is constructed, exploiting the phenomenon of threshold saturation via spatial coupling discovered in the context of low-density parity-check codes.
Security Analysis of Machine Learning-Based PUF Enrollment Protocols: A Review
TLDR
This paper identifies two architectures of enrollment protocols based on the participating entities and the building blocks that are relevant to the security of the authentication procedure and provides design guidelines for future enrollment protocol designers.
Birdwatching
Enabling Privacy-Aware Zone Exchanges Among Authoritative and Recursive DNS Servers
TLDR
The approach enables mapping of large DNS zones while preserving privacy; it leverages the space, time, and privacy-enhancing properties of Cuckoo Filters to map zone names efficiently while permitting rapid name updates for large zones.

References

Cuckoo Filter: Practically Better Than Bloom
TLDR
Cuckoo filters support adding and removing items dynamically while achieving even higher performance than Bloom filters, and have lower space overhead than space-optimized Bloom filters.
The Variable-Increment Counting Bloom Filter
TLDR
A new general method based on variable increments is proposed to improve the efficiency of CBFs and their variants; it can extend many variants of CBF that have been published in the literature.
Cache-, hash-, and space-efficient bloom filters
TLDR
This work proposes several new variants of Bloom filters and replacements with similar functionality that have better cache efficiency and need fewer hash bits than regular Bloom filters; some use SIMD functionality, while others provide even better space efficiency.
XOR-Satisfiability Set Membership Filters
TLDR
Experimental results show that this new XOR-Satisfiability filter can be more than 99% efficient (i.e., achieve the information-theoretic limit) while also having a query speed comparable to the standard Bloom filter, making it practical for use with very large data sets.
Morton Filters: Faster, Space-Efficient Cuckoo Filters via Biasing, Compression, and Decoupled Logical Sparsity
TLDR
This work introduces the Morton filter (MF), a novel AS-MDS that introduces several key improvements to CFs and typically uses comparable or slightly less space than CFs for the same ε.
A General-Purpose Counting Filter: Making Every Bit Count
TLDR
A new general-purpose AMQ, the counting quotient filter (CQF), which is small and fast, has good locality of reference, scales out of RAM to SSD, and supports deletions, counting, resizing, merging, and highly concurrent access.
Performance-Optimal Filtering: Bloom overtakes Cuckoo at High-Throughput
TLDR
In high-throughput situations, the lower lookup cost of blocked Bloom filters allows them to overtake Cuckoo filters, and new filter variants are introduced, namely the register-blocked and cache-sectorized Bloom filters.
An Optimal Bloom Filter Replacement Based on Matrix Solving
TLDR
This work suggests a method for holding a dictionary data structure, which maps keys to values, in the spirit of Bloom Filters, and suggests a data structure that requires only nk bits of space, has O(n) preprocessing time, and has O(log n) query time.
Approximately detecting duplicates for streaming data using stable bloom filters
TLDR
This work introduces a data structure, Stable Bloom Filter, and a novel and simple algorithm, and shows that a tight upper bound of false positive rates is guaranteed and compares SBF to alternative methods.
The Bloomier filter: an efficient data structure for static support lookup tables
TLDR
The Bloomier filter is introduced, a data structure for compactly encoding a function with static support in order to support approximate evaluation queries, and lower bounds are provided to prove the (near) optimality of the constructions.