Don't Thrash: How to Cache Your Hash on Flash

@article{Bender2011DontTH,
  title={Don't Thrash: How to Cache Your Hash on Flash},
  author={Michael A. Bender and Martin Farach-Colton and Rob Johnson and Bradley C. Kuszmaul and Dzejla Medjedovic and Pablo Montes and Pradeepa Anand Shetty and Richard P. Spillane and Erez Zadok},
  journal={Proc. VLDB Endow.},
  year={2011},
  volume={5},
  pages={1627--1637}
}
This paper presents new alternatives to the well-known Bloom filter data structure. The Bloom filter, a compact data structure supporting set insertion and membership queries, has found wide application in databases, storage systems, and networks. Because the Bloom filter performs frequent random reads and writes, it is used almost exclusively in RAM, limiting the size of the sets it can represent. This paper first describes the quotient filter, which supports the basic operations of the… 
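To illustrate why a Bloom filter performs random reads and writes, here is a minimal Python sketch of the classic structure. It is not code from the paper: the class name `BloomFilter`, the method names, and the use of salted SHA-256 to derive bit positions are all assumptions chosen for illustration. Each insert or query touches `num_hashes` positions spread across the whole bit array, which is cheap in RAM but costly on flash or disk.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter: a bit array probed by several hash functions."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def positions(self, item):
        # Assumption for this sketch: derive each bit position by salting
        # SHA-256 with the hash index. Real filters use cheaper hashes.
        data = item.encode("utf-8")
        result = []
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "little") + data).digest()
            result.append(int.from_bytes(digest[:8], "little") % self.num_bits)
        return result

    def insert(self, item):
        # Each insert sets num_hashes bits at unrelated offsets (random writes).
        for pos in self.positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def may_contain(self, item):
        # False means definitely absent; True means present or a false positive.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self.positions(item)
        )
```

For example, after `bf = BloomFilter(1 << 20, 7)` and `bf.insert("key-0")`, the call `bf.positions("key-0")` returns seven offsets scattered over the array, so a filter too large for RAM would incur up to seven separate storage accesses per operation.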
Concurrent Expandable AMQs on the Basis of Quotient Filters
TLDR
This work proposes a new locking scheme for concurrent quotient filters that has no memory overhead, along with quotient filter variants that aim to reduce the number of status bits (2-status-bit variant) or to simplify concurrent implementations (linear probing quotient filter).
Chucky: A Succinct Cuckoo Filter for LSM-Tree
TLDR
This work proposes Chucky, a new design that replaces the multiple Bloom filters by a single Cuckoo filter that maps each data entry to an auxiliary address of its location within the LSM-tree, and achieves the best of both worlds: a modest access cost and a low false positive rate at the same time.
TBF: A High-Efficient Query Mechanism in De-duplication Backup System
TLDR
A two-stage Bloom filter (TBF) mechanism is proposed, which decreases the number of disk accesses, speeds up the detection of redundant data chunks, and reduces the false positive rate.
A General-Purpose Counting Filter: Making Every Bit Count
TLDR
A new general-purpose AMQ, the counting quotient filter (CQF), which is small and fast, has good locality of reference, scales out of RAM to SSD, and supports deletions, counting, resizing, merging, and highly concurrent access.
Vacuum Filters: More Space-Efficient and Faster Replacement for Bloom and Cuckoo Filters
TLDR
This work presents vacuum filters, a type of data structure that supports approximate membership queries (AMQs), and proposes a new update framework to resolve two difficult problems for AMQ structures under dynamics, namely duplicate insertions and set resizing.
Optimal Bloom Filters and Adaptive Merging for LSM-Trees
TLDR
This article presents Monkey, an LSM-tree based key-value store that strikes the optimal balance between the costs of updates and lookups with any given main memory budget, and maps the design space onto a closed-form model that enables adapting the merging frequency and memory allocation to strike the best tradeoff among lookup cost, update cost and main memory.
A four-dimensional Analysis of Partitioned Approximate Filters
TLDR
While register-blocked Bloom filters offer the highest throughput, the new Xor filters are best suited when optimizing for small filter sizes or low false-positive rates.
Quotient Filters: Approximate Membership Queries on the GPU
TLDR
This paper describes the GPU implementation of two types of quotient filters: the standard quotient filter and the rank-and-select-based quotient filter, and describes the parallelization of all filter operations, including a comparison of the four different methods the authors devised for parallelizing quotient filter construction.
Support Optimality and Adaptive Cuckoo Filters
TLDR
A new Adaptive Cuckoo Filter is designed, and it is shown to be support optimal over any n queries when storing a set of size n, and to be the first practical data structure that is support optimal, and the first support optimal filter that does not require additional space beyond a normal cuckoo filter.
Fluid Co-processing: GPU Bloom-filters for CPU Joins
TLDR
Early results show that fluid co-processing consistently improves end-to-end CPU performance of early pruning in join queries thanks to the GPU, by factors up to 2-3x.

References

SHOWING 1-10 OF 37 REFERENCES
Buffered Bloom Filters on Solid State Storage
TLDR
This paper proposes a technique to reduce the memory requirement of Bloom filters with the help of solid-state storage devices (SSDs), and shows that, with significantly less memory and fewer hash functions, the proposed technique effectively reduces the false positive rate.
BloomFlash: Bloom Filter on Flash-Based Storage
TLDR
BloomFlash is a Bloom filter designed for flash-memory-based storage that provides a new dimension of trade-off, accepting higher Bloom filter access times to reduce RAM space usage (and hence system cost); the authors advocate that flash memory may serve as a suitable medium for storing Bloom filters.
A Forest-structured Bloom Filter with flash memory
TLDR
This paper proposes a Forest-structured Bloom filter (BF) design, which uses a combination of RAM and flash memory, and achieves 2 times faster processing with 50% fewer flash write operations than existing flash-memory-based BF designs.
Avoiding the Disk Bottleneck in the Data Domain Deduplication File System
TLDR
Three techniques employed in the production Data Domain deduplication file system to relieve the disk bottleneck are described, which enable a modern two-socket dual-core system to run at 90% CPU utilization with only one shelf of 15 disks and achieve 100 MB/sec for single-stream throughput and 210 MB/sec for multi-stream throughput.
Network Applications of Bloom Filters: A Survey
TLDR
The aim of this paper is to survey the ways in which Bloom filters have been used and modified in a variety of network problems, with the aim of providing a unified mathematical and practical framework for understanding them and stimulating their use in future applications.
Compact Hash Tables Using Bidirectional Linear Probing
  • Cleary
  • Computer Science
    IEEE Transactions on Computers
  • 1984
An algorithm is developed which reduces the memory requirements of hash tables. This is achieved by storing only a part of each key along with a few extra bits needed to ensure that all keys are…
The input/output complexity of sorting and related problems
TLDR
Tight upper and lower bounds are provided for the number of inputs and outputs (I/Os) between internal memory and secondary storage required for five sorting-related problems: sorting, the fast Fourier transform (FFT), permutation networks, permuting, and matrix transposition.
An Improved Construction for Counting Bloom Filters
TLDR
A simple hashing-based alternative based on d-left hashing called a d-left CBF (dlCBF), which offers the same functionality as a CBF, but uses less space, generally saving a factor of two or more.
Summary cache: a scalable wide-area web cache sharing protocol
TLDR
This paper demonstrates the benefits of cache sharing, measures the overhead of the existing protocols, and proposes a new protocol called "summary cache", which reduces the number of intercache protocol messages, reduces the bandwidth consumption, and eliminates 30% to 95% of the protocol CPU overhead, all while maintaining almost the same cache hit ratios as ICP.
LazyBase: freshness vs. performance in information management
TLDR
Initial results with LazyBase illustrate the feasibility of the pipelined model, highlight a rich space of trade-offs between result freshness and query performance, and often outperform existing solutions in the space.