Succinct Data Structures for Retrieval and Approximate Membership

@inproceedings{Dietzfelbinger2008SuccinctDS,
  title={Succinct Data Structures for Retrieval and Approximate Membership},
  author={Martin Dietzfelbinger and R. Pagh},
  booktitle={ICALP},
  year={2008}
}
The retrieval problem is the problem of associating data with keys in a set. Formally, the data structure must store a function $f\colon U\to \{0,1\}^r$ that has specified values on the elements of a given set S ⊆ U, |S| = n, but may have any value on elements outside S. All known methods (e.g. those based on perfect hash functions) induce a space overhead of Θ(n) bits over the optimum, regardless of the evaluation time. We show that for any k, query time O(k) can be achieved using space that is within…
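The problem statement in the abstract can be made concrete with a toy sketch. The Python below is not the construction from the paper; it is a naive baseline in the spirit of the perfect-hashing methods the abstract mentions, and it assumes string keys and a brute-force seed search that is only practical for very small sets. The names `NaiveRetrieval` and `_slot` are hypothetical. The point it illustrates is that the keys themselves are never stored: a query on a key in S returns its r-bit value, while a query on any other key returns some arbitrary r-bit value.

```python
import hashlib
from typing import Mapping


def _slot(key: str, seed: int, m: int) -> int:
    """Map a key to a slot in range(m) using a seeded hash (hypothetical helper)."""
    digest = hashlib.blake2b(
        key.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little") % m


class NaiveRetrieval:
    """Toy retrieval structure: stores f on S without storing the keys of S."""

    def __init__(self, mapping: Mapping[str, int], r: int) -> None:
        keys = list(mapping)
        self.r = r
        # Assumption: 2n slots, which wastes space; real schemes are far tighter.
        self.m = max(1, 2 * len(keys))
        # Brute-force search for a seed whose hash is injective on S.
        seed = 0
        while len({_slot(k, seed, self.m) for k in keys}) != len(keys):
            seed += 1
        self.seed = seed
        self.table = [0] * self.m
        for key, value in mapping.items():
            if not 0 <= value < (1 << r):
                raise ValueError("value does not fit in r bits")
            self.table[_slot(key, seed, self.m)] = value

    def query(self, key: str) -> int:
        """Return f(key) for keys in S; an arbitrary r-bit value otherwise."""
        return self.table[_slot(key, self.seed, self.m)]


f = {"apple": 3, "pear": 0, "plum": 2, "fig": 1}
structure = NaiveRetrieval(f, r=2)
print([structure.query(k) for k in f])   # values of f on S, in insertion order
print(structure.query("kiwi"))           # some value in range(4); not meaningful
```

In this sketch the table holds 2n entries of r bits plus a seed, so the overhead over the nr-bit optimum is large; it is meant only to show the interface and the "any value outside S" behaviour.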
Fast Succinct Retrieval and Approximate Membership using Ribbon
TLDR
Bumped ribbon retrieval (BuRR) is presented, the first practical succinct retrieval data structure, which achieves space overheads well below 1 % while being faster than most previously used retrieval data structures (typically with space overheads at least an order of magnitude larger) and faster than classical Bloom filters (with space overhead ≥ 44 %).
Conjunctive Filter: Breaking the Entropy Barrier
TLDR
The objective is to break this entropy bound, construct more space-efficient data structures, and show that many problems, such as full-text search and database join queries, can be solved by using a conjunctive filter.
A Space Lower Bound for Dynamic Approximate Membership Data Structures
An approximate membership data structure is a randomized data structure representing a set which supports membership queries. It allows for a small false positive error rate but has no false negative
A Dynamic Space-Efficient Filter with Constant Time Operations
TLDR
This work presents the first space-efficient dynamic filter with constant time operations in the worst case and employs the classic reduction of Carter et al. (STOC 1978) on a new type of dictionary construction that supports random multisets.
An Optimal Bloom Filter Replacement Based on Matrix Solving
TLDR
This work suggests a method for holding a dictionary data structure, which maps keys to values, in the spirit of Bloom filters, and suggests a data structure that requires only nk bits of space, has O(n) preprocessing time, and has O(log n) query time.
An extendable data structure for incremental stable perfect hashing
TLDR
This paper presents, as an application, a cyclic sequence of reductions between data structures that lead to the following bootstrapping result: a hash table design that does not need to move elements as its size grows.
A Lower Bound for Dynamic Approximate Membership Data Structures
  • Shachar Lovett, E. Porat • Computer Science • 2010 IEEE 51st Annual Symposium on Foundations of Computer Science • 2010
TLDR
A new lower bound on the memory requirements of any dynamic approximate membership data structure is proved, showing that the entropy lower bound cannot be achieved by dynamic data structures for any constant error rate.
Constant-Time Retrieval with O(log m) Extra Bits
TLDR
This paper presents a method for treating the retrieval problem with overhead ε = O((log m)/m), which corresponds to O(1) extra memory words (O(log m) bits), and an extremely simple, constant-time query operation.
Random hypergraphs for hashing-based data structures
TLDR
This thesis examines how hyperedge distribution and load affect the probabilities with which these properties hold, derives corresponding thresholds, and identifies a hashing scheme that leads to a particularly high threshold value in this regard.
Experimental Variations of a Theoretically Good Retrieval Data Structure
TLDR
The practicability of one such theoretically very good proposal that has linear construction time, constant evaluation time and space consumption O(nr) bits is explored, bridging a gap between theory and real data structures.

References

An Optimal Bloom Filter Replacement Based on Matrix Solving
TLDR
This work suggests a method for holding a dictionary data structure, which maps keys to values, in the spirit of Bloom filters, and suggests a data structure that requires only nk bits of space, has O(n) preprocessing time, and has O(log n) query time.
Static Dictionaries Supporting Rank
TLDR
A static dictionary is a data structure for storing a subset S of a finite universe U so that membership queries can be answered efficiently and the rank of an element is returned if the element is found.
Efficient Minimal Perfect Hashing in Nearly Minimal Space
TLDR
A simple randomized scheme that uses n log e + log log u + o(n + log log u) bits and has constant evaluation time and O(n + log log u) expected construction time is presented.
On dynamic range reporting in one dimension
TLDR
This work considers the problem of maintaining a dynamic set of integers and answering queries of the form: report a point (equivalently, all points) in a given interval and develops the first scheme for dynamic perfect hashing requiring sublinear space.
The Bloomier filter: an efficient data structure for static support lookup tables
TLDR
The Bloomier filter is introduced, a data structure for compactly encoding a function with static support in order to support approximate evaluation queries, and lower bounds are provided to prove the (near) optimality of the constructions.
Low Redundancy in Static Dictionaries with Constant Query Time
TLDR
It is shown that on a unit cost RAM with word size Θ(log |U|), a static dictionary for n-element sets with constant worst case query time can be obtained using B + O(log log |U|) + o(n) bits of storage, where B is the minimum number of bits needed to represent all n-element subsets of U.
Space Efficient Hash Tables with Worst Case Constant Access Time
TLDR
This is the first dictionary that has worst case constant access time and expected constant update time, works with (1 + ε)n space, and supports satellite information.
Balanced Allocation and Dictionaries with Tightly Packed Constant Size Bins
TLDR
It is shown that $\varepsilon > (2/e)^{d-1}$ is sufficient to guarantee that with high probability each ball can be put into one of the two bins assigned to it, without any bin overflowing.
Compressed bloom filters
A Bloom filter is a simple space-efficient randomized data structure for representing a set in order to support membership queries. Although Bloom filters allow false positives, for many applications
Efficient hashing with lookups in two memory accesses
TLDR
This work presents a simple, practical hashing scheme that maintains a maximum load of 2, with high probability, while achieving high memory utilization, and analyzes the trade-off between the number of moves performed during inserts and the maximum load on a bucket.