More Analysis of Double Hashing for Balanced Allocations

@article{Mitzenmacher2016MoreAO,
  title={More Analysis of Double Hashing for Balanced Allocations},
  author={Michael Mitzenmacher},
  journal={arXiv preprint arXiv:1503.00658},
  year={2016}
}
With double hashing, for a key $x$, one generates two hash values $f(x)$ and $g(x)$, and then uses combinations $(f(x) + i g(x)) \bmod n$ for $i = 0, 1, 2, \ldots$ to generate multiple hash values in the range $[0, n-1]$ from the initial two. For balanced allocations, keys are hashed into a hash table where each bucket can hold multiple keys, and each key is placed in the least loaded of $d$ choices. It has been shown previously that asymptotically the performance of double hashing and fully random… 
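The scheme described above is easy to sketch in code. The following is an illustrative Python sketch, not the paper's implementation; the helper names (`hash_pair`, `place_balanced`) are ours, and SHA-256 merely stands in for the two hash functions. Using a prime table size $n$ keeps every step value $g(x) \in [1, n-1]$ coprime to $n$, so the $d$ probe positions are distinct.

```python
import hashlib

def hash_pair(key: str, n: int):
    """Derive two hash values f(x) in [0, n-1] and g(x) in [1, n-1]
    from one digest. g(x) must be nonzero (and ideally coprime to n)
    so the probe sequence (f + i*g) mod n visits distinct buckets."""
    digest = hashlib.sha256(key.encode()).digest()
    f = int.from_bytes(digest[:8], "big") % n
    g = int.from_bytes(digest[8:16], "big") % (n - 1) + 1
    return f, g

def place_balanced(key: str, loads: list, d: int) -> int:
    """Place key in the least loaded of the d buckets generated by
    double hashing; returns the chosen bucket index."""
    n = len(loads)
    f, g = hash_pair(key, n)
    choices = [(f + i * g) % n for i in range(d)]
    best = min(choices, key=lambda b: loads[b])
    loads[best] += 1
    return best
```

Note that only two hash evaluations are needed per key, regardless of $d$; this reduced randomness is exactly what the paper's analysis compares against the fully random $d$-choice scheme.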
Load Thresholds for Cuckoo Hashing with Double Hashing
TLDR
It is shown that the load threshold for k-ary cuckoo hashing is the same when using double hashing as when using fully random hashing, and a combinatorial argument is provided to explain this equivalence.
Random hypergraphs for hashing-based data structures
TLDR
This thesis examines how hyperedge distribution and load affect the probabilities with which these properties hold, derives the corresponding thresholds, and identifies a hashing scheme that leads to a particularly high threshold value in this regard.
A New Approach to Analyzing Robin Hood Hashing
TLDR
It is shown that a simple but apparently unstudied approach for handling deletions with Robin Hood hashing offers good performance even under high loads.
Arithmetic Progression Hypergraphs: Examining the Second Moment Method
TLDR
A novel "quasi-random" hypergraph model is defined, random arithmetic progression (AP) hypergraphs, which is based on edges that form arithmetic progressions and unifies many previous problems.
Linear Probing Revisited: Tombstones Mark the Demise of Primary Clustering
TLDR
It turns out that small design decisions in how deletions are implemented have dramatic effects on the asymptotic performance of insertions, and a new variant of linear probing is presented, which is called graveyard hashing, that completely eliminates primary clustering on any sequence of operations.

References

Balanced allocations and double hashing
TLDR
It is shown that the performance difference between double hashing and fully random hashing appears negligible in the standard balanced allocation paradigm, where each item is placed in the least loaded of d choices.
The analysis of closed hashing under limited randomness
This paper gives the first optimal bounds for classical closed hashing schemes in the case of limited randomness, thereby establishing the first proof of optimality for hashing arbitrarily selected…
A Probabilistic Study on Combinatorial Expanders and Hashing
TLDR
This paper concludes by elaborating on how any sufficiently sized subset of inputs, under any distribution, expands in the subgraph of the Gabber-Galil expander graph under consideration.
The Analysis of Double Hashing
Double hashing thresholds via local weak convergence
  • M. Leconte
  • Computer Science, Mathematics
    2013 51st Annual Allerton Conference on Communication, Control, and Computing (Allerton)
  • 2013
TLDR
It is pointed out that the cavity-method approach extends quite naturally to the analysis of double hashing and allows one to compute the corresponding threshold; it is also shown that the graph induced by the double hashing scheme has the same local weak limit as the one obtained with full randomness.
Peeling arguments and double hashing
  • M. Mitzenmacher, J. Thaler
  • Computer Science
    2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton)
  • 2012
TLDR
An interesting aspect of these types of processes is highlighted: the results are generally the same when the randomness is structured in the manner of double hashing, which allows less randomness to be used and simplifies the implementation of several hash-based data structures and algorithms.
Linear Probing with 5-wise Independence
TLDR
It is shown in this paper that linear probing using a 2-wise independent hash function may have expected logarithmic cost per operation, whereas 5-wise independence is enough to ensure constant expected time per operation.
Balanced Allocations
TLDR
It is shown that with high probability the fullest box contains only $\ln \ln n / \ln 2 + O(1)$ balls, exponentially fewer than before; a similar gap exists in the infinite process, where at each step one ball, chosen uniformly at random, is deleted and one ball is added in the manner above.
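The "power of two choices" gap this summary describes is easy to observe empirically. The simulation below is our own illustrative sketch, with uniformly random bin choices standing in for hash values, comparing single-choice placement ($d=1$) against two-choice placement ($d=2$):

```python
import random

def max_load(n_balls: int, n_bins: int, d: int, rng: random.Random) -> int:
    """Throw n_balls into n_bins; each ball goes to the least loaded of
    d uniformly random bins. d=1 is classical single-choice hashing."""
    loads = [0] * n_bins
    for _ in range(n_balls):
        choices = [rng.randrange(n_bins) for _ in range(d)]
        best = min(choices, key=lambda b: loads[b])
        loads[best] += 1
    return max(loads)

rng = random.Random(42)
n = 10_000
one = max_load(n, n, 1, rng)  # typically ~ log n / log log n
two = max_load(n, n, 2, rng)  # typically ~ log log n / log 2, much smaller
```

For $n$ balls in $n$ bins, the two-choice maximum load is dramatically smaller than the single-choice one, matching the $\ln \ln n / \ln 2 + O(1)$ bound quoted above.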
How asymmetry helps load balancing
TLDR
The upper and lower bounds on the maximum load are tight up to additive constants, proving that the Always-Go-Left algorithm achieves an almost optimal load balancing among all sequential multiple-choice algorithms.
Fast Pseudo-Random Fingerprints
TLDR
This work proposes a method to exponentially speed up the computation of various fingerprints, such as those used to compute similarity and rarity in massive data sets, by relying on a specific family of pseudo-random hashes for which hashes resulting in small values can be located quickly.