High Performance Construction of RecSplit Based Minimal Perfect Hash Functions

Dominik Bez, Florian Kurpicz, Hans-Peter Lehmann, Peter Sanders
A minimal perfect hash function (MPHF) is a bijection from a set of objects S to the first |S| integers. It can be used as a building block in databases and data compression. RecSplit [Esposito et al., ALENEX'20] is currently the most space-efficient practical minimal perfect hash function. Its main building blocks are splittings and bijections. Using a tree-like data structure, RecSplit first splits the input set into small sets of constant size ℓ and then computes a bijection on each leaf…
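
The bijection step on a single leaf can be illustrated with a brute-force seed search. The following is a minimal sketch, not the paper's implementation: the helper names `_hash` and `find_bijection_seed`, the use of BLAKE2b as the seeded hash, and the example leaf are all assumptions made for illustration. It assumes the keys in the leaf are distinct.

```python
import hashlib
from itertools import count


def _hash(key: str, seed: int) -> int:
    # Hypothetical seeded hash: BLAKE2b over "seed:key", truncated to 64 bits.
    data = f"{seed}:{key}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")


def find_bijection_seed(keys):
    """Try seeds 0, 1, 2, ... until the hash maps the keys onto 0..len(keys)-1
    without collisions. Assumes the keys are distinct."""
    m = len(keys)
    for seed in count():
        if len({_hash(k, seed) % m for k in keys}) == m:
            return seed


# A small leaf, as would remain after the splitting phase (illustrative data).
leaf = ["apple", "banana", "cherry", "date", "elder", "fig"]
seed = find_bijection_seed(leaf)
positions = sorted(_hash(k, seed) % len(leaf) for k in leaf)
```

Only the seed has to be stored for the leaf. For a leaf of m keys a random seed succeeds with probability m!/m^m, which is why this kind of search is only practical for small leaves.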

Learned Monotone Minimal Perfect Hashing

The core idea of LeMonHash is surprisingly simple and effective: it learns a monotone mapping from keys to their rank via an error-bounded piecewise linear model (the PGM-index), and then it solves the collisions that might arise among keys mapping to the same rank estimate by associating small integers with them in a retrieval data structure (BuRR).
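
The two steps described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not LeMonHash itself: a single linear model stands in for the PGM-index, a plain Python dict stands in for the BuRR retrieval structure, keys are assumed to be distinct integers, and the names `build` and `lookup` are hypothetical.

```python
def build(keys):
    """Build a toy monotone MPHF over distinct integer keys (non-empty)."""
    keys = sorted(keys)
    n = len(keys)
    lo, hi = keys[0], keys[-1]
    # One linear model as a stand-in for an error-bounded piecewise model.
    scale = (n - 1) / (hi - lo) if hi > lo else 0.0
    bucket_start = {}  # rank estimate -> true rank of the first key with that estimate
    local = {}         # key -> small integer (stand-in for a retrieval structure)
    for rank, k in enumerate(keys):
        e = int((k - lo) * scale)  # monotone, so equal estimates are consecutive
        bucket_start.setdefault(e, rank)
        local[k] = rank - bucket_start[e]
    return lo, scale, bucket_start, local


def lookup(key, model):
    """Return the rank of a key that was part of the input set."""
    lo, scale, bucket_start, local = model
    e = int((key - lo) * scale)
    return bucket_start[e] + local[key]


keys = [3, 8, 9, 10, 50, 51, 1000]
model = build(keys)
ranks = [lookup(k, model) for k in keys]
```

The small integers in `local` are the only per-key information; the better the model fits the key distribution, the smaller they tend to be.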

PTHash: Revisiting FCH Minimal Perfect Hashing

An improved algorithm is presented that scales well to large sets and reduces space consumption without compromising lookup time. It finds functions that are competitive in space with state-of-the-art techniques and provide 2-4x better lookup time.

RecSplit: Minimal Perfect Hashing via Recursive Splitting

This work proposes a new technique for storing minimal perfect hash functions with expected linear construction time and expected constant lookup time. For example, it makes it possible to build, for the first time, structures that need 1.56 bits per key in less than 2 ms per key.

SicHash - Small Irregular Cuckoo Tables for Perfect Hashing

This paper presents the PHF construction algorithm SicHash - Small Irregular Cuckoo Tables for Perfect Hashing, which improves the state of the art in terms of space usage versus construction time for a wide range of configurations.

Fast and scalable minimal perfect hashing for massive key sets

A simple algorithm is revisited and it is shown that it is highly competitive with the state of the art, especially in terms of construction time and memory usage.

Constructing Minimal Perfect Hash Functions Using SAT Technology

This article proposes two SAT-based constructions of minimal perfect hash functions that can handle instances where the dictionaries contain up to 40 elements, thereby outperforming the existing (brute-force) methods.

Hash, Displace, and Compress

The main new feature is that the algorithm is a modification of Pagh’s “hash-and-displace” approach with data compression on a sequence of hash function indices, which can be used for k-perfect hashing, where at most k keys may be mapped to the same value.
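
The hash-and-displace idea can be sketched as follows. This is a simplified illustration, not the published algorithm: it uses a seeded hash family (hypothetical helper `_hash`, based on BLAKE2b) instead of explicit displacement functions, it leaves the per-bucket indices uncompressed, and the names `build`, `lookup`, and `bucket_size` are assumptions. Keys are assumed to be distinct.

```python
import hashlib


def _hash(key: str, seed: int) -> int:
    # Hypothetical seeded hash: BLAKE2b over "seed:key", truncated to 64 bits.
    data = f"{seed}:{key}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")


def build(keys, bucket_size=4):
    """Return one hash function index per bucket."""
    n = len(keys)
    r = max(1, n // bucket_size)
    buckets = [[] for _ in range(r)]
    for k in keys:
        buckets[_hash(k, 0) % r].append(k)  # seed 0 is reserved for bucketing
    taken = [False] * n
    indices = [0] * r
    # Place the largest buckets first, while most slots are still free.
    for b in sorted(range(r), key=lambda i: -len(buckets[i])):
        if not buckets[b]:
            continue
        d = 1
        while True:
            slots = {_hash(k, d) % n for k in buckets[b]}
            if len(slots) == len(buckets[b]) and not any(taken[s] for s in slots):
                break
            d += 1
        for s in slots:
            taken[s] = True
        indices[b] = d
    return indices


def lookup(key, indices, n):
    bucket = _hash(key, 0) % len(indices)
    return _hash(key, indices[bucket]) % n


keys = [f"key{i}" for i in range(50)]
indices = build(keys)
values = sorted(lookup(k, indices, len(keys)) for k in keys)
```

The `indices` list is the sequence of hash function indices that the summary refers to; most entries are small, which is what makes compressing that sequence worthwhile.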

Parallel and External-Memory Construction of Minimal Perfect Hash Functions with PTHash

This work proposes a new construction algorithm for PTHash enabling: (1) multi-threading, to either build functions more quickly or more space-efficiently, and (2) external-memory processing to scale to inputs much larger than the available internal memory.

Perfect Hashing for Data Management Applications

This paper proposes a novel, theoretically optimal perfect hashing scheme that greatly simplifies previous methods, and is designed to make good use of the memory hierarchy, and demonstrates the scalability of the algorithm by considering a set of over one billion URLs from the World Wide Web of average length 64.

A faster algorithm for constructing minimal perfect hash functions

A new algorithm is described for quickly finding minimal perfect hash functions whose specification space is very close to the theoretical lower bound, i.e., around 2 bits per key.