A comparison of adaptive radix trees and hash tables

@inproceedings{lvarez2015ACO,
  title={A comparison of adaptive radix trees and hash tables},
  author={V{\'i}ctor {\'A}lvarez and Stefan Richter and Xiao Chen and Jens Dittrich},
  booktitle={2015 IEEE 31st International Conference on Data Engineering},
  year={2015},
  pages={1227--1238}
}
With prices of main memory constantly decreasing, people nowadays are more interested in performing their computations in main memory, leaving the high I/O costs of traditional disk-based systems out of the equation. […]
Key Result: The authors of ART presented experiments indicating that ART was clearly a better choice than other recent tree-based data structures such as FAST and B+-trees. However, ART was not the first adaptive radix tree. To the best of our knowledge, the first was the Judy Array (Judy for…
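The adaptivity that gives ART its name can be sketched as a single inner node that maps one key byte to a child and switches layout as it fills: Node4 and Node16 keep sorted key arrays, Node48 uses a 256-entry byte index into 48 child slots, and Node256 indexes children directly by byte. This is a minimal Python sketch under stated assumptions: the class name `ARTInnerNode` is hypothetical, `bisect` stands in for ART's linear and SIMD searches, children must not be `None`, and path compression, lazy expansion, deletion, and shrinking are omitted.

```python
from bisect import bisect_left
from typing import Any, Optional

EMPTY = -1  # marks an unused byte in the Node48 index


class ARTInnerNode:
    """Hypothetical, simplified ART inner node: maps one key byte to a child
    and grows Node4 -> Node16 -> Node48 -> Node256 as children are added."""

    def __init__(self) -> None:
        self.kind = 4                            # current node type (capacity)
        self.keys: list[int] = []                # Node4/Node16: sorted key bytes
        self.children: list[Any] = []            # Node4/16/48: child slots
        self.index: Optional[list[int]] = None   # Node48: byte -> child slot
        self.direct: Optional[list[Any]] = None  # Node256: byte -> child

    def find_child(self, byte: int) -> Any:
        if self.kind in (4, 16):
            # ART scans Node4 linearly and Node16 with SIMD; bisect stands in.
            i = bisect_left(self.keys, byte)
            if i < len(self.keys) and self.keys[i] == byte:
                return self.children[i]
            return None
        if self.kind == 48:
            slot = self.index[byte]
            return None if slot == EMPTY else self.children[slot]
        return self.direct[byte]  # Node256: one array access

    def items(self) -> list[tuple[int, Any]]:
        """All (byte, child) pairs in byte order, whatever the current layout."""
        if self.kind in (4, 16):
            return list(zip(self.keys, self.children))
        if self.kind == 48:
            return [(b, self.children[s]) for b, s in enumerate(self.index) if s != EMPTY]
        return [(b, c) for b, c in enumerate(self.direct) if c is not None]

    def _grow(self) -> None:
        pairs = self.items()
        self.kind = {4: 16, 16: 48, 48: 256}[self.kind]
        if self.kind == 16:
            return  # same sorted-array layout, larger capacity
        self.keys, self.children = [], []
        if self.kind == 48:
            self.index = [EMPTY] * 256
            for b, c in pairs:
                self.index[b] = len(self.children)
                self.children.append(c)
        else:
            self.index, self.direct = None, [None] * 256
            for b, c in pairs:
                self.direct[b] = c

    def insert(self, byte: int, child: Any) -> None:
        """Add or replace the child for `byte`, growing the node when it is full."""
        is_new = self.find_child(byte) is None
        if is_new and self.kind != 256 and len(self.items()) == self.kind:
            self._grow()
        if self.kind in (4, 16):
            i = bisect_left(self.keys, byte)
            if is_new:
                self.keys.insert(i, byte)
                self.children.insert(i, child)
            else:
                self.children[i] = child
        elif self.kind == 48:
            if is_new:
                self.index[byte] = len(self.children)
                self.children.append(child)
            else:
                self.children[self.index[byte]] = child
        else:
            self.direct[byte] = child
```

Inserting a fifth child turns a Node4 into a Node16, the seventeenth turns it into a Node48, and the forty-ninth into a Node256, so sparse nodes stay small while dense nodes get constant-time child lookup.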
Efficient Processing of Range Queries in Main Memory
TLDR
A cache-optimized, updatable main-memory index structure, the cache-sensitive skip list, is proposed, which targets the execution of range queries on single database columns. In addition, a novel, fast, and space-efficient main-memory index structure, the BB-Tree, is devised, which supports multidimensional range and point queries and provides a parallel search operator that leverages the multi-threading capabilities of modern CPUs.
Hyperion: Building the Largest In-memory Search Tree
TLDR
This paper presents Hyperion, a trie-based main-memory key-value store that achieves extreme space efficiency, significantly reduces the index memory footprint, and has a performance-to-memory ratio more than two times better than the best implemented alternative strategy for randomized string data sets.
Cache-Sensitive Skip List: Efficient Range Queries on Modern CPUs
TLDR
This work presents Cache-Sensitive Skip Lists (CSSL), a novel index structure optimized for range queries on modern CPUs. CSSL is based on a cache-friendly data layout and traversal algorithm that minimizes cache misses and branch mispredictions and makes it possible to exploit SIMD instructions for search.
Parallelizing Approximate Search on Adaptive Radix Trees
TLDR
This work uses the edit distance to compare search keys with the keys in the tree and to select appropriate values, and it proposes several variations of the CPU algorithm, such as fixed vs. dynamic memory layouts and pointer-based vs. pointer-less data structures.
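Approximate search on a trie-like tree with the edit distance is commonly implemented by carrying one row of the Levenshtein dynamic-programming table down each path: keys that share a prefix share that prefix's rows, and a subtree is pruned once every entry of the current row exceeds the distance bound. A minimal Python sketch, assuming a plain character trie of nested dicts rather than an ART and a single-threaded traversal; `trie_insert`, `approximate_search`, and the `END` marker are hypothetical names, and this is not the cited paper's CPU or GPU algorithm.

```python
from typing import Any

END = "$value"  # hypothetical marker key; real trie edges are single characters


def trie_insert(root: dict, key: str, value: Any) -> None:
    """Store `value` under `key` in a character trie built from nested dicts."""
    node = root
    for ch in key:
        node = node.setdefault(ch, {})
    node[END] = value


def approximate_search(root: dict, query: str, max_dist: int) -> list[tuple[str, Any, int]]:
    """Return (key, value, distance) for every stored key whose edit
    (Levenshtein) distance to `query` is at most `max_dist`."""
    results: list[tuple[str, Any, int]] = []

    def walk(node: dict, prefix: str, prev_row: list[int]) -> None:
        # prev_row[j] is the edit distance between `prefix` and query[:j].
        if END in node and prev_row[-1] <= max_dist:
            results.append((prefix, node[END], prev_row[-1]))
        if min(prev_row) > max_dist:
            return  # row minima never decrease, so this subtree can be pruned
        for ch, child in node.items():
            if ch == END:
                continue
            row = [prev_row[0] + 1]
            for j in range(1, len(query) + 1):
                cost = 0 if query[j - 1] == ch else 1
                row.append(min(row[j - 1] + 1,           # skip a query character
                               prev_row[j] + 1,          # skip a key character
                               prev_row[j - 1] + cost))  # match or substitute
            walk(child, prefix + ch, row)

    walk(root, "", list(range(len(query) + 1)))
    return results
```

For example, with "cat", "car", "cart", and "dog" stored, a search for "cat" with `max_dist=1` returns "cat" (distance 0) plus "car" and "cart" (distance 1), and the "dog" subtree is abandoned as soon as its first row exceeds the bound.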
S3: A Scalable In-memory Skip-List Index for Key-Value Store
TLDR
Experiments show that S3 achieves performance comparable to other new in-memory indexing schemes and can replace the current in-memory skip list of LevelDB and RocksDB to support huge volumes of data.
START — Self-Tuning Adaptive Radix Tree
TLDR
This work introduces START, a self-tuning variant of ART that uses nodes spanning multiple key bytes; it performs on average 85% faster than a regular ART on a wide variety of read-only workloads and 45% faster on read-mostly workloads.
Experimental Index Evaluation for Partial Indexes in Horizontally Partitioned In-Memory Databases
TLDR
This work evaluates different index implementations in terms of lookup speed, maintenance cost, and memory consumption to identify implementations suitable for realizing partial indexes, and it chooses the hash maps Robin Hood (RH) Flat Map and Tessil’s (TSL) Sparse Map because they achieve the best overall evaluation results.
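Robin Hood hashing, the technique the RH Flat Map is named after, is open addressing with linear probing plus one rule: while probing, an entry that is farther from its home slot than the resident takes over the slot, and insertion continues with the displaced resident. This evens out probe lengths and lets a lookup stop early. A minimal Python sketch, assuming a fixed capacity, Python's built-in `hash()`, and no deletion or resizing; `RobinHoodTable` is a hypothetical name, and this is not the RH Flat Map's actual implementation.

```python
from collections.abc import Hashable
from typing import Any, Optional


class RobinHoodTable:
    """Hypothetical fixed-capacity Robin Hood hash table: linear probing where
    the entry farther from its home slot keeps the slot. No deletion or resizing."""

    def __init__(self, capacity: int = 16) -> None:
        self.capacity = capacity
        # Each slot is None or a (key, value, probe_distance) tuple.
        self.slots: list[Optional[tuple[Hashable, Any, int]]] = [None] * capacity
        self.size = 0

    def _home(self, key: Hashable) -> int:
        return hash(key) % self.capacity

    def _find(self, key: Hashable) -> int:
        """Slot index of `key`, or -1 if it is absent."""
        pos, dist = self._home(key), 0
        while True:
            slot = self.slots[pos]
            # If the resident is closer to its home than we are to ours, an
            # insert of `key` would have taken this slot, so `key` is absent.
            if slot is None or slot[2] < dist:
                return -1
            if slot[0] == key:
                return pos
            pos, dist = (pos + 1) % self.capacity, dist + 1

    def get(self, key: Hashable, default: Any = None) -> Any:
        pos = self._find(key)
        return default if pos < 0 else self.slots[pos][1]

    def insert(self, key: Hashable, value: Any) -> None:
        pos = self._find(key)
        if pos >= 0:  # existing key: overwrite the value in place
            k, _, d = self.slots[pos]
            self.slots[pos] = (k, value, d)
            return
        if self.size == self.capacity:
            raise RuntimeError("table is full (this sketch does not resize)")
        carried = (key, value, 0)
        pos = self._home(key)
        while True:
            slot = self.slots[pos]
            if slot is None:
                self.slots[pos] = carried
                self.size += 1
                return
            if slot[2] < carried[2]:
                # The resident is "richer" (closer to home): it gives up the
                # slot, and insertion continues with the displaced entry.
                self.slots[pos], carried = carried, slot
            pos = (pos + 1) % self.capacity
            carried = (carried[0], carried[1], carried[2] + 1)
```

The early exit in `_find` is what distinguishes this from plain linear probing: a miss can stop at the first resident that is closer to its home than the probe is to the key's home, instead of scanning to the next empty slot.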
CuART - a CUDA-based, scalable Radix-Tree lookup and update engine
TLDR
An optimized version of the Adaptive Radix Tree (ART) index structure for GPUs is presented; the evaluation shows that traditional GDDR6(X) memory benefits index lookups because of its higher clock rates compared to High Bandwidth Memory (HBM).
A Six-dimensional Analysis of In-memory Aggregation
TLDR
The results show that the ideal approach in a given situation depends on the input and the workload: sorting-based algorithms are faster for holistic aggregate queries, whereas hash tables perform better for distributive queries.
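The split between distributive and holistic aggregates can be illustrated in a few lines: a distributive aggregate such as SUM keeps one small running state per group, which suits a hash table, while a holistic aggregate such as MEDIAN needs all of a group's values at once, which sorting provides as contiguous runs. A minimal Python sketch assuming in-memory (group, value) pairs; `hash_sum` and `sort_median` are hypothetical names, and the sketch illustrates the distinction rather than the paper's implementations or measurements.

```python
from itertools import groupby
from operator import itemgetter
from statistics import median


def hash_sum(rows: list[tuple[str, float]]) -> dict[str, float]:
    """Distributive aggregate (SUM): one running total per group in a hash table."""
    totals: dict[str, float] = {}
    for group, value in rows:
        totals[group] = totals.get(group, 0) + value
    return totals


def sort_median(rows: list[tuple[str, float]]) -> dict[str, float]:
    """Holistic aggregate (MEDIAN): sort by group so each group's values form a
    contiguous run, then compute the median of every run."""
    ordered = sorted(rows, key=itemgetter(0))
    return {group: median([value for _, value in run])
            for group, run in groupby(ordered, key=itemgetter(0))}


rows = [("a", 3), ("b", 10), ("a", 1), ("a", 2), ("b", 4)]
print(hash_sum(rows))     # {'a': 6, 'b': 14}
print(sort_median(rows))  # {'a': 2, 'b': 7.0}
```

A SUM can be updated one value at a time, but a median cannot be assembled from per-value partial results, which is why the holistic case needs every value of a group together.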
Efficient indexing for big data in Hadoop MapReduce and main memory databases
TLDR
This study indicates that choosing the right hashing method and configuration can make an order-of-magnitude difference in insert and lookup performance; it also identifies seven key factors that influence hashing performance, evaluates their impact, and discusses the implications for hashing in modern databases.
...

References

Showing 1-10 of 20 references
The adaptive radix tree: ARTful indexing for main-memory databases
Main memory capacities have grown up to a point where most databases fit into RAM. For main-memory database systems, index structure performance is a critical bottleneck. Traditional in-memory data…
Making B+- trees cache conscious in main memory
TLDR
A new indexing technique called CSB+-Trees is proposed, which stores all the child nodes of any given node contiguously and keeps only the address of the first child in each node. Two variants of CSB+-Trees are also introduced: one reduces the copying cost when there is a split, and the other preallocates space for the full node group to reduce the split cost.
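The central idea of CSB+-Trees, storing a node's children contiguously so that the node needs only the first child's address, can be sketched with a flat array of nodes in which the i-th child is found at `first_child + i`. This is a minimal Python sketch assuming a static, bulk-loaded tree of integer keys; `Node`, `bulk_load`, and `contains` are hypothetical names, and insertion, splits, and the two variants are not modeled.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass
class Node:
    keys: list[int]   # leaf: stored keys; inner node: separator keys
    first_child: int  # index of the first child in `nodes`, or -1 for a leaf


def bulk_load(sorted_keys: list[int], fanout: int = 4) -> tuple[list[Node], int]:
    """Build a static tree whose node groups are contiguous; return (nodes, root)."""
    if not sorted_keys:
        raise ValueError("this sketch needs at least one key")
    nodes: list[Node] = []
    level: list[tuple[int, int]] = []  # (node index, smallest key) per node
    for i in range(0, len(sorted_keys), fanout):
        chunk = sorted_keys[i:i + fanout]
        level.append((len(nodes), chunk[0]))
        nodes.append(Node(chunk, -1))
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), fanout + 1):
            group = level[i:i + fanout + 1]  # consecutive indices: one node group
            separators = [low for _, low in group[1:]]
            parents.append((len(nodes), group[0][1]))
            nodes.append(Node(separators, group[0][0]))  # only the first child is stored
        level = parents
    return nodes, level[0][0]


def contains(nodes: list[Node], root: int, key: int) -> bool:
    node = nodes[root]
    while node.first_child != -1:
        # The i-th child lives at first_child + i because the group is contiguous.
        node = nodes[node.first_child + bisect_right(node.keys, key)]
    i = bisect_left(node.keys, key)
    return i < len(node.keys) and node.keys[i] == key
```

Because an inner node holds one child reference instead of one per child, the space saved can go to additional keys, which is where the cache-line benefit of the layout comes from.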
FAST: fast architecture sensitive tree search on modern CPUs and GPUs
TLDR
FAST is an extremely fast, architecture-sensitive layout of the index tree, logically organized to optimize for architecture features like page size, cache line size, and SIMD width of the underlying hardware; it achieves a 6X performance improvement over uncompressed index search for large keys on CPUs.
Space Efficient Hash Tables with Worst Case Constant Access Time
TLDR
This is the first dictionary that has worst-case constant access time and expected constant update time, works with (1 + ε)n space, and supports satellite information.
Balanced Allocation and Dictionaries with Tightly Packed Constant Size Bins
TLDR
It is shown that ε > (2/e)^(d−1) is sufficient to guarantee that, with high probability, each ball can be put into one of the two bins assigned to it without any bin overflowing.
On risks of using cuckoo hashing with simple universal hash classes
TLDR
It is proved that the failure probability is high when cuckoo hashing is run with the multiplicative class or with the very common class of linear hash functions over a prime field, even if space 4n is provided.
Cuckoo hashing
TLDR
A simple dictionary with worst-case constant lookup time is presented, equaling the theoretical performance of the classic dynamic perfect hashing scheme of Dietzfelbinger et al. and competitive with the best known dictionaries having an average-case (but no nontrivial worst-case) guarantee on lookup time.
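The mechanism behind cuckoo hashing can be sketched as follows: every key has one candidate slot in each of two tables, so a lookup inspects at most two slots, and an insertion that finds its slot occupied evicts the resident, which moves to its slot in the other table, and so on. A minimal Python sketch under stated assumptions: the two hash functions are Python's `hash()` salted with random values (a stand-in, not a proper universal family, which the entry on risks of simple universal hash classes shows can matter), a failed insert after `max_kicks` rounds triggers a rebuild with new salts, and the rebuild also doubles capacity, which is a simplification. `CuckooHashTable` is a hypothetical name.

```python
import random
from collections.abc import Hashable
from typing import Any, Optional

Entry = Optional[tuple[Hashable, Any]]


class CuckooHashTable:
    """Hypothetical cuckoo hash table with two tables and one slot per key in each."""

    def __init__(self, capacity: int = 8, max_kicks: int = 32) -> None:
        self.capacity = capacity    # slots per table
        self.max_kicks = max_kicks  # eviction rounds before rehashing
        self._new_tables()

    def _new_tables(self) -> None:
        # Fresh salts give two new hash functions.
        self.salts = (random.getrandbits(64), random.getrandbits(64))
        self.tables: list[list[Entry]] = [[None] * self.capacity for _ in range(2)]

    def _slot(self, which: int, key: Hashable) -> int:
        return hash((self.salts[which], key)) % self.capacity

    def get(self, key: Hashable, default: Any = None) -> Any:
        # Worst-case constant lookup: at most two slots are ever inspected.
        for which in (0, 1):
            entry = self.tables[which][self._slot(which, key)]
            if entry is not None and entry[0] == key:
                return entry[1]
        return default

    def insert(self, key: Hashable, value: Any) -> None:
        for which in (0, 1):  # existing key: update it in place
            i = self._slot(which, key)
            existing = self.tables[which][i]
            if existing is not None and existing[0] == key:
                self.tables[which][i] = (key, value)
                return
        entry: Entry = (key, value)
        for _ in range(self.max_kicks):
            for which in (0, 1):
                # Put the entry in its slot in this table and pick up the resident;
                # the evicted entry then goes to its slot in the other table.
                i = self._slot(which, entry[0])
                entry, self.tables[which][i] = self.tables[which][i], entry
                if entry is None:
                    return
        self._rehash(entry)  # probable cycle: rebuild with new hash functions

    def _rehash(self, pending: tuple[Hashable, Any]) -> None:
        entries = [e for table in self.tables for e in table if e is not None]
        entries.append(pending)
        self.capacity *= 2  # simplification: always grow so retries make progress
        self._new_tables()
        for k, v in entries:
            self.insert(k, v)
```

Lookups never depend on how long an insertion took: all the probing cost is moved to `insert`, which is the trade-off that gives the worst-case constant lookup time described above.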
A Reliable Randomized Algorithm for the Closest-Pair Problem
TLDR
In the course of solving the duplicate-grouping problem, a new universal class of hash functions of independent interest is described, and it is shown that both of the foregoing problems can be solved by randomized algorithms that use O(n) space and finish in O(n) time with probability tending to 1 as n grows to infinity.
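The summary does not name the new class of hash functions; it is commonly identified with multiply-shift hashing, so treat that identification as an assumption here. For w-bit keys and l-bit outputs, the scheme draws a random odd w-bit multiplier a and computes h_a(x) = (a·x mod 2^w) div 2^(w−l), i.e. it keeps the top l bits of the low w bits of the product. A minimal Python sketch with w = 64; `make_multiply_shift` and its `out_bits` parameter (playing the role of l) are hypothetical names.

```python
import random

W = 64  # assumed key width in bits: keys are integers in [0, 2**64)


def make_multiply_shift(out_bits: int, rng: random.Random):
    """Draw h(x) = ((a * x) mod 2**W) div 2**(W - out_bits) with a random odd a,
    mapping W-bit keys to hash values in [0, 2**out_bits)."""
    if not 1 <= out_bits <= W:
        raise ValueError("out_bits must satisfy 1 <= out_bits <= W")
    a = rng.getrandbits(W) | 1  # random odd multiplier
    mask = (1 << W) - 1         # explicit "mod 2**W" for unbounded Python ints

    def h(x: int) -> int:
        return ((a * x) & mask) >> (W - out_bits)

    return h


h = make_multiply_shift(10, random.Random(42))  # 10-bit values, e.g. 1024 buckets
```

Python integers are unbounded, so the mask makes the mod 2^w explicit; in C or C++ an unsigned 64-bit multiplication wraps around on its own, and the whole function reduces to one multiply and one shift, which is why this class is popular in main-memory hash tables.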
The art of computer programming: sorting and searching (volume 3)
Some Open Questions Related to Cuckoo Hashing
The purpose of this brief note is to describe recent work in the area of cuckoo hashing, including a clear description of several open problems, with the hope of spurring further research.
...