# Engineering scalable, cache and space efficient tries for strings

@article{Askitis2010EngineeringSC, title={Engineering scalable, cache and space efficient tries for strings}, author={Nikolas Askitis and Ranjan Sinha}, journal={The VLDB Journal}, year={2010}, volume={19}, pages={633-660} }

Storing and retrieving strings in main memory is a fundamental problem in computer science. The efficiency of string data structures used for this task is of paramount importance for applications such as in-memory databases, text-based search engines and dictionaries. The burst trie is a leading choice for such tasks, as it can provide fast sorted access to strings. The burst trie, however, uses linked lists as substructures, which can result in poor use of CPU cache and main memory. Previous…
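
The burst trie described above can be sketched roughly as follows. This is a minimal Python illustration, not the authors' implementation. Trie nodes route on the next character, and each leaf container holds string suffixes. A container that grows past a threshold "bursts" into a new trie node. The threshold value and the names `BurstTrie` and `TrieNode` are assumptions. The containers are plain Python lists standing in for the linked lists (or, in the paper's variants, arrays) of a real implementation.

```python
BURST_THRESHOLD = 4  # hypothetical container limit; real implementations tune this


class TrieNode:
    """Internal node: maps the next character to a child TrieNode or a container."""

    def __init__(self):
        self.children = {}  # char -> TrieNode or list of suffixes (container)
        self.end = False    # True if a stored string ends exactly at this node


class BurstTrie:
    def __init__(self, threshold=BURST_THRESHOLD):
        self.root = TrieNode()
        self.threshold = threshold

    def insert(self, s):
        self._insert_from(self.root, s)

    def _insert_from(self, node, s):
        i = 0
        while i < len(s):
            child = node.children.get(s[i])
            if isinstance(child, TrieNode):
                node, i = child, i + 1
                continue
            if child is None:
                child = node.children[s[i]] = []
            suffix = s[i + 1:]
            if suffix not in child:
                child.append(suffix)
                if len(child) > self.threshold:
                    # Container is too large: burst it into a new trie node.
                    node.children[s[i]] = self._burst(child)
            return
        node.end = True

    def _burst(self, container):
        """Redistribute a container's suffixes one character deeper."""
        new = TrieNode()
        for suffix in container:
            self._insert_from(new, suffix)
        return new

    def __contains__(self, s):
        node, i = self.root, 0
        while i < len(s):
            child = node.children.get(s[i])
            if child is None:
                return False
            if isinstance(child, TrieNode):
                node, i = child, i + 1
            else:
                return s[i + 1:] in child
        return node.end

    def sorted_strings(self):
        out = []
        self._walk(self.root, "", out)
        return out

    def _walk(self, node, prefix, out):
        if node.end:
            out.append(prefix)  # the prefix itself sorts before its extensions
        for ch in sorted(node.children):
            child = node.children[ch]
            if isinstance(child, TrieNode):
                self._walk(child, prefix + ch, out)
            else:
                out.extend(prefix + ch + suf for suf in sorted(child))
```

Sorted access comes from visiting children in character order and sorting each small container. Bursting keeps containers short, so sorting them stays cheap.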

## 24 Citations

Redesigning the string hash table, burst trie, and BST to exploit cache

- Computer Science · JEAL
- 2011

Two alternatives to the standard representation of strings are explored: the simple expedient of including the string in its node, and, for linked lists, the more drastic step of replacing each list of nodes by a contiguous array of characters.
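
Replacing a list of nodes with a contiguous array of characters can be illustrated with a small bucket that packs every string into a single `bytearray`. Each string is stored as a length byte followed by its bytes, so a lookup scans one block of memory instead of chasing node pointers. This is a hypothetical sketch. The class name `ArrayBucket` and the one-byte length prefix (which limits keys to 255 bytes) are assumptions. Python also does not give the memory-layout guarantees that a C implementation would.

```python
class ArrayBucket:
    """Hypothetical bucket: strings packed as [len][bytes][len][bytes]... in one buffer."""

    def __init__(self):
        self.buf = bytearray()

    def _scan(self):
        """Yield (start, length) of every stored string, in insertion order."""
        pos = 0
        while pos < len(self.buf):
            length = self.buf[pos]
            yield pos + 1, length
            pos += 1 + length

    def find(self, key: bytes) -> bool:
        for start, length in self._scan():
            # Compare lengths first so most candidates are skipped cheaply.
            if length == len(key) and self.buf[start:start + length] == key:
                return True
        return False

    def insert(self, key: bytes) -> bool:
        """Append key if absent; return True if it was added."""
        if len(key) > 255:
            raise ValueError("this sketch limits keys to 255 bytes")
        if self.find(key):
            return False
        self.buf.append(len(key))  # one-byte length prefix
        self.buf += key            # characters stored inline, no per-string node
        return True

    def __iter__(self):
        for start, length in self._scan():
            yield bytes(self.buf[start:start + length])
```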

The adaptive radix tree: ARTful indexing for main-memory databases

- Computer Science · 2013 IEEE 29th International Conference on Data Engineering (ICDE)
- 2013

Main memory capacities have grown up to a point where most databases fit into RAM. For main-memory database systems, index structure performance is a critical bottleneck. Traditional in-memory data…

Dynamic Path-decomposed Tries

- Computer Science · ACM J. Exp. Algorithmics
- 2020

The main idea is to embrace the path decomposition technique, which was proposed for constructing cache-friendly tries, and design data structures based on recent compact hash trie representations to store the path-decomposed trie in small memory.

Space- and Time-Efficient String Dictionaries

- Computer Science
- 2018

This work develops novel string dictionaries based on double-array tries and proposes how to improve the high construction costs of applying Re-Pair by introducing an alternative compression strategy using dictionary encoding.

Practical Implementation of Space-Efficient Dynamic Keyword Dictionaries

- Computer Science · SPIRE
- 2017

This paper applies path decomposition, which was originally proposed for constructing cache-friendly trie structures, to dynamic construction in compact space. It shows that its implementation can construct keyword dictionaries up to 2.8x smaller than those built by the most compact existing dynamic implementation.

Put an elephant into a fridge

- Computer Science · Proc. VLDB Endow.
- 2020

This work proposes a highly cache-efficient scheme, called Cavast, to optimize the cache utilization of large-capacity in-memory key-value stores. It presents two lightweight, software-only mechanisms that enable users to indirectly control the cache content at the application level.

c-trie++: A dynamic trie tailored for fast prefix searches

- Mathematics · Information and Computation
- 2021

Top Tree Compression of Tries

- Computer Science, Mathematics · Algorithmica
- 2021

This work presents a compressed representation of tries based on top tree compression that works on a standard, comparison-based pointer machine model of computation and supports efficient prefix search queries. It also develops several data structures for the pointer machine that are of independent interest.

BOPL: Batch Optimized Persistent List

- 2019

Due to the growing number of operations in databases, and with improvements in both the performance and capacity of DRAM, the use of In-Memory Databases (IMDBs) has become feasible…

Top Tree Compression of Tries

- Computer Science, Mathematics · ISAAC
- 2019

This work shows how to preprocess a set of strings of total length $n$ over an alphabet of size $\sigma$ into a compressed data structure of worst-case optimal size $O(n/\log_\sigma n)$ that determines whether a pattern $P$ of length $m$ is a prefix of one of the strings in time $O(\min(m\log\sigma, m + \log n))$.

## References

Showing 1-10 of 117 references.

Burst tries: a fast, efficient data structure for string keys

- Computer Science · TOIS
- 2002

These experiments show that the burst trie is particularly effective for the skewed frequency distributions common in text collections, and dramatically outperforms all other data structures for the task of managing strings while maintaining sort order.

Cache-conscious sorting of large sets of strings with dynamic tries

- Computer Science · JEAL
- 2004

This work proposes a new sorting algorithm for strings, burstsort, based on dynamic construction of a compact trie in which strings are kept in buckets, which is simple, fast, and efficient.
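
The steps of burstsort summarized above can be sketched in Python. Strings are inserted into a trie whose leaves are buckets. A bucket that exceeds a threshold is burst into a new trie node one character deeper. At the end, each small bucket is sorted and emitted in trie order. The threshold, the helper names `Node`, `_insert` and `_collect`, and the use of Python's `sorted()` on the buckets are assumptions for illustration. A tuned implementation would use a string-specialized sort on the buckets and store string pointers rather than Python objects.

```python
THRESHOLD = 8  # hypothetical bucket capacity before a burst


class Node:
    def __init__(self):
        self.ended = []  # strings that end exactly at this depth (all equal)
        self.slots = {}  # char -> Node or list (bucket of whole strings)


def _insert(node, s, depth, threshold):
    while True:
        if depth == len(s):
            node.ended.append(s)
            return
        ch = s[depth]
        slot = node.slots.get(ch)
        if isinstance(slot, Node):
            node, depth = slot, depth + 1
            continue
        if slot is None:
            slot = node.slots[ch] = []
        slot.append(s)
        if len(slot) > threshold:
            # Burst: redistribute the bucket's strings one character deeper.
            child = Node()
            for t in slot:
                _insert(child, t, depth + 1, threshold)
            node.slots[ch] = child
        return


def _collect(node, out):
    out.extend(node.ended)  # the shared prefix itself sorts first
    for ch in sorted(node.slots):
        slot = node.slots[ch]
        if isinstance(slot, Node):
            _collect(slot, out)
        else:
            out.extend(sorted(slot))  # buckets are small, so sort them directly


def burstsort(strings, threshold=THRESHOLD):
    root = Node()
    for s in strings:
        _insert(root, s, 0, threshold)
    out = []
    _collect(root, out)
    return out
```

Duplicates are kept. Identical strings either share a bucket or, after repeated bursts, end up together in the `ended` list of the node at their full length.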

Cache-efficient string sorting using copying

- Computer Science · ACM J. Exp. Algorithmics
- 2006

C-burstsort is introduced, which copies the unexamined tail of each key into its bucket and discards the original key to improve data locality. Experiments show that sorting is typically twice as fast as the original burstsort and four to five times faster than multikey quicksort and previous radix sorts.

Cache-oblivious string B-trees

- Computer Science, Mathematics · PODS '06
- 2006

This paper presents a cache-oblivious string B-tree (COSB-tree) data structure that is efficient in several respects. It searches asymptotically optimally, inserts and deletes nearly optimally, and maintains an index whose size is proportional to the front-compressed size of the dictionary.

HAT-Trie: A Cache-Conscious Trie-Based Data Structure For Strings

- Computer Science · ACSC
- 2007

The HAT-trie is shown to be the most efficient trie-based data structure for managing variable-length strings in memory while maintaining sort order, with performance approaching that of the cache-conscious hash table.

Cache-Conscious Collision Resolution in String Hash Tables

- Computer Science · SPIRE
- 2005

Two alternatives to the standard representation of string hash tables are explored: the simple expedient of including the string in its node, and the more drastic step of replacing each list of nodes by a contiguous array of characters.

B-tries for disk-based string management

- Computer Science · The VLDB Journal
- 2008

This work proposes new algorithms for the insertion, deletion, and equality search of variable-length strings in a disk-resident B-trie, as well as novel splitting strategies which are a critical element of a practical implementation.

Cache Conscious Indexing for Decision-Support in Main Memory

- Computer Science · VLDB
- 1999

A new indexing technique called "Cache-Sensitive Search Trees" (CSS-trees) is proposed to provide faster lookup times than binary search by paying attention to reference locality and cache behavior, without using substantial extra space.
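
A pointer-free search structure in the spirit of CSS-trees can be sketched by grouping a sorted array into nodes of `B` keys. Upper levels are then built that store only the maximum key of each node below. A child's position is computed from its parent's index rather than stored as a pointer. This is a simplified illustration, not the layout from the CSS-tree paper. The names `build_levels` and `search` and the node size `B` are hypothetical, and real implementations size nodes to a cache line.

```python
from bisect import bisect_left

B = 4  # hypothetical keys per node; a real tree would match the cache-line size


def build_levels(keys, b=B):
    """levels[0] is the sorted key array; each higher level stores the max key of every b-entry node below."""
    levels = [list(keys)]
    while len(levels[-1]) > b:
        below = levels[-1]
        levels.append([below[min(i + b, len(below)) - 1] for i in range(0, len(below), b)])
    return levels


def search(levels, target, b=B):
    """Return the index of target in levels[0], or -1 if absent."""
    if not levels[0]:
        return -1
    block = 0  # node number at the current level; the top level is a single node
    for level in reversed(levels):
        lo = block * b
        hi = min(lo + b, len(level))
        i = bisect_left(level, target, lo, hi)  # first entry >= target within this node
        if i == hi:
            return -1  # target exceeds every key under this node
        block = i  # entry i summarizes node i of the level below: no pointer needed
    return i if levels[0][i] == target else -1
```

Each step touches only one small node, which is the locality property the summary highlights. The "child pointer" is simply the arithmetic `block * b`.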

Fast and Compact Hash Tables for Integer Keys

- Computer Science · ACSC
- 2009

This paper explains how to efficiently implement an array hash table for integers and demonstrates, through careful experimental evaluations, which hash table offers the best performance for maintaining a large dictionary of integers in-memory, on a current cache-oriented processor.
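
As we understand the term in this line of work, an array hash table replaces each slot's linked chain with a single contiguous, growable array of keys. Each slot is therefore searched with a linear scan over adjacent memory. The sketch below uses one Python `array('q')` of 64-bit integers per slot. The class name `IntArrayHash`, the fixed slot count, and the use of Python's built-in `hash` are assumptions, and this table never resizes.

```python
from array import array


class IntArrayHash:
    """Hypothetical array hash: each slot is a contiguous array of 64-bit ints, not a linked chain."""

    def __init__(self, num_slots=1024):  # hypothetical fixed table size
        self.slots = [None] * num_slots  # slot arrays are allocated lazily

    def _slot(self, key):
        return hash(key) % len(self.slots)

    def insert(self, key):
        """Add key if absent; return True if it was inserted."""
        i = self._slot(key)
        if self.slots[i] is None:
            self.slots[i] = array("q")
        bucket = self.slots[i]
        if key in bucket:  # linear scan over contiguous memory
            return False
        bucket.append(key)  # the array grows in place; no per-key node allocation
        return True

    def search(self, key):
        bucket = self.slots[self._slot(key)]
        return bucket is not None and key in bucket
```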

Adaptive Algorithms for Cache-Efficient Trie Search

- Computer Science · ALENEX
- 1999

This paper presents cache-efficient algorithms for trie search that use different data structures to represent different nodes in a trie and indicates that these algorithms outperform alternatives that are otherwise efficient but do not take cache characteristics into consideration.
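
Using different data structures for different trie nodes can be sketched with a node that starts as a small sorted array of (byte, child) pairs, searched by binary search. Once its fan-out grows, the node converts to a 256-entry directly indexed table. The threshold `DENSE_AT` and the class names are hypothetical. The paper evaluates its own set of node representations, which this sketch does not reproduce.

```python
from bisect import bisect_left

DENSE_AT = 8  # hypothetical fan-out at which a node switches to a dense table


class AdaptiveNode:
    def __init__(self):
        self.keys = []    # sparse form: sorted byte values
        self.kids = []    # children, parallel to self.keys
        self.table = None  # dense form: 256 child slots indexed by byte value
        self.terminal = False

    def get(self, b):
        if self.table is not None:
            return self.table[b]  # direct index
        i = bisect_left(self.keys, b)  # binary search in a small sorted array
        return self.kids[i] if i < len(self.keys) and self.keys[i] == b else None

    def get_or_add(self, b):
        child = self.get(b)
        if child is not None:
            return child
        child = AdaptiveNode()
        if self.table is not None:
            self.table[b] = child
            return child
        i = bisect_left(self.keys, b)
        self.keys.insert(i, b)
        self.kids.insert(i, child)
        if len(self.keys) >= DENSE_AT:  # fan-out grew: switch representation
            self.table = [None] * 256
            for k, c in zip(self.keys, self.kids):
                self.table[k] = c
            self.keys, self.kids = [], []
        return child


class AdaptiveTrie:
    def __init__(self):
        self.root = AdaptiveNode()

    def insert(self, key: bytes):
        node = self.root
        for b in key:  # iterating bytes yields ints 0..255
            node = node.get_or_add(b)
        node.terminal = True

    def __contains__(self, key: bytes):
        node = self.root
        for b in key:
            node = node.get(b)
            if node is None:
                return False
        return node.terminal
```

Sparse nodes stay compact and are searched within a few adjacent entries. Wide nodes pay for a full table but answer each step with a single index.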