# Engineering scalable, cache and space efficient tries for strings

@article{Askitis2010EngineeringSC,
title={Engineering scalable, cache and space efficient tries for strings},
journal={The VLDB Journal},
year={2010},
volume={19},
pages={633--660}
}
• Published 1 October 2010
• Computer Science
• The VLDB Journal
Storing and retrieving strings in main memory is a fundamental problem in computer science. The efficiency of string data structures used for this task is of paramount importance for applications such as in-memory databases, text-based search engines and dictionaries. The burst trie is a leading choice for such tasks, as it can provide fast sorted access to strings. The burst trie, however, uses linked lists as substructures which can result in poor use of CPU cache and main memory. Previous…
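The abstract notes that linked-list substructures can make poor use of CPU cache. As a rough illustration of one alternative layout, the sketch below packs a bucket's strings back to back in a single contiguous byte array, each prefixed by its length, so a search scans one block of memory instead of following pointers. The class name `ArrayBucket`, its methods, and the 2-byte length prefix are assumptions made for this example; they are not taken from the paper.

```python
class ArrayBucket:
    """Hypothetical bucket: strings stored contiguously as length-prefixed bytes."""

    PREFIX = 2  # assumed width of the length prefix, in bytes

    def __init__(self):
        self.data = bytearray()

    def _entries(self):
        # Walk the array from the start, reading one length-prefixed entry at a time.
        pos = 0
        while pos < len(self.data):
            length = int.from_bytes(self.data[pos:pos + self.PREFIX], "little")
            start = pos + self.PREFIX
            yield bytes(self.data[start:start + length])
            pos = start + length

    def contains(self, key: str) -> bool:
        target = key.encode("utf-8")
        return any(entry == target for entry in self._entries())

    def insert(self, key: str) -> bool:
        # Append the key at the end of the array unless it is already present.
        if self.contains(key):
            return False
        raw = key.encode("utf-8")
        if len(raw) >= 1 << (8 * self.PREFIX):
            raise ValueError("key too long for the assumed length prefix")
        self.data += len(raw).to_bytes(self.PREFIX, "little") + raw
        return True

    def sorted_keys(self):
        # Entries are kept in insertion order; sort only when ordered access is needed.
        return sorted(entry.decode("utf-8") for entry in self._entries())
```

In this sketch the only per-string overhead is the length prefix, whereas a linked-list node would also carry at least one pointer per string. Python is used here only to show the layout; any cache benefit would depend on a lower-level implementation.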
Redesigning the string hash table, burst trie, and BST to exploit cache
• Computer Science
JEAL
• 2011
Two alternatives to the standard representation of strings are explored: the simple expedient of including the string in its node, and, for linked lists, the more drastic step of replacing each list of nodes by a contiguous array of characters.
The adaptive radix tree: ARTful indexing for main-memory databases
• Computer Science
2013 IEEE 29th International Conference on Data Engineering (ICDE)
• 2013
Main memory capacities have grown up to a point where most databases fit into RAM. For main-memory database systems, index structure performance is a critical bottleneck. Traditional in-memory data
Dynamic Path-decomposed Tries
• Computer Science
ACM J. Exp. Algorithmics
• 2020
The main idea is to embrace the path decomposition technique, which was proposed for constructing cache-friendly tries, and design data structures based on recent compact hash trie representations to store the path-decomposed trie in small memory.
Space- and Time-Efficient String Dictionaries
This work develops novel string dictionaries based on double-array tries and proposes how to improve the high construction costs of applying Re-Pair by introducing an alternative compression strategy using dictionary encoding.
Practical Implementation of Space-Efficient Dynamic Keyword Dictionaries
• Computer Science
SPIRE
• 2017
This paper uses path decomposition, which is proposed for constructing cache-friendly trie structures, for dynamic construction in compact space with a different approach and shows that its implementation can construct keyword dictionaries in spaces up to 2.8x smaller than the most compact existing dynamic implementation.
Put an elephant into a fridge
• Computer Science
Proc. VLDB Endow.
• 2020
This work proposes a highly cache-efficient scheme, called Cavast, to optimize the cache utilization of large-capacity in-memory key-value stores, and presents two lightweight, software-only mechanisms that enable users to indirectly control the cache content at the application level.
c-trie++: A dynamic trie tailored for fast prefix searches
Top Tree Compression of Tries
• Computer Science, Mathematics
Algorithmica
• 2021
This work presents a compressed representation of tries based on top tree compression that works on a standard, comparison-based, pointer machine model of computation and supports efficient prefix search queries and develops several interesting data structures that work on a pointer machine and are of independent interest.
BOPL: Batch Optimized Persistent List
Due to the increase of the number of operations in the databases, and with the improvement of both performance and capacity of the DRAM, the use of In Memory Database (IMDB) has become feasible
Top Tree Compression of Tries
• Computer Science, Mathematics
ISAAC
• 2019
This work shows how to preprocess a set of strings of total length $n$ over an alphabet of size $\sigma$ into a compressed data structure of worst-case optimal size $O(n/\log_\sigma n)$ that, given a pattern string $P$ of length $m$, determines if $P$ is a prefix of one of the strings in time $O(\min(m\log \sigma, m + \log n))$.

## References

Burst tries: a fast, efficient data structure for string keys
• Computer Science
TOIS
• 2002
These experiments show that the burst trie is particularly effective for the skewed frequency distributions common in text collections, and dramatically outperforms all other data structures for the task of managing strings while maintaining sort order.
Cache-conscious sorting of large sets of strings with dynamic tries
• Computer Science
JEAL
• 2004
This work proposes a new sorting algorithm for strings, burstsort, based on dynamic construction of a compact trie in which strings are kept in buckets, which is simple, fast, and efficient.
Cache-efficient string sorting using copying
• Computer Science
ACM J. Exp. Algorithmics
• 2006
C-burstsort is introduced, which copies the unexamined tail of each key to the bucket and discards the original key to improve data locality; experiments show that sorting is typically twice as fast as the original burstsort and four to five times faster than multikey quicksort and previous radixsorts.
Cache-oblivious string B-trees
• Computer Science, Mathematics
PODS '06
• 2006
This paper presents a cache-oblivious string B-tree (COSB-tree) data structure that is efficient in all these ways: searches asymptotically optimally and inserts and deletes nearly optimally, and maintains an index whose size is proportional to the front-compressed size of the dictionary.
HAT-Trie: A Cache-Conscious Trie-Based Data Structure For Strings
• Computer Science
ACSC
• 2007
The HAT-trie is shown to be the most efficient trie-based data structure for managing variable-length strings in memory while maintaining sort order, with efficiency approaching that of the cache-conscious hash table.
Cache-Conscious Collision Resolution in String Hash Tables
• Computer Science
SPIRE
• 2005
Two alternatives to the standard representation of string hash tables are explored: the simple expedient of including the string in its node, and the more drastic step of replacing each list of nodes by a contiguous array of characters.
B-tries for disk-based string management
• Computer Science
The VLDB Journal
• 2008
This work proposes new algorithms for the insertion, deletion, and equality search of variable-length strings in a disk-resident B-trie, as well as novel splitting strategies which are a critical element of a practical implementation.
Cache Conscious Indexing for Decision-Support in Main Memory
• Computer Science
VLDB
• 1999
A new indexing technique called "Cache-Sensitive Search Trees" (CSS-trees) is proposed, to provide faster lookup times than binary search by paying attention to reference locality and cache behavior, without using substantial extra space.
Fast and Compact Hash Tables for Integer Keys
This paper explains how to efficiently implement an array hash table for integers and demonstrates, through careful experimental evaluations, which hash table offers the best performance for maintaining a large dictionary of integers in-memory, on a current cache-oriented processor.
Adaptive Algorithms for Cache-Efficient Trie Search
• Computer Science
ALENEX
• 1999
This paper presents cache-efficient algorithms for trie search that use different data structures to represent different nodes in a trie and indicates that these algorithms outperform alternatives that are otherwise efficient but do not take cache characteristics into consideration.