Engineering scalable, cache and space efficient tries for strings

@article{Askitis2010EngineeringSC,
  title={Engineering scalable, cache and space efficient tries for strings},
  author={Nikolas Askitis and Ranjan Sinha},
  journal={The VLDB Journal},
  year={2010},
  volume={19},
  pages={633--660}
}
Storing and retrieving strings in main memory is a fundamental problem in computer science. The efficiency of string data structures used for this task is of paramount importance for applications such as in-memory databases, text-based search engines and dictionaries. The burst trie is a leading choice for such tasks, as it can provide fast sorted access to strings. The burst trie, however, uses linked lists as substructures which can result in poor use of CPU cache and main memory. Previous… 
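As a rough illustration of the structure the abstract describes, the following is a minimal sketch of a burst trie, not the authors' implementation. It assumes plain Python lists as containers (the abstract mentions linked lists) and a hypothetical `burst_threshold` parameter; the names `BurstTrie`, `_new_node`, and `sorted_keys` are illustrative only. A container holds the suffixes of keys sharing a prefix, and is "burst" into a new trie node once it grows past the threshold.

```python
def _new_node():
    # A trie node: child entries keyed by character, plus an end-of-key flag.
    return {"children": {}, "end": False}


class BurstTrie:
    """Minimal burst trie sketch: trie nodes on top of small suffix containers."""

    def __init__(self, burst_threshold=4):
        # burst_threshold is an illustrative value, not one taken from the paper.
        self.burst_threshold = burst_threshold
        self.root = _new_node()

    def insert(self, key):
        self._insert(self.root, key)

    def _insert(self, node, suffix):
        while True:
            if not suffix:
                node["end"] = True
                return
            ch, rest = suffix[0], suffix[1:]
            child = node["children"].get(ch)
            if child is None:
                # First key under this character: start a new container.
                node["children"][ch] = [rest]
                return
            if isinstance(child, list):
                if rest not in child:
                    child.append(rest)
                    if len(child) > self.burst_threshold:
                        # Burst: replace the container with a trie node and
                        # redistribute its suffixes one level deeper.
                        new_node = _new_node()
                        for s in child:
                            self._insert(new_node, s)
                        node["children"][ch] = new_node
                return
            node, suffix = child, rest

    def __contains__(self, key):
        node, suffix = self.root, key
        while True:
            if not suffix:
                return node["end"]
            ch, rest = suffix[0], suffix[1:]
            child = node["children"].get(ch)
            if child is None:
                return False
            if isinstance(child, list):
                return rest in child
            node, suffix = child, rest

    def sorted_keys(self):
        out = []
        self._walk(self.root, "", out)
        return out

    def _walk(self, node, prefix, out):
        # Visiting characters in order, and sorting each small container,
        # yields the keys in lexicographic order.
        if node["end"]:
            out.append(prefix)
        for ch in sorted(node["children"]):
            child = node["children"][ch]
            if isinstance(child, list):
                out.extend(prefix + ch + s for s in sorted(child))
            else:
                self._walk(child, prefix + ch, out)
```

Sorted access falls out of the layout: the trie levels order keys by their leading characters, so only the small containers need sorting during traversal. A cache-conscious variant along the lines the abstract motivates would replace the per-container list with a more compact, contiguous representation.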
Redesigning the string hash table, burst trie, and BST to exploit cache
TLDR
Two alternatives to the standard representation of strings are explored: the simple expedient of including the string in its node, and, for linked lists, the more drastic step of replacing each list of nodes by a contiguous array of characters.
The adaptive radix tree: ARTful indexing for main-memory databases
Main memory capacities have grown up to a point where most databases fit into RAM. For main-memory database systems, index structure performance is a critical bottleneck. Traditional in-memory data
Dynamic Path-decomposed Tries
TLDR
The main idea is to embrace the path decomposition technique, which was proposed for constructing cache-friendly tries, and design data structures based on recent compact hash trie representations to store the path-decomposed trie in small memory.
Space- and Time-Efficient String Dictionaries
TLDR
This work develops novel string dictionaries based on double-array tries and proposes a way to reduce the high construction costs of applying Re-Pair by introducing an alternative compression strategy using dictionary encoding.
Practical Implementation of Space-Efficient Dynamic Keyword Dictionaries
TLDR
This paper uses path decomposition, which is proposed for constructing cache-friendly trie structures, for dynamic construction in compact space with a different approach and shows that its implementation can construct keyword dictionaries in spaces up to 2.8x smaller than the most compact existing dynamic implementation.
Put an elephant into a fridge
TLDR
This work proposes a highly cache-efficient scheme, called Cavast, to optimize the cache utilization of large-capacity in-memory key-value stores, and presents two light-weight, software-only mechanisms that enable users to indirectly control the cache content at the application level.
c-trie++: A dynamic trie tailored for fast prefix searches
Top Tree Compression of Tries
TLDR
This work presents a compressed representation of tries based on top tree compression that works on a standard, comparison-based, pointer machine model of computation and supports efficient prefix search queries and develops several interesting data structures that work on a pointer machine and are of independent interest.
BOPL: Batch Optimized Persistent List
With the growing number of database operations and improvements in both the performance and capacity of DRAM, the use of in-memory databases (IMDBs) has become feasible
Top Tree Compression of Tries
TLDR
This work shows how to preprocess a set of strings of total length $n$ over an alphabet of size $\sigma$ into a compressed data structure of worst-case optimal size $O(n/\log_\sigma n)$ that determines whether a pattern $P$ of length $m$ is a prefix of one of the strings in time $O(\min(m\log \sigma, m + \log n))$.

References

SHOWING 1-10 OF 117 REFERENCES
Burst tries: a fast, efficient data structure for string keys
TLDR
These experiments show that the burst trie is particularly effective for the skewed frequency distributions common in text collections, and dramatically outperforms all other data structures for the task of managing strings while maintaining sort order.
Cache-conscious sorting of large sets of strings with dynamic tries
TLDR
This work proposes a new sorting algorithm for strings, burstsort, based on dynamic construction of a compact trie in which strings are kept in buckets, which is simple, fast, and efficient.
Cache-efficient string sorting using copying
TLDR
C-burstsort is introduced, which copies the unexamined tail of each key to the bucket and discards the original key to improve data locality; it is shown to sort typically twice as fast as the original burstsort and four to five times faster than multikey quicksort and previous radixsorts.
Cache-oblivious string B-trees
TLDR
This paper presents a cache-oblivious string B-tree (COSB-tree) data structure that is efficient in all these ways: searches asymptotically optimally and inserts and deletes nearly optimally, and maintains an index whose size is proportional to the front-compressed size of the dictionary.
HAT-Trie: A Cache-Conscious Trie-Based Data Structure For Strings
TLDR
The HAT-trie is shown to be the most efficient trie-based data structure for managing variable-length strings in memory while maintaining sort order, with performance approaching that of the cache-conscious hash table.
Cache-Conscious Collision Resolution in String Hash Tables
TLDR
Two alternatives to the standard representation of string hash tables are explored: the simple expedient of including the string in its node, and the more drastic step of replacing each list of nodes by a contiguous array of characters.
B-tries for disk-based string management
TLDR
This work proposes new algorithms for the insertion, deletion, and equality search of variable-length strings in a disk-resident B-trie, as well as novel splitting strategies which are a critical element of a practical implementation.
Cache Conscious Indexing for Decision-Support in Main Memory
TLDR
A new indexing technique called "Cache-Sensitive Search Trees" (CSS-trees) is proposed, to provide faster lookup times than binary search by paying attention to reference locality and cache behavior, without using substantial extra space.
Fast and Compact Hash Tables for Integer Keys
TLDR
This paper explains how to efficiently implement an array hash table for integers and demonstrates, through careful experimental evaluations, which hash table offers the best performance for maintaining a large dictionary of integers in-memory, on a current cache-oriented processor.
Adaptive Algorithms for Cache-Efficient Trie Search
TLDR
This paper presents cache-efficient algorithms for trie search that use different data structures to represent different nodes in a trie and indicates that these algorithms outperform alternatives that are otherwise efficient but do not take cache characteristics into consideration.