Corpus ID: 1382872

Cache-Aware Lock-Free Concurrent Hash Tries

Aleksandar Prokopec, Philip Sidney Bagwell, Martin Odersky
This report describes an implementation of a non-blocking concurrent shared-memory hash trie based on single-word compare-and-swap instructions. Insert, lookup and remove operations that modify different parts of the hash trie can run independently of each other and do not contend. Remove operations ensure that unneeded memory is freed and that the trie is kept compact. Pseudocode for these operations is presented and a proof of correctness is given: we show that the implementation is…
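As a rough illustration of the single-word compare-and-swap idea in the abstract, the following Java sketch models one 32-way trie level whose branches are updated independently. It is not the report's algorithm: the class name `CasBranchSketch`, the fixed single level, and the immutable per-branch chains are assumptions made for this example. It only shows the retry-on-failed-CAS pattern and why updates to different branches do not contend.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical sketch, not the report's data structure: a single trie level
// with 32 branches, each replaced atomically by one compare-and-swap.
final class CasBranchSketch {
    // Immutable chain cell; a null reference means an empty branch.
    private static final class Entry {
        final int key;
        final String value;
        final Entry next;
        Entry(int key, String value, Entry next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private static final int WIDTH = 32;
    private final AtomicReferenceArray<Entry> branches =
            new AtomicReferenceArray<>(WIDTH);

    // Branch selection uses the low five bits of the key.
    private static int index(int key) {
        return key & (WIDTH - 1);
    }

    private static Entry find(Entry e, int key) {
        for (; e != null; e = e.next) {
            if (e.key == key) return e;
        }
        return null;
    }

    // Returns a chain without `key`, copying cells in front of the match.
    private static Entry without(Entry e, int key) {
        if (e == null) return null;
        if (e.key == key) return e.next;
        return new Entry(e.key, e.value, without(e.next, key));
    }

    String lookup(int key) {
        Entry e = find(branches.get(index(key)), key);
        return e == null ? null : e.value;
    }

    void insert(int key, String value) {
        int i = index(key);
        while (true) {
            Entry old = branches.get(i);
            Entry updated = new Entry(key, value, without(old, key));
            // Succeeds only if no other thread changed this branch meanwhile.
            if (branches.compareAndSet(i, old, updated)) return;
        }
    }

    boolean remove(int key) {
        int i = index(key);
        while (true) {
            Entry old = branches.get(i);
            if (find(old, key) == null) return false;
            if (branches.compareAndSet(i, old, without(old, key))) return true;
        }
    }

    public static void main(String[] args) {
        CasBranchSketch t = new CasBranchSketch();
        t.insert(1, "a");
        t.insert(33, "b"); // shares a branch with key 1
        System.out.println(t.lookup(1) + " " + t.lookup(33));
    }
}
```

Because every update allocates fresh immutable cells and the JVM garbage-collects unreachable ones, a reference comparison is enough for the compare-and-swap in this sketch. A full hash trie would additionally grow and shrink levels, which this example does not attempt.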


Cache-tries: concurrent lock-free hash tries with constant-time operations
This paper presents a novel lock-free concurrent hash trie design that exerts less pressure on the memory allocator, and shows a statistical analysis for the constant-time bound, which is the first such proof for hash tries.
Efficient Lock-Free Removing and Compaction for the Cache-Trie Data Structure
The recently proposed cache-trie data structure improves the performance of lock-free Ctries by maintaining an auxiliary data structure called a cache. The cache allows basic operations to run in
Analysis and Evaluation of Non-Blocking Interpolation Search Trees
The recently proposed implementation of the first non-blocking concurrent interpolation search tree (C-IST) data structure is summarized, and it is shown that the C-IST has the following properties: correct and linearizable, wait-free, and lock-free.
Improving STM performance with transactional structs
This work implements several data structures, discusses their design, and provides benchmark results on a large multicore machine, showing that concurrent data structures built with TStruct out-scale and out-perform their TVar-based equivalents.
On Evaluating the Renaissance Benchmarking Suite: Variety, Performance, and Complexity
An overview of the experimental setup that was used to assess the variety and complexity of the Renaissance suite, as well as its amenability to new compiler optimizations, is given and the obtained measurements are presented.
Renaissance: benchmarking suite for parallel applications on the JVM
Renaissance, a new benchmark suite composed of modern, real-world, concurrent, and object-oriented workloads that exercise various concurrency primitives of the JVM, is presented and it is shown that the use of concurrency primitives in these workloads reveals optimization opportunities that were not visible with the existing workloads.
Parallel Query Evaluation in Streaming Environments
This report continues work started last year on automatically parallelizing the code generated by DBToaster, by designing, implementing, and evaluating thread-safe MultiMap data structures that represent relations within DBToaster-generated code.
Non-blocking interpolation search trees with doubly-logarithmic running time
The first non-blocking implementation of the classic interpolation search tree (IST) data structure is proposed, and the results are surprisingly robust to distributional skew, which suggests that the data structure can be a promising alternative to classic concurrent search structures.
An Optimization-Driven Incremental Inline Substitution Algorithm for Just-in-Time Compilers
Inlining is one of the most important compiler optimizations. It reduces call overheads and widens the scope of other optimizations. But, inlining is somewhat of a black art of an optimizing
XSearch: Distributed Information Retrieval in Large-Scale Storage Systems
This project argues the need for new methods to support information retrieval in the context of large-scale storage systems, and proposes the implementation of a scalable distributed indexing system to integrate with existing parallel and distributed filesystems.


Non-blocking binary search trees
This paper describes the first complete implementation of a non-blocking binary search tree in an asynchronous shared-memory system using single-word compare-and-swap operations. The implementation
Split-ordered lists: lock-free extensible hash tables
Empirical tests show the first lock-free implementation of an extensible hash table running on current architectures provides concurrent insert, delete, and search operations with an expected O(1) cost and is well suited for real-time applications.
A practical concurrent binary search tree
Experimental evidence shows that the proposed concurrent relaxed balance AVL tree algorithm outperforms a highly tuned concurrent skip list for many access patterns, with an average of 39% higher single-threaded throughput and 32% higher multi-threaded throughput over a range of contention levels and operation mixes.
High performance dynamic lock-free hash tables and list-based sets
The experimental results show that the new algorithm outperforms the best known lock-free as well as lock-based hash table implementations by significant margins, and indicate that it is the algorithm of choice for implementing shared hash tables.
Nonblocking Algorithms and Preemption-Safe Locking on Multiprogrammed Shared Memory Multiprocessors
The results indicate that the nonblocking queue consistently outperforms the best known alternatives and that data-structure-specific nonblocking algorithms, which exist for queues, stacks, and counters, can work extremely well.
Concurrent manipulation of binary search trees
The concurrency control techniques introduced in the paper include the use of special nodes and pointers to redirect searches, and the use of copies of sections of the tree to introduce many changes simultaneously and therefore avoid unpredictable interleaving.
Linearizability: a correctness condition for concurrent objects
This paper defines linearizability, compares it to other correctness conditions, presents and demonstrates a method for proving the correctness of implementations, and shows how to reason about concurrent objects, given they are linearizable.
Ideal Hash Trees
Array Mapped Tries (AMT), first described in Fast and Space Efficient Trie Searches, Bagwell [2000], form the underlying data structure and the concept is then applied to external disk or distributed storage to obtain an algorithm that achieves single access searches, close to single access inserts and greater than 80 percent disk block load factors.
The art of multiprocessor programming
This talk will survey the area of transactional memory, a computational model in which threads synchronize by optimistic, lock-free transactions, with a focus on open research problems.
A Pragmatic Implementation of Non-blocking Linked-Lists
This work presents a new non-blocking implementation of concurrent linked-lists supporting linearizable insertion and deletion operations, conceptually simpler and substantially faster than previous schemes.