Improving in-memory database index performance with Intel® Transactional Synchronization Extensions

Tomas Karnagel, Roman Dementiev, Ravi Rajwar, Konrad Lai, Thomas Legler, Benjamin Schlegel, Wolfgang Lehner
2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA)
The increasing number of cores every generation poses challenges for high-performance in-memory database systems. While these systems use sophisticated high-level algorithms to partition a query or run multiple queries in parallel, they also utilize low-level synchronization mechanisms to synchronize access to internal database data structures. Developers often spend significant development and verification effort to improve concurrency in the presence of such synchronization. The Intel… 

Scaling HTM-Supported Database Transactions to Many Cores

It is shown that HTM allows for achieving nearly lock-free processing of database transactions by carefully controlling the data layout and the access patterns, and provides a scalable, powerful, and easy to use synchronization primitive.

A specialized B-tree for concurrent datalog evaluation

A specialized B-tree data structure for an open-source Datalog compiler written in C++ that features an optimistic locking protocol for scalability, is highly tuned, and uses the notion of "hints" to reuse the results of previously performed tree traversals, exploiting data ordering properties exhibited by Datalog evaluation.

ParaTM: Transparent Embedding of Hardware Transactional Memory for Traditional Applications

This paper evaluates and analyzes the performance of TSX and introduces a mechanism, named ParaTM, that transparently adopts TSX for existing lock-based applications; it confirms that ParaTM is highly effective in terms of transparency and performance.

Applying HTM to an OLTP System: No Free Lunch

It is found that HTM can improve performance of the TATP workload by 13–17% when applied judiciously, whereas attempting to replace all synchronization reduces performance compared to the baseline because of the high percentage of aborts caused by the limitations of the current HTM implementation.

Main Memory Database Recovery

This survey aims to provide a thorough review of in-memory database recovery techniques and discusses the recovery strategies of a representative sample of modern in-memory databases.

Persistent hybrid transactional memory for databases

PHyTM allows hardware assisted ACID transactions to execute concurrently with pure software transactions, which allows applications to gain the benefit of persistent HTM while simultaneously accommodating unbounded transactions (with a high degree of concurrency).

The Impact of Columnar In-Memory Databases on Enterprise Systems

First analyses of productive applications adopting this concept confirm that system architectures enabled by in-memory column stores are conceptually superior for business transaction processing compared to row-based approaches.

Empirical Evaluation of a Thread-Safe Dynamic Range Min-Max Tree using HTM

It is shown that, because of the formal properties of RMMTs, HTM is a good fit for adding concurrency to otherwise slow lock-based alternatives, and that it performs better than locks as the number of write operations increases, making it a practical structure for several write-intensive contexts.

To Lock, Swap, or Elide: On the Interplay of Hardware Transactional Memory and Lock-Free Indexing

This study uses two state-of-the-art index implementations: a memory-optimized B-tree extended with HTM to provide multi-threaded concurrency, and the Bw-tree, a lock-free B-tree used in several Microsoft production environments.

Inherent limitations of hybrid transactional memory

A general model for HyTM implementations, which captures the ability of hardware transactions to buffer memory accesses and captures for the first time the trade-off between the degree of hardware-software TM concurrency and the amount of instrumentation overhead.



Cache-Conscious Concurrency Control of Main-Memory Indexes on Shared-Memory Multiprocessor Systems

This paper presents an optimistic, latch-free index traversal (OLFIT) concurrency control (CC) scheme based on a pair of consistent node read and update primitives. The scheme shows superior scalability on multiprocessor systems, as well as performance comparable to that of sequential execution without CC on uniprocessors.

Software transactional memory

STM is used to provide a general highly concurrent method for translating sequential object implementations to non-blocking ones based on implementing a k-word compare&swap STM-transaction, and outperforms Herlihy’s translation method for sufficiently large numbers of processors.

Atomic Transactional Execution in Hardware: A New High-Performance Abstraction for Databases?

A hardware mechanism based on Transactional Lock Removal (TLR) is explained, along with a suggestion of how it could be used to control the atomic execution of transactions in a database system.

Transactional Memory, 2nd edition

This book presents an overview of the state of the art in the design and implementation of transactional memory systems, as of early spring 2010.

Concurrent Cache-Oblivious B-Trees Using Transactional Memory

It is argued that a solution for the first issues of transaction I/O and durability is to use a TM system that supports transactions on memory-mapped data, and it is believed that this approach can be generalized: memory-mapped transactions can be used for other applications that concurrently access data stored in external memory.

Transactional lock-free execution of lock-based programs

This paper proposes Transactional Lock Removal (TLR) and shows how a program that uses lock-based synchronization can be executed by the hardware in a lock-free manner, even in the presence of conflicts, without programmer support or software changes.

Transactional Memory: Architectural Support For Lock-free Data Structures

  • M. Herlihy, J. Moss
  • Computer Science
    Proceedings of the 20th Annual International Symposium on Computer Architecture
  • 1993
Simulation results show that transactional memory matches or outperforms the best known locking techniques for simple benchmarks, even in the absence of priority inversion, convoying, and deadlock.

PALM: Parallel Architecture-Friendly Latch-Free Modifications to B+ Trees on Many-Core Processors

PALM is a novel technique for performing multiple concurrent queries on in-memory B+ trees, based on the Bulk Synchronous Parallel model, that obtains close to peak throughput at very low response times of less than 350 μs, even for large trees.

Foster B-trees

An implementation and a performance evaluation show that the Foster B-tree supports high concurrency and high update rates without compromising consistency, correctness, or read performance.

Hoard: a scalable memory allocator for multithreaded applications

Hoard is the first allocator to simultaneously solve the above problems, and combines one global heap and per-processor heaps with a novel discipline that provably bounds memory consumption and has very low synchronization costs in the common case.