Optimizing database architecture for the new bottleneck: memory access

Stefan Manegold, Peter A. Boncz, Martin L. Kersten. The VLDB Journal.
Abstract. In the past decade, advances in the speed of commodity CPUs have far out-paced advances in memory latency. Main-memory access is therefore increasingly a performance bottleneck for many computer applications, including database systems. In this article, we use a simple scan test to show the severe impact of this bottleneck. The insights gained are translated into guidelines for database architecture, in terms of both data structures and algorithms. We discuss how vertically fragmented… 
Efficient Processing of Range Queries in Main Memory
A cache-optimized, updateable main-memory index structure, the cache-sensitive skip list, is proposed for executing range queries on single database columns. A novel, fast, and space-efficient main-memory index structure, the BB-Tree, is also devised; it supports multidimensional range and point queries and provides a parallel search operator that leverages the multi-threading capabilities of modern CPUs.
Analytical Query Execution Optimized for all Layers of Modern Hardware
This thesis focuses on the design and implementation of highly efficient database systems by optimizing analytical query execution for all layers of modern hardware, and introduces advanced SIMD vectorization techniques generalizable across multiple operators.
Accelerators for Data Processing
This thesis provides a dynamic software acceleration scheme for exploiting inter-lookup parallelism to hide the memory access latency despite the irregularities across lookups, and proposes a programmable hardware accelerator to maximize the efficiency of the data structure lookups.
Performance Characterization of Modern Databases on Out-of-Order CPUs
It is observed that the performance of modern databases is severely limited by poor cache/memory performance, and it is demonstrated that dynamic execution techniques are still effective in hiding a significant fraction of the stalls, thereby improving performance.
Cache-Aware Spatial Indices on Chip Multi-Processors: Limitations and Opportunities
  • Minhui Lv, Wei Xiong
  • Computer Science
    2016 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC)
  • 2016
This paper evaluates the performance of typical spatial indices on a commodity chip multiprocessor using an analytical model that incorporates the access costs of the memory hierarchy, and derives a list of recommendations for designing future spatial indices to get the most performance out of chip multiprocessors.
Rethinking SIMD Vectorization for In-Memory Databases
This paper presents novel vectorized designs and implementations of database operators, based on advanced SIMD operations, such as gathers and scatters, and highlights the impact of efficient vectorization on the algorithmic design of in-memory database operators, as well as the architectural design and power efficiency of hardware.
Compiling Database Queries into Machine Code
This paper shows how queries can be brought into a form suitable for efficient translation, and how the underlying code generation can be orchestrated, by carefully abstracting away the necessary plumbing infrastructure to build a query compiler that is both maintainable and efficient.
O2-tree: a shared memory resident index in multicore architectures
Analysis and a comparative experimental study show that the O2-Tree performs better than other tree-based index structures on various query operations over large datasets, and that it outperforms popular key-value stores such as BerkeleyDB and TreeDB of Kyoto Cabinet for various workloads.
Asynchronous Memory Access Chaining
Asynchronous Memory Access Chaining (AMAC), a new approach for exploiting inter-lookup parallelism to hide the memory access latency, achieves high dynamism in dealing with irregularity across lookups by maintaining the state of each lookup separately from that of other lookups.


Optimizing Main-Memory Join on Modern Hardware
The partitioned hash-join is refined with a new partitioning algorithm called radix-cluster, which is specifically designed to optimize memory access, and the effect of implementation techniques that optimize CPU resource usage is investigated.
What Happens During a Join? Dissecting CPU and Memory Optimization Effects
This work presents a calibration tool that automatically extracts the relevant parameters about the memory subsystem from any hardware and demonstrates how a database system equipped with this calibrator can automatically tune memory-conscious database algorithms to their optimal settings.
Query optimization in a memory-resident domain relational calculus database system
This paper addresses aspects of query optimization in memory-resident database systems, presents practical solutions to them, and reports performance measurements that prove to be excellent relative to the current state of the art.
A Study of Index Structures for a Main Memory Database Management System
This paper proposes a new index structure, the T Tree, and compares it to existing index structures in a main memory database environment; the results indicate that the T Tree provides good overall performance in main memory.
Monet: An Impressionist Sketch of an Advanced Database System
This paper gives the goals and motivation of Monet and outlines its architectural features, including its use of the Decomposed Storage Model (DSM), emphasis on bulk operations, use of main virtual-memory, and server customization, as well as some issues on how to build a GIS on top of Monet.
AlphaSort: a RISC machine sort
A new sort algorithm, called AlphaSort, demonstrates that commodity processors and disks can handle commercial batch workloads, and two new benchmarks are proposed: Minutesort (how much can you sort in a minute) and DollarSort (how much can you sort for a dollar).
Flattening an object algebra to provide performance
It is shown how flattening made it possible to implement a query algebra using only a very limited set of simple operations; the implementation was evaluated on the 1-GByte TPC-D (Transaction Processing Performance Council's Benchmark D), showing that the divide-and-conquer approach yields excellent results.
Join indices
This paper proposes a simple data structure, called a join index, for improving the performance of joins in the context of complex queries, and analysis of the join algorithm using join indices shows its excellent performance.
PRISMA/DB: A Parallel Main Memory Relational DBMS
PRISMA/DB, a full-fledged parallel, main memory relational database management system (DBMS), is described, and a performance evaluation shows that the system is comparable to other state-of-the-art database machines.
Smarter Memory: Improving Bandwidth for Streamed References
The authors describe how reordering streams can result in better memory performance in processors that may spend most of their time waiting for data.