Breaking the memory wall in MonetDB

Peter A. Boncz, Martin L. Kersten, Stefan Manegold. Commun. ACM.
In the past decades, advances in the speed of commodity CPUs have far outpaced advances in RAM latency. Main-memory access has therefore become a performance bottleneck for many computer applications, a phenomenon widely known as the "memory wall." In this paper, we report how research around the MonetDB database system has led to a redesign of database architecture in order to take advantage of modern hardware, and in particular to avoid hitting the memory wall. This encompasses (i) a…
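
MonetDB is a column store, and the memory-wall argument above is easiest to see by comparing layouts. A minimal sketch, assuming a hypothetical three-attribute table and 64-byte cache lines: summing one attribute in a row layout pulls every other field of the tuple through the cache, while a dense column array makes every loaded byte useful. The type and function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Row (N-ary) layout: all attributes of a tuple are stored together. */
typedef struct {
    int64_t id;
    int64_t price;
    char    comment[48];  /* hypothetical wide attribute the query never reads */
} row_t;

static_assert(sizeof(row_t) == 64, "row_t is meant to fill one 64-byte cache line");

/* Summing one attribute loads roughly one full row (one cache line) per value. */
int64_t sum_price_rows(const row_t *rows, size_t n) {
    int64_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += rows[i].price;
    return s;
}

/* Column layout: the attribute is a dense array, so a 64-byte cache line
   delivers 8 useful int64 values instead of 1. */
int64_t sum_price_column(const int64_t *price, size_t n) {
    int64_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += price[i];
    return s;
}
```

Both functions compute the same result; the difference is how many cache lines must come from RAM per useful value, which is the quantity the memory wall makes expensive.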


A near-data select scan operator for database systems

This dissertation implemented the near-data select scan in the row-, column-, and vector-wise query engines for x86 and for two HMC extensions, HMC-Scan and HIPE-Scan, achieving performance improvements of up to 3.7× for HMC-Scan and 5.6× for HIPE-Scan during the execution of query filters that depend on in-memory data.

RC-NVM: Enabling Symmetric Row and Column Memory Accesses for In-memory Databases

  • Peng Wang, Shuo Li, Tao Zhang
  • 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA), 2018
RC-NVM, a dual-addressable memory architecture based on non-volatile memory that supports both row-oriented and column-oriented accesses, is proposed, together with a group caching technique that combines in-memory database (IMDB) knowledge with the memory architecture to further optimize the system.

Towards efficient analytic query processing in main-memory column-stores

This dissertation designs efficient algorithms for scan, sort, and join that judiciously exploit every bit of RAM and all the available parallelism in each processing unit, including skip-scan, a new fast scan that enables both data skipping and early stopping without any space overhead.

User Mode Memory Page Management: An old idea applied anew to the memory wall problem

A feasibility study is conducted to determine whether the MMU should be virtualised for each application process, so that the process has direct access to its own MMU page tables and the memory allocated to it is managed exclusively by the process rather than by the kernel.

Hardware-Oblivious Parallelism for In-Memory Column-Stores

This work proposes an alternative design for a parallel database engine, based on a single set of hardware-oblivious operators that are compiled down to the actual hardware at runtime. This reduces the development overhead for parallel database engines while achieving performance competitive with hand-tuned systems.

Cache Conscious Column Organization in In-Memory Column Stores

A cost model based on cache misses is developed for estimating the runtime of the considered plan operators using different data structures. The work supports the update performance required to run enterprise applications on read-optimized databases and provides a memory-traffic-based cost model for the merge process.

OLTP through the looking glass, and what we found there

Overall, overheads and optimizations that explain a total difference of about a factor of 20 in raw performance are identified, and it is shown that there is no single "high pole in the tent" in modern (memory-resident) database systems; instead, substantial time is spent in logging, latching, locking, B-tree, and buffer management operations.

Elastic online analytical processing on RAMCloud

A distributed in-memory database architecture that separates the query execution engine from data access enables the use of a large-scale DRAM-based storage system such as Stanford's RAMCloud and the push-down of bandwidth-intensive database operators into the storage system.

Memory-mapped I/O on steroids

This paper presents Aquila, a library OS that allows applications to reduce I/O overhead by customizing the memory-mapped I/O (mmio) path for files or storage devices, and shows the benefits of Aquila in two cases: using mmio in key-value stores, and using it in graph processing applications to extend the memory heap over fast storage devices.

Accelerating mono and multi-column selection predicates in modern main-memory database systems

This thesis tackles the challenges of automatically creating hardware-sensitive operator implementations and of exploiting the relation between multiple selection predicates, and introduces the abstraction of code optimizations as a means to generate hardware-sensitive code variants automatically.



Database Architecture Optimized for the New Bottleneck: Memory Access

A simple scan test is used to show the severe impact of the main-memory access bottleneck, and radix algorithms for partitioned hash-join are introduced, using a detailed analytical model that incorporates memory access cost.
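
The scan test itself is not reproduced in this summary. The following is a minimal sketch of a stride experiment in the same spirit, assuming one byte is read per "record" and that POSIX `clock_gettime` is available; `strided_scan` and `ns_per_access` are hypothetical names. Once the stride exceeds the cache line size, every read lands on a new line, so the time per access is expected to climb toward main-memory latency.

```c
#define _POSIX_C_SOURCE 199309L  /* expose clock_gettime on strict C compilers */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Keeps the compiler from discarding the scan results. */
static volatile uint64_t scan_sink;

/* Read one byte every `stride` bytes of a `len`-byte buffer and sum them. */
uint64_t strided_scan(const uint8_t *buf, size_t len, size_t stride) {
    assert(stride > 0);
    uint64_t sum = 0;
    for (size_t off = 0; off < len; off += stride)
        sum += buf[off];
    return sum;
}

/* Average wall-clock nanoseconds per byte read for one strided pass. */
double ns_per_access(const uint8_t *buf, size_t len, size_t stride) {
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    scan_sink += strided_scan(buf, len, stride);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    size_t accesses = (len + stride - 1) / stride;  /* number of bytes read */
    return accesses ? ns / (double)accesses : 0.0;
}
```

To see the effect, the buffer should be much larger than the last-level cache and the stride swept over 1, 2, 4, …, 256 bytes; the absolute numbers depend entirely on the machine.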

Optimizing Main-Memory Join on Modern Hardware

The partitioned hash-join is refined with a new partitioning algorithm called radix-cluster, which is specifically designed to optimize memory access, and the effect of implementation techniques that optimize CPU resource usage is investigated.
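
A minimal sketch of multi-pass radix clustering, assuming 32-bit keys clustered directly on their own low bits (a real hash join would cluster on a hash of the key) and a caller-chosen number of bits per pass. Each pass refines the clusters of the previous pass using only 2^bits output cursors, which is how the partitioning fan-out per pass can be kept within cache and TLB limits. The names `tuple_t`, `cluster_pass`, and `radix_cluster` are hypothetical, not taken from the paper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct { uint32_t key; uint32_t oid; } tuple_t;  /* join key + row id */

/* One pass: stably partition in[lo,hi) into out[lo,hi) on key bits
   [shift, shift+bits). start[c] receives the first index of cluster c and
   start[2^bits] == hi. Only 2^bits output cursors are live during the scatter. */
static void cluster_pass(const tuple_t *in, tuple_t *out, size_t lo, size_t hi,
                         unsigned shift, unsigned bits, size_t *start) {
    size_t fan = (size_t)1 << bits;
    uint32_t mask = (uint32_t)(fan - 1);
    size_t *cur = calloc(fan, sizeof *cur);
    assert(cur != NULL);
    for (size_t i = lo; i < hi; i++)            /* histogram */
        cur[(in[i].key >> shift) & mask]++;
    size_t sum = lo;
    for (size_t c = 0; c < fan; c++) {          /* prefix sum -> cluster starts */
        start[c] = sum;
        sum += cur[c];
        cur[c] = start[c];
    }
    start[fan] = hi;
    for (size_t i = lo; i < hi; i++)            /* scatter */
        out[cur[(in[i].key >> shift) & mask]++] = in[i];
    free(cur);
}

/* Cluster rel[lo,hi) on the lowest `total_bits` key bits in `passes` passes.
   bits[p] is the number of bits used by pass p, taken from the leftmost of the
   remaining bits first. tmp is scratch space at least as large as rel. */
void radix_cluster(tuple_t *rel, tuple_t *tmp, size_t lo, size_t hi,
                   const unsigned *bits, unsigned passes, unsigned total_bits) {
    if (passes == 0 || hi - lo < 2)
        return;
    unsigned b = bits[0];
    assert(b > 0 && b <= total_bits);
    unsigned shift = total_bits - b;
    size_t fan = (size_t)1 << b;
    size_t *start = malloc((fan + 1) * sizeof *start);
    assert(start != NULL);
    cluster_pass(rel, tmp, lo, hi, shift, b, start);
    memcpy(rel + lo, tmp + lo, (hi - lo) * sizeof *rel);
    for (size_t c = 0; c < fan; c++)            /* refine each cluster */
        radix_cluster(rel, tmp, start[c], start[c + 1], bits + 1, passes - 1, shift);
    free(start);
}
```

For example, clustering on 3 bits in two passes of 1 and 2 bits yields 8 clusters while never scattering to more than 4 cursors at once; matching clusters of both join inputs can then be joined pairwise with small hash tables.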

Cache Conscious Algorithms for Relational Query Processing

It is shown that there are significant benefits in redesigning traditional query processing algorithms to make better use of the cache; the new algorithms run 8% to 200% faster than the traditional ones.

Understanding, modeling, and improving main-memory database performance

This thesis analyzes the impact of modern hardware on main-memory database performance and develops new techniques to better exploit the available hardware resources and designs detailed cost models to predict the performance behavior of database algorithms.

MonetDB/X100: Hyper-Pipelining Query Execution

An in-depth investigation into why database systems tend to achieve only low IPC on modern CPUs in compute-intensive application areas, a new set of guidelines for designing a query processor, and a new query engine for the MonetDB system that follows these guidelines.
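
X100 is known for processing a vector of values per operator call instead of a single tuple. A minimal sketch of that execution style, assuming a hypothetical vector size of 1024 and a hypothetical primitive named `map_mul_int64_col_int64_col`; this is not X100 source code. The primitive is a tight loop with no per-tuple function calls, which leaves the compiler and CPU free to pipeline it, and an optional selection vector restricts work to qualifying positions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VECTOR_SIZE 1024  /* hypothetical; small enough for vectors to stay in cache */

/* Vectorized primitive: multiplies two int64 columns over one vector.
   With sel == NULL it computes positions 0..n-1; otherwise it computes only
   the n positions listed in the selection vector sel. */
void map_mul_int64_col_int64_col(size_t n, int64_t *res, const int64_t *a,
                                 const int64_t *b, const uint32_t *sel) {
    if (sel != NULL) {
        for (size_t i = 0; i < n; i++) {
            uint32_t j = sel[i];
            res[j] = a[j] * b[j];
        }
    } else {
        for (size_t i = 0; i < n; i++)
            res[i] = a[i] * b[i];
    }
}

/* Operator-level loop: SUM(a * b) computed one vector at a time, so the
   interpretation overhead is paid once per VECTOR_SIZE values, not per tuple. */
int64_t sum_of_products(const int64_t *a, const int64_t *b, size_t n) {
    int64_t tmp[VECTOR_SIZE];
    int64_t total = 0;
    for (size_t off = 0; off < n; off += VECTOR_SIZE) {
        size_t len = (n - off < VECTOR_SIZE) ? n - off : VECTOR_SIZE;
        assert(len <= VECTOR_SIZE);
        map_mul_int64_col_int64_col(len, tmp, a + off, b + off, NULL);
        for (size_t i = 0; i < len; i++)
            total += tmp[i];
    }
    return total;
}
```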

Conjunctive selection conditions in main memory

It is demonstrated that branch misprediction has a substantial impact on the performance of an algorithm for applying selection conditions; a cost model that takes branch prediction into account is proposed, and a query optimization algorithm that chooses a plan with optimal estimated cost is developed.
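
The branch-misprediction effect can be illustrated with two ways of evaluating `col < v` over a column. A minimal sketch, assuming 32-bit integer columns and an output array with room for n positions: the branching version jumps on every comparison, while the branch-free version always writes the candidate position and advances the output cursor by the comparison's 0/1 result. The conjunctive variant uses the non-short-circuit `&` so both conditions are evaluated without a jump. The function names are hypothetical, and an optimizing compiler may already turn the branching loop into conditional moves, so any measurement should check the generated code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Branching plan: the `if` is mispredicted often when selectivity is near 50%. */
size_t select_lt_branch(const int32_t *col, size_t n, int32_t v, uint32_t *out) {
    size_t j = 0;
    for (size_t i = 0; i < n; i++)
        if (col[i] < v)
            out[j++] = (uint32_t)i;
    return j;
}

/* No-branch plan: always store the position, then advance the cursor by the
   0/1 comparison result, so there is no data-dependent jump. */
size_t select_lt_nobranch(const int32_t *col, size_t n, int32_t v, uint32_t *out) {
    size_t j = 0;
    for (size_t i = 0; i < n; i++) {
        out[j] = (uint32_t)i;
        j += (size_t)(col[i] < v);
    }
    return j;
}

/* Conjunction a < va AND b < vb, combined with bitwise & instead of &&. */
size_t select_conj_nobranch(const int32_t *a, const int32_t *b, size_t n,
                            int32_t va, int32_t vb, uint32_t *out) {
    size_t j = 0;
    for (size_t i = 0; i < n; i++) {
        out[j] = (uint32_t)i;
        j += (size_t)((a[i] < va) & (b[i] < vb));
    }
    return j;
}
```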

Buffering database operations for enhanced instruction cache performance

This work answers the question "Why does a database system incur so many instruction cache misses?" and proposes techniques to buffer database operations during query execution to avoid instruction cache thrashing.
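
A minimal sketch of a buffer operator in a pull-based (Volcano-style) iterator tree, assuming 64-bit integer tuples and a hypothetical buffer size of 256. When its buffer is empty, the operator calls the child's `next()` repeatedly to refill it, so the child's code runs many times in a row, and then serves the parent from the array without re-entering the child. All types and functions here are illustrative, not the paper's implementation; with a trivial child such as this scan, no instruction-cache benefit would be measurable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal iterator interface: next() yields one tuple or returns false. */
typedef struct op op_t;
struct op { bool (*next)(op_t *self, int64_t *out); };

/* Leaf operator scanning an in-memory array. */
typedef struct { op_t base; const int64_t *data; size_t n, pos; } scan_t;

static bool scan_next(op_t *self, int64_t *out) {
    scan_t *s = (scan_t *)self;
    if (s->pos == s->n) return false;
    *out = s->data[s->pos++];
    return true;
}

#define BUF_CAP 256  /* hypothetical buffer size */

/* Buffer operator placed between a parent and its child. */
typedef struct {
    op_t base; op_t *child;
    int64_t buf[BUF_CAP]; size_t len, pos; bool done;
} buffer_t;

static bool buffer_next(op_t *self, int64_t *out) {
    buffer_t *b = (buffer_t *)self;
    if (b->pos == b->len) {                 /* buffer drained: refill in one burst */
        if (b->done) return false;
        b->len = b->pos = 0;
        while (b->len < BUF_CAP && b->child->next(b->child, &b->buf[b->len]))
            b->len++;
        if (b->len < BUF_CAP) b->done = true;  /* child is exhausted */
        if (b->len == 0) return false;
    }
    *out = b->buf[b->pos++];
    return true;
}

void scan_init(scan_t *s, const int64_t *data, size_t n) {
    s->base.next = scan_next; s->data = data; s->n = n; s->pos = 0;
}

void buffer_init(buffer_t *b, op_t *child) {
    b->base.next = buffer_next; b->child = child;
    b->len = b->pos = 0; b->done = false;
}

/* A parent that sums its input, pulling through any operator. */
int64_t sum_all(op_t *in) {
    int64_t v, s = 0;
    while (in->next(in, &v)) s += v;
    return s;
}
```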

GPUTeraSort: high performance graphics co-processor sorting for large database management

Overall, the results indicate that using a GPU as a co-processor can significantly improve the performance of sorting algorithms on large databases.