Corpus ID: 18567538

MonetDB: Two Decades of Research in Column-oriented Database Architectures

Stratos Idreos, Fabian Groffen, Niels Nes, Stefan Manegold, K. Sjoerd Mullender, Martin L. Kersten
IEEE Data Eng. Bull.

MonetDB is a state-of-the-art open-source column-store database management system targeting applications in need of analytics over large collections of data. MonetDB is actively used nowadays in health care and telecommunications, as well as in scientific databases and data management research, accumulating on average more than 10,000 downloads per month. This paper gives a brief overview of the MonetDB technology as it developed over the past two decades and the main research…
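
To make the column-store idea concrete, the sketch below contrasts a row layout with a column layout for a small, hypothetical table. It is plain Python, not MonetDB code: lists stand in for the dense per-attribute arrays a column store keeps, and the table, attribute names, and helper functions are illustrative assumptions. The point is that an analytical aggregate over one attribute touches only that attribute's array, while recombining a full tuple requires gathering the same position from every column.

```python
# Hypothetical three-row table; names and values are illustrative only.
rows = [
    (1, "alice", 30),
    (2, "bob", 25),
    (3, "carol", 44),
]

# Column-wise layout: one dense array per attribute, aligned by position,
# so the i-th entry of every array belongs to the same logical tuple.
columns = {
    "id": [r[0] for r in rows],
    "name": [r[1] for r in rows],
    "age": [r[2] for r in rows],
}

def avg_age_row_store(table):
    # Row layout: every full tuple is visited although only "age" is needed.
    return sum(t[2] for t in table) / len(table)

def avg_age_column_store(cols):
    # Column layout: only the "age" array is read; "id" and "name" stay untouched.
    ages = cols["age"]
    return sum(ages) / len(ages)

def reconstruct_tuple(cols, pos, attrs=("id", "name", "age")):
    # Tuple reconstruction: fetch the value at the same position from each column.
    return tuple(cols[a][pos] for a in attrs)

print(avg_age_column_store(columns))   # 33.0
print(reconstruct_tuple(columns, 1))   # (2, 'bob', 25)
```

Both layouts give the same answer; what differs is how much data the aggregate has to read, which is what makes the columnar layout attractive for analytics over wide tables.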

MonetDBLite: An Embedded Analytical Database

This paper introduces MonetDBLite, an embedded analytical database designed for OLAP scenarios that offers near-instantaneous data transfer between the database and analytical tools while maintaining the transactional guarantees and ACID properties of a standard relational system.

Vectorized UDFs in Column-Stores

This paper presents MonetDB/Python, a new system that combines the open-source database MonetDB with vectorized Python (NumPy-based) user-defined functions, and demonstrates efficiency gains of orders of magnitude.
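
The key interface difference behind vectorized UDFs is how often the engine calls into user code. The hypothetical sketch below counts invocations of the same function (2x + 1) under a tuple-at-a-time interface and a vectorized one. A plain list stands in for the NumPy array that MonetDB/Python hands to a UDF; the function names and the counter are illustrative assumptions, and the sketch shows call counts, not measured speedups.

```python
# Hypothetical UDF computing 2*x + 1; call counters show how often the engine invokes it.
calls = {"scalar": 0, "vectorized": 0}

def scalar_udf(x):
    # Tuple-at-a-time interface: the engine calls this once per row.
    calls["scalar"] += 1
    return 2 * x + 1

def vectorized_udf(column):
    # Vectorized interface: one call receives the whole input column.
    # (A plain list stands in for the NumPy array MonetDB/Python would pass.)
    calls["vectorized"] += 1
    return [2 * x + 1 for x in column]

column = list(range(1000))
per_row = [scalar_udf(x) for x in column]   # 1000 UDF invocations
whole = vectorized_udf(column)              # 1 UDF invocation

assert per_row == whole
print(calls)  # {'scalar': 1000, 'vectorized': 1}
```

With a real NumPy array, the body of the vectorized UDF can additionally use array operations that run in compiled code rather than a Python-level loop.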

Column-oriented data model for data-intensive systems

Simeon Emanuilov, A. Dimov
2022 10th International Scientific Conference on Computer Science (COMSCI), 2022
Presents a design for a webhook (event-notification) software system that facilitates the use of the columnar approach in data-intensive software systems, along with an example of using a column store where analytics is not the primary requirement.

Modern Column Stores for Big Data Processing

This survey traces the technological evolution and history behind the decline of row stores and the rise of column stores, delving into the architectural details of column DBs from academia and industry.

Array Database Scalability: Intercontinental Queries on Petabyte Datasets

This demonstration aims to showcase the capabilities of rasdaman by allowing users to execute queries that combine petabyte datasets stored at two institutions on different continents.

A Physical Design Strategy for Datasets with Multiple Dimensions

This chapter proposes a physical design strategy that improves query execution times in MonetDB by at least 29%, and uses statistical tests to show that the improvement is significant.

The BigDAWG Polystore System

This paper presents a new view of federated databases to address the growing need for managing information that spans multiple data models, and proposes a polystore architecture designed to unify querying over multiple data models.

Engineering High-Performance Database Engines

A survey of two different database engines, the disk-based SPARQL-processing engine RDF-3X and the relational main-memory engine HyPer, that discusses the design choices made during their development and highlights optimization techniques that are important for both systems.

SAP HANA - The Evolution of an In-Memory DBMS from Pure OLAP Processing Towards Mixed Workloads

The challenges of running mixed workloads with low-latency OLTP queries and complex analytical queries in the context of the same database management system are discussed and an outlook on the future database interaction patterns of modern business applications is given.

Bridging the Chasm between Science and Reality

Sheds light on the importance of a good profiling tool during system construction and on the techniques used to obtain permission from customers’ legal departments to share profiling traces captured on live production systems.

MonetDB/X100: Hyper-Pipelining Query Execution

An in-depth investigation into why database systems tend to achieve only low IPC (instructions per cycle) on modern CPUs in compute-intensive application areas, together with a new set of guidelines for designing a query processor and the X100 query engine for MonetDB, which follows these guidelines.
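
The vectorized ("vector-at-a-time") execution model associated with X100 can be sketched as operators that exchange fixed-size vectors of values instead of single tuples, with simple primitives that loop over a whole vector and communicate qualifying positions through a selection vector. The Python below is a minimal illustration under that assumption: the vector length, function names, and the example query are hypothetical, and Python will not exhibit the IPC effects the paper studies; it only shows the shape of the execution model.

```python
from typing import Iterator, List

# Illustrative vector length; the idea is that one vector stays cache-resident.
VECTOR_SIZE = 1024

def scan(column: List[int], vector_size: int = VECTOR_SIZE) -> Iterator[List[int]]:
    # Scan operator: emits the column in fixed-size vectors, not single tuples.
    for start in range(0, len(column), vector_size):
        yield column[start:start + vector_size]

def select_lt(vector: List[int], bound: int) -> List[int]:
    # Selection primitive: a tight loop that returns a selection vector,
    # i.e. the positions within `vector` whose value is below `bound`.
    return [i for i, v in enumerate(vector) if v < bound]

def sum_selected(vector: List[int], selection: List[int]) -> int:
    # Aggregation primitive that reads only the selected positions.
    return sum(vector[i] for i in selection)

def sum_where_lt(column: List[int], bound: int, vector_size: int = VECTOR_SIZE) -> int:
    # SELECT SUM(x) FROM t WHERE x < bound, evaluated one vector at a time:
    # per-call interpretation overhead is paid once per vector, not once per tuple.
    total = 0
    for vec in scan(column, vector_size):
        total += sum_selected(vec, select_lt(vec, bound))
    return total

print(sum_where_lt(list(range(5000)), 100))  # 4950
```

Passing a selection vector rather than copying qualifying values keeps each primitive a simple loop over an array, which is the kind of code a compiler and CPU can pipeline well.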

Exploiting the power of relational databases for efficient stream processing

Proposes DataCell, a complete stream-processing architecture implemented on top of an open-source column-oriented DBMS. DataCell allows batch processing of tuples and selectively picks tuples from a basket based on the query requirements, exploiting a novel query component called basket expressions.

SciBORQ: Scientific data management with Bounds On Runtime and Quality

This paper proposes SciBORQ, a framework for scientific data exploration that gives precise control over runtime and quality of query answering, and presents novel techniques to derive multiple interesting data samples, called impressions.

SciQL, a query language for science applications

SciQL provides a seamless symbiosis of array-, set-, and sequence-interpretation, using a clear separation of the mathematical object from its underlying implementation, and leads to a generalization of window-based query processing with wide applicability in science domains.

Breaking the memory wall in MonetDB

This paper reports how research around the MonetDB database system has led to a redesign of database architecture in order to take advantage of modern hardware, and in particular to avoid hitting the memory wall.

MonetDB/XQuery: a fast XQuery processor powered by a relational engine

Presents the main features, key contributions, and lessons learned from implementing a purely relational XQuery system that provides all essential XML database functionality, so that the full consequences of the architectural decisions can be studied.

Database Architecture Optimized for the New Bottleneck: Memory Access

Uses a simple scan test to show the severe impact of the main-memory access bottleneck, and introduces radix algorithms for partitioned hash-join together with a detailed analytical model that incorporates memory access cost.
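
The structure of a radix-partitioned hash join can be sketched as follows: both inputs are clustered on bits of the key's hash, possibly over several passes with a small fan-out each, and then matching clusters are joined with small per-cluster hash tables. This is a minimal Python sketch under stated assumptions: Python's built-in `hash` and `dict` stand in for the engine's hash function and hash table, and the tuple shape and parameters (`bits_per_pass`, `passes`) are hypothetical. It reproduces the algorithm's structure, not its cache behavior.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, str]  # (join key, payload); hypothetical tuple shape

def radix_partition(relation: List[Pair], bits: int, shift: int) -> List[List[Pair]]:
    # One clustering pass: route each pair by `bits` bits of its key hash,
    # taken starting at bit `shift`, into 2**bits clusters.
    fanout = 1 << bits
    mask = fanout - 1
    clusters: List[List[Pair]] = [[] for _ in range(fanout)]
    for key, payload in relation:
        clusters[(hash(key) >> shift) & mask].append((key, payload))
    return clusters

def radix_cluster(relation: List[Pair], bits_per_pass: int, passes: int) -> List[List[Pair]]:
    # Multi-pass radix-cluster: each pass refines every cluster on the next group
    # of hash bits, keeping the per-pass fan-out small (motivated in the paper by
    # cache and TLB capacity) while the total number of clusters grows.
    clusters = [relation]
    for p in range(passes):
        clusters = [sub for c in clusters
                    for sub in radix_partition(c, bits_per_pass, p * bits_per_pass)]
    return clusters

def radix_hash_join(r: List[Pair], s: List[Pair], bits_per_pass: int = 2, passes: int = 2):
    # Both inputs are clustered identically, so equal keys end up in clusters
    # with the same index; each cluster pair is joined with a small hash table
    # that, in a real engine, would be sized to fit in cache.
    result = []
    for rc, sc in zip(radix_cluster(r, bits_per_pass, passes),
                      radix_cluster(s, bits_per_pass, passes)):
        table: Dict[int, List[str]] = {}
        for key, payload in rc:
            table.setdefault(key, []).append(payload)
        for key, s_payload in sc:
            for r_payload in table.get(key, []):
                result.append((key, r_payload, s_payload))
    return result
```

The design choice the sketch mirrors is that clustering trades extra passes over the data for random accesses confined to small, cache-sized regions during the join phase.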

Self-organizing tuple reconstruction in column-stores

A novel design, partial sideways cracking, is proposed that achieves performance similar to using presorted data, but without requiring the heavy initial presorting step itself, and brings significant performance benefits for multi-attribute queries.

Super-Scalar RAM-CPU Cache Compression

This work proposes three new versatile compression schemes (PDICT, PFOR, and PFOR-DELTA) that are specifically designed to extract maximum IPC from modern CPUs and compares these algorithms with compression techniques used in (commercial) database and information retrieval systems.
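
The idea behind PFOR ("patched frame of reference") and PFOR-DELTA can be sketched as follows: values are stored as small offsets from a base, values whose offset does not fit in the chosen bit width become exceptions that are patched in after a uniform decoding loop, and PFOR-DELTA applies the same scheme to differences between consecutive values. The Python below is a simplified sketch, not the paper's layout: codes are kept as ordinary integers (no bit-packing), exceptions live in a separate list of (position, value) pairs, the base is taken to be the minimum, and the bit width is chosen by the caller. Inputs are assumed to be non-empty lists of integers.

```python
from typing import List, Tuple

Encoded = Tuple[int, List[int], List[Tuple[int, int]]]  # (base, codes, exceptions)

def pfor_encode(values: List[int], bits: int) -> Encoded:
    # Base = minimum value; each value is stored as an offset from it that
    # should fit in `bits` bits.
    base = min(values)
    limit = 1 << bits
    codes: List[int] = []
    exceptions: List[Tuple[int, int]] = []
    for pos, v in enumerate(values):
        offset = v - base
        if offset < limit:
            codes.append(offset)
        else:
            # Offset too wide: keep a placeholder code and record the real
            # value as an exception to be patched in during decoding.
            codes.append(0)
            exceptions.append((pos, v))
    return base, codes, exceptions

def pfor_decode(base: int, codes: List[int], exceptions: List[Tuple[int, int]]) -> List[int]:
    # Decode every code in one uniform loop, then patch the exception positions.
    out = [base + c for c in codes]
    for pos, v in exceptions:
        out[pos] = v
    return out

def pfor_delta_encode(values: List[int], bits: int) -> Encoded:
    # PFOR-DELTA: PFOR applied to the differences between consecutive values.
    deltas = [values[0]] + [b - a for a, b in zip(values, values[1:])]
    return pfor_encode(deltas, bits)

def pfor_delta_decode(encoded: Encoded) -> List[int]:
    # Decode the deltas, then take a running sum to restore the original values.
    out: List[int] = []
    running = 0
    for d in pfor_decode(*encoded):
        running += d
        out.append(running)
    return out
```

For example, `pfor_encode([100, 101, 103, 250, 102], 3)` keeps four values as 3-bit offsets from 100 and stores 250 as the single exception. Separating the uniform decoding loop from exception patching is what the schemes aim for, since it keeps the common path free of data-dependent branches.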

Stochastic Database Cracking: Towards Robust Adaptive Indexing in Main-Memory Column-Stores

Introduces stochastic cracking, a significantly more resilient approach to adaptive indexing that maintains the desired properties of original database cracking while also performing well with diverse, realistic workloads.
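
Database cracking reorganizes a copy of a column as a side effect of range queries: each query bound partitions the piece that contains it, and the resulting (pivot, position) boundaries form an index that later queries reuse. The stochastic variant adds cracks at randomly chosen pivots so that unlucky query sequences do not leave large unindexed pieces behind. The sketch below is a simplified Python illustration, loosely modeled on the random-pivot idea rather than the authors' exact algorithm. The class name, the one-random-crack-per-query policy, and the seeded generator are assumptions.

```python
import bisect
import random

class CrackerColumn:
    """Simplified cracker column: a copy of the data plus sorted boundaries.
    For each i, every value before positions[i] is < pivots[i], and every
    value at or after positions[i] is >= pivots[i]."""

    def __init__(self, values, stochastic=False, rng=None):
        self.data = list(values)
        self.pivots = []      # sorted pivot values
        self.positions = []   # positions[i]: first index holding a value >= pivots[i]
        self.stochastic = stochastic
        self.rng = rng or random.Random(0)

    def _piece(self, pivot):
        # Locate the piece [lo, hi) whose value range contains `pivot`.
        i = bisect.bisect_left(self.pivots, pivot)
        lo = self.positions[i - 1] if i > 0 else 0
        hi = self.positions[i] if i < len(self.positions) else len(self.data)
        return i, lo, hi

    def _crack(self, pivot):
        # Crack-in-two: partition one piece in place so values < pivot come first.
        i, lo, hi = self._piece(pivot)
        if i < len(self.pivots) and self.pivots[i] == pivot:
            return self.positions[i]  # already cracked on this pivot
        left, right = lo, hi - 1
        while left <= right:
            if self.data[left] < pivot:
                left += 1
            else:
                self.data[left], self.data[right] = self.data[right], self.data[left]
                right -= 1
        self.pivots.insert(i, pivot)
        self.positions.insert(i, left)
        return left

    def range_query(self, low, high):
        # Return all values v with low <= v < high, cracking as a side effect.
        if self.stochastic:
            # Simplified stochastic step: first crack the piece located for
            # `low` at a random pivot drawn from that piece's own values.
            _, lo, hi = self._piece(low)
            if hi - lo > 1:
                self._crack(self.data[self.rng.randrange(lo, hi)])
        start = self._crack(low)
        end = self._crack(high)
        return self.data[start:end]
```

Each query only touches the pieces that contain its bounds, so the column converges toward sorted order where the workload focuses; the extra random cracks spread that effort so piece sizes shrink even when query bounds alone would not split them well.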