From X100 to Vectorwise: Opportunities, Challenges and Things Most Researchers Do Not Think About

@inproceedings{Zukowski2012FromXT,
  title={From {X100} to {Vectorwise}: Opportunities, Challenges and Things Most Researchers Do Not Think About},
  author={Marcin Zukowski and Peter A. Boncz},
  booktitle={Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data},
  year={2012}
}
  • M. Zukowski, P. Boncz
  • Published 20 May 2012
  • Computer Science
  • Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
In 2008 a group of researchers behind the X100 database kernel created Vectorwise, a spin-off which, together with the Actian corporation (previously Ingres), worked on bringing this technology to the market. Today, Vectorwise is a popular product and an example of the conversion of a research prototype into successful commercial software. We describe here some of the interesting aspects of the work performed by the Vectorwise development team in the process, and discuss the opportunities… 

Slingshot: A modular framework for designing data processing systems
TLDR
Slingshot, a new data processing engine in which modularity and implementation flexibility are the top priority, is introduced, and it is shown that Slingshot outperforms the RDBMS in most cases while performing comparably in others.
A study of PosDB Performance in a Distributed Environment
TLDR
This paper experimentally evaluates the performance of PosDB in a distributed environment using the standard Star Schema Benchmark (SSB), analyzes system performance, and reports a number of metrics, such as speedup and scaleup.
Diva: Making MVCC Systems HTAP-Friendly
TLDR
This paper refutes the stereotype resulting from coupled design concerns and addresses the core problem by proposing Diva (Decoupling Index from Version dAta) that physically separates version index from version data.
Integrating Column-Oriented Storage and Query Processing Techniques Into Graph Database Management Systems
TLDR
This work presents the physical data layout of columnar data structures, new columnar compression, and query-processing techniques that are optimized for GDBMSs, including a new compact vertex and edge ID scheme, a new null and empty list compression scheme based on pre-sums, and list-based query processing.
A Novel Index Method for Write Optimization on Out-of-Core Column-Store Databases
TLDR
The purpose of this thesis is to extend previous research on write optimization in out-of-core column-store databases by exploring a new type of storage model titled Timestamped Binary Association Table (TBAT), a new update designed to leverage the TBAT, and a new type of B-Tree titled Offset B-Tree (OB-tree).
Multi-level Parallel Query Execution Framework for CPU and GPU
TLDR
This work uses just-in-time compilation to execute whole OLAP queries on the GPU minimizing the overhead for transfer and synchronization, and describes several patterns, which can be used to build efficient execution plans and achieve the necessary parallelism.
To share or not to share vector registers?
TLDR
This work investigates the opportunity of sharing vector registers for concurrently running queries in analytical scenarios and demonstrates the feasibility of a new work sharing strategy, which can open up a wide spectrum of future research opportunities.
Examining database persistence of ISO/EN 13606 standardized electronic health record extracts: relational vs. NoSQL approaches
TLDR
Non-relational NoSQL databases seem to be more appropriate than standard relational SQL databases when the database is extremely large (secondary use, research applications).
Management of Flexible Schema Data in RDBMSs - Opportunities and Limitations for NoSQL -
TLDR
The engineering principles and practices for managing flexible schema data (FSD) in RDBMSs, which must meet FSD’s unique requirements and challenges, are described, along with the limitations and issues of current practices.
Columnar Storage and List-based Processing for Graph Database Management Systems
TLDR
This work revisits column-oriented storage and query processing techniques in the context of contemporary graph database management systems (GDBMS) and proposes novel ones that are optimized for GDBMSs, including a novel list-based query processor, a new data structure the authors call single-indexed edge property pages and an accompanying edge ID scheme.

References

MonetDB/X100: Hyper-Pipelining Query Execution
TLDR
An in-depth investigation into why database systems tend to achieve only low IPC on modern CPUs in compute-intensive application areas, a new set of guidelines for designing a query processor, and a query engine for the MonetDB system that follows these guidelines.
Balancing vectorized query execution with bandwidth-optimized storage
TLDR
A new database system architecture is presented, realized in the MonetDB/X100 prototype, that combines a coherent set of new architecture-conscious techniques that are designed to work well together and achieves in-memory performance often one or two orders of magnitude higher than the existing approaches.
Integration of VectorWise with Ingres
TLDR
The integration of the VectorWise technology with Ingres, some of the design decisions made as part of the integration project, and the problems that had to be solved in the process are described.
The INGRES Papers: Anatomy of a Relational Database System
Positional update handling in column stores
TLDR
A new data structure for maintaining such positional updates to columnar databases, called the Positional Delta Tree (PDT), is described, and detailed algorithms for PDT/column merging, updating PDTs, and for using PDTs in transaction management are described.
Super-Scalar RAM-CPU Cache Compression
TLDR
This work proposes three new versatile compression schemes (PDICT, PFOR, and PFOR-DELTA) that are specifically designed to extract maximum IPC from modern CPUs and compares these algorithms with compression techniques used in (commercial) database and information retrieval systems.
Cooperative Scans: Dynamic Bandwidth Sharing in a DBMS
TLDR
This paper analyzes the performance of concurrent (index) scan operations in both record (NSM/PAX) and column (DSM) disk storage models and proposes the Cooperative Scans framework that enhances performance in such scenarios by improving data-sharing between concurrent scans.
Integration of VectorWise with Ingres
  • SIGMOD Record
  • 2011
The INGRES Papers: Anatomy of a Relational Database System
  • 1986