Building Efficient Query Engines in a High-Level Language

@article{Shaikhha2018BuildingEQ,
  title={Building Efficient Query Engines in a High-Level Language},
  author={Amir Shaikhha and Yannis Klonatos and Christoph E. Koch},
  journal={ArXiv},
  year={2018},
  volume={abs/1612.05566}
}
Abstraction without regret refers to the vision of using high-level programming languages for systems development without experiencing a negative impact on performance. A database system designed according to this vision offers both increased productivity and high performance instead of sacrificing the former for the latter as is the case with existing, monolithic implementations that are hard to maintain and extend. In this article, we realize this vision in the domain of analytical query… 

A SQL to C compiler in 500 lines of code
Abstract: We present the design and implementation of a SQL query processor that outperforms existing database systems and is written in just about 500 lines of Scala code – a convincing case study
Tidy Tuples and Flying Start: fast compilation and fast execution of relational queries in Umbra
TLDR
A code generation framework that establishes abstractions to manage complexity yet generates code in a single fast pass, and a new compiler backend that is optimized for minimal compile time and simultaneously yields superior execution performance to competing approaches, e.g., Volcano-style or bytecode interpretation.
Fine-Tuning Data Structures for Analytical Query Processing
TLDR
A novel low-level intermediate language that can express the algorithms behind various query processing paradigms such as classical joins, groupjoin, and in-database machine learning engines is introduced.
How to Architect a Query Compiler
TLDR
This paper proposes to use a stack of multiple DSLs on different levels of abstraction with lowering in multiple steps to make query compilers easier to build and extend, ultimately allowing us to create more convincing and sustainable compiler-based data management systems.
On supporting compilation in spatial query engines: (vision paper)
TLDR
LB2-Spatial is described, a prototype for a fully compiled spatial query engine that employs generative and multi-stage programming to realize query compilation, and potential avenues are sketched for supporting spatial query compilation in Postgres/PostGIS, a traditional RDBMS, and Spark/Spark SQL, a main-memory cluster computing framework.
Voodoo - A Vector Algebra for Portable Database Performance on Modern Hardware
TLDR
This work uses Voodoo, a declarative intermediate algebra that abstracts the detailed architectural properties of the hardware, such as multi- or many-core architectures, caches and SIMD registers, without losing the ability to generate highly tuned code, to build an alternative backend for MonetDB, a popular open-source in-memory database.
UniAD: A Unified Ad Hoc Data Processing System
TLDR
This article presents UniAD, which stands for Unified execution for Ad hoc Data processing, a system designed to simplify the programming of data processing tasks and provide efficient execution for user programs, and proposes a novel intermediate representation, called UniIR, which utilizes a simple and expressive mechanism HOQ to describe the operations performed in programs.
Efficient Compilation of Regular Path Queries
TLDR
This work applies ad hoc code generation to regular path queries (RPQs), an advanced query type in declarative graph query languages, and proposes COAT, an embedded domain specific language (EDSL) in C++ to improve accessibility of code generation by simplifying the interaction with compiler APIs.
A program optimization for automatic database result caching
TLDR
This paper presents a compiler optimization that automatically adds sound SQL caching to Web applications coded in the Ur/Web domain-specific functional language, with no modifications required to source code.
Compilation and Code Optimization for Data Analytics
TLDR
The vision of abstraction without regret argues that it is possible to use high-level languages for building performance-critical systems that allow for both productivity and high performance, instead of trading off the former for the latter.

References

Code Generation for Efficient Query Processing in Managed Runtimes
TLDR
This work makes important first steps towards a future where data processing applications will commonly run on machines that can store their entire datasets in-memory, and will be written in a single programming language employing language-integrated query and imdb-inspired runtimes to provide transparent and highly efficient querying.
Abstraction Without Regret in Database Systems Building: a Manifesto
TLDR
It is argued that compilers can be competitive with and outperform human experts at low-level database systems programming and recent progress makes their creation eminently feasible.
Optimizing database-backed applications with query synthesis
TLDR
This paper presents QBS, a system that automatically transforms fragments of application logic into SQL queries, and demonstrates that this approach can convert a variety of imperative constructs into relational specifications and significantly improve application performance asymptotically by orders of magnitude.
Steno: automatic optimization of declarative queries
TLDR
Steno is developed, which uses a combination of novel and well-known techniques to generate code for declarative queries that is almost as efficient as hand-optimized code.
How to Architect a Query Compiler
TLDR
This paper proposes to use a stack of multiple DSLs on different levels of abstraction with lowering in multiple steps to make query compilers easier to build and extend, ultimately allowing us to create more convincing and sustainable compiler-based data management systems.
Generating code for holistic query evaluation
TLDR
The results show that HIQUE satisfies its design objectives, while its efficiency surpasses that of both well-established and currently-emerging query processing techniques.
Lightweight Modular Staging and Embedded Compilers: Abstraction without Regret for High-Level High-Performance Programming
TLDR
This thesis proposes a hybrid design: integrate compilers into programs so that programs can take control of the translation process, but rely on libraries of common compiler functionality for help.
Leveraging .NET meta-programming components from F#: integrated queries and interoperable heterogeneous execution
TLDR
This paper explores the use of a modest meta-programming extension to F# to access and leverage the functionality of LINQ and other components, and demonstrates an implementation of language integrated SQL queries using the LINQ/SQLMetal libraries.
Efficiently Compiling Efficient Query Plans for Modern Hardware
TLDR
This work presents a novel compilation strategy that translates a query into compact and efficient machine code using the LLVM compiler framework and integrates these techniques into the HyPer main memory database system and shows that this results in excellent query performance while requiring only modest compilation time.
DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language
TLDR
It is shown that excellent absolute performance can be attained--a general-purpose sort of 10^12 bytes of data executes in 319 seconds on a 240-computer, 960-disk cluster--as well as demonstrating near-linear scaling of execution time on representative applications as the authors vary the number of computers used for a job.