Optimizing relational algebra operations using generic partitioning discriminators and lazy products∗

Abstract

We show how to implement in-memory execution of the core relational algebra operations of projection, selection and cross-product efficiently, using discrimination-based joins and lazy products. We introduce the notion of a (partitioning) discriminator, which partitions a list of values according to a specified equivalence relation on the keys associated with the values. We show how discriminators can be defined generically, purely functionally, and efficiently (in worst-case linear time) on top of the array-based basic multiset discrimination algorithm of Cai and Paige (1995). Discriminators provide the basis for discrimination-based joins, a new technique for computing joins that requires neither hashing nor sorting. Discriminators also provide efficient implementations of duplicate elimination, set union and set difference. We represent a cross-product lazily as a formal pair of the argument sets (relations). This allows the selection operation to recognize on the fly when it is applied to a cross-product, in which case it can choose an efficient discrimination-based equijoin implementation. The techniques subsume most of the optimization techniques based on relational algebra equalities, without the need for a query preprocessing phase. They require no indexes and behave purely functionally. Full source code in Haskell extended with Generalized Algebraic Data Types (GADTs) is included. GADTs are used to represent sets (relations), projections, predicates and equivalence denotations in a type-safe manner. It should be emphasized that the code is only intended for and applicable to operations on in-memory data; that is, in …

∗This material is based upon work supported by the Danish National Science Foundation (FNU) under Project APPL and by the Danish National Advanced Technology Foundation under Project 3gERP.
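To make the abstract's three central ideas concrete (a partitioning discriminator, a discrimination-based equijoin, and a lazy product that selection can inspect), here is a small Haskell sketch. It is not the paper's code, and every name in it (`discOrd`, `joinDisc`, `nubDisc`, `Rel`, `Base`, `Prod`, `Pred`, `Test`, `EqOn`, `rows`, `holds`, `select`) is hypothetical. One simplification matters in particular: `discOrd` groups keys with an ordered `Data.Map`. That makes it O(n log n) and comparison-based, so unlike the worst-case linear-time, array-based Cai and Paige discrimination the abstract describes, this stand-in does rely on key ordering.

```haskell
{-# LANGUAGE GADTs #-}
import qualified Data.Map.Strict as Map
import Data.Either (partitionEithers)

-- Hypothetical stand-in for a partitioning discriminator: groups values
-- whose keys are equal, keeping input order inside each group.
-- ASSUMPTION: uses an ordered Data.Map (O(n log n), comparison-based),
-- not the worst-case linear-time Cai/Paige multiset discrimination.
discOrd :: Ord k => [(k, v)] -> [[v]]
discOrd kvs = map reverse (Map.elems groups)
  where
    -- fromListWith (++) prepends each later value, hence the reverse
    groups = Map.fromListWith (++) [ (k, [v]) | (k, v) <- kvs ]

-- Equijoin by discrimination: tag left and right inputs, partition them
-- together by join key, then pair left and right values within each block.
-- Blocks holding values from only one side produce no output.
joinDisc :: Ord k => (a -> k) -> (b -> k) -> [a] -> [b] -> [(a, b)]
joinDisc f g xs ys =
  [ (x, y)
  | block <- discOrd (tagL ++ tagR)
  , let (ls, rs) = partitionEithers block
  , x <- ls
  , y <- rs
  ]
  where
    tagL = [ (f x, Left x)  | x <- xs ]
    tagR = [ (g y, Right y) | y <- ys ]

-- Duplicate elimination: keep one representative per block.
nubDisc :: Ord a => [a] -> [a]
nubDisc xs = [ x | (x : _) <- discOrd [ (v, v) | v <- xs ] ]

-- A cross-product is kept as a formal pair of its arguments (lazy product).
data Rel a where
  Base :: [a] -> Rel a
  Prod :: Rel a -> Rel b -> Rel (a, b)

-- Predicates: an arbitrary test, or equality of keys drawn from the two
-- components of a pair (the equijoin case).
data Pred a where
  Test :: (a -> Bool) -> Pred a
  EqOn :: Ord k => (a -> k) -> (b -> k) -> Pred (a, b)

-- Materialise a relation; only here is a product actually expanded.
rows :: Rel a -> [a]
rows (Base xs)  = xs
rows (Prod r s) = [ (x, y) | x <- rows r, y <- rows s ]

-- Evaluate a predicate on a single tuple.
holds :: Pred a -> a -> Bool
holds (Test p)   v = p v
holds (EqOn f g) v = f (fst v) == g (snd v)

-- Selection recognises an equality predicate applied to a lazy product
-- and joins by discrimination instead of filtering the full product.
select :: Pred a -> Rel a -> Rel a
select (EqOn f g) (Prod r s) = Base (joinDisc f g (rows r) (rows s))
select p          r          = Base (filter (holds p) (rows r))
```

For instance, with `emps = Base [("alice", 10), ("bob", 20), ("carol", 10)]` and `depts = Base [(10, "sales"), (20, "ops"), (30, "hr")]`, evaluating `select (EqOn snd fst) (Prod emps depts)` never builds the 3×3 product, whereas an opaque `Test` predicate on the same product falls back to filtering all nine pairs. Both return the same three matching pairs, possibly in a different order.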


@inproceedings{Henglein2009OptimizingRA,
  title  = {Optimizing relational algebra operations using generic partitioning discriminators and lazy products},
  author = {Fritz Henglein},
  year   = {2009}
}