• Corpus ID: 11741471

Multi Query Optimization in GLADE

  • Abdur Rafay
  • Published 16 August 2016
  • Computer Science
  • ArXiv
SQL-on-Hadoop systems, query optimization, distribution of data over multiple nodes, and parallelization techniques are a few of the areas under intense research today. Companies such as Amazon, Google, and Microsoft are building systems that access data across multiple nodes faster, reduce data movement, and increase parallelism. Customers' queries are retrieved and reviewed by database systems efficiently and in the least amount of time by the… 
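One common multi-query optimization is to let several queries share a single scan of the data instead of scanning once per query. The sketch below is a generic illustration of that idea under stated assumptions, not the method of this paper (whose abstract is truncated above); the query representation (a row predicate plus a column to sum) and the function names `run_separately` and `run_shared` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]
# Hypothetical query shape: (row predicate, column to SUM over matching rows)
Query = Tuple[Callable[[Row], bool], str]


def run_separately(rows: List[Row], queries: Dict[str, Query]) -> Tuple[Dict[str, float], int]:
    """Baseline: each query performs its own full scan of `rows`."""
    results: Dict[str, float] = {}
    scans = 0
    for name, (pred, col) in queries.items():
        scans += 1
        results[name] = sum(r[col] for r in rows if pred(r))
    return results, scans


def run_shared(rows: List[Row], queries: Dict[str, Query]) -> Tuple[Dict[str, float], int]:
    """Shared scan: read each row once and offer it to every query."""
    results = {name: 0.0 for name in queries}
    for r in rows:
        for name, (pred, col) in queries.items():
            if pred(r):
                results[name] += r[col]
    return results, 1  # a single pass over the data, regardless of query count


rows = [
    {"price": 10.0, "qty": 1.0},
    {"price": 25.0, "qty": 4.0},
    {"price": 40.0, "qty": 2.0},
]
queries = {
    "cheap_revenue": (lambda r: r["price"] < 30.0, "price"),
    "bulk_qty": (lambda r: r["qty"] >= 2.0, "qty"),
}
print(run_separately(rows, queries))  # → ({'cheap_revenue': 35.0, 'bulk_qty': 6.0}, 2)
print(run_shared(rows, queries))      # → ({'cheap_revenue': 35.0, 'bulk_qty': 6.0}, 1)
```

Both functions return the same answers; the shared version reads the data once rather than once per query, which is where the savings come from when scanning dominates the cost.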



Shared Workload Optimization

This paper presents the first algorithm capable of optimizing the execution of entire workloads by deriving a global execution plan for all the queries in the system, and evaluates the optimizer over the TPC-W and TPC-H benchmarks.

MQJoin: Efficient Shared Execution of Main-Memory Joins

For a TPC-H-based workload, it is shown that MQJoin provides 2-5x higher throughput with significantly more stable response times and is able to efficiently handle larger workloads regardless of the schema by exploiting more sharing opportunities.
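MQJoin's own algorithm is not described in this summary. As a hedged illustration of what sharing a join across queries can look like, the sketch below builds one hash table and probes it at most once per row for all queries, routing each joined row to the queries whose probe-side predicates it satisfies. The function name `shared_hash_join`, the predicates, and the data layout are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def shared_hash_join(
    build: List[Row],
    probe: List[Row],
    key: str,
    predicates: Dict[str, Callable[[Row], bool]],
) -> Dict[str, List[Row]]:
    """Evaluate one equi-join on `key` for several queries at once.

    The hash table on `build` is built once and each `probe` row is looked
    up at most once; per-query predicates on the probe side only decide
    which queries receive the joined row.
    """
    table: Dict[Any, List[Row]] = defaultdict(list)
    for b in build:
        table[b[key]].append(b)

    results: Dict[str, List[Row]] = {q: [] for q in predicates}
    for p in probe:
        # Tag the probe row with the set of queries that want it.
        interested = [q for q, pred in predicates.items() if pred(p)]
        if not interested:
            continue  # no query needs this row, so skip the lookup entirely
        for b in table.get(p[key], []):
            joined = {**b, **p}  # one joined row, shared by all interested queries
            for q in interested:
                results[q].append(joined)
    return results


customers = [{"cid": 1, "nation": "FR"}, {"cid": 2, "nation": "US"}]
orders = [
    {"oid": 10, "cid": 1, "total": 50},
    {"oid": 11, "cid": 2, "total": 500},
    {"oid": 12, "cid": 3, "total": 900},  # no matching customer
]
out = shared_hash_join(
    customers,
    orders,
    "cid",
    {"small": lambda o: o["total"] < 100, "large": lambda o: o["total"] >= 100},
)
print({q: [r["oid"] for r in rows] for q, rows in out.items()})  # → {'small': [10], 'large': [11]}
```

The per-query cost is reduced to a predicate check; the expensive build and lookup work is paid once for the whole batch.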

Scalable multi-query optimization for exploratory queries over federated scientific databases

This work attacks the optimization problem for exploratory queries by proposing several multi-query optimization algorithms that compute a global evaluation plan while minimizing the total communication cost, a key bottleneck in distributed settings.

The DataPath system: a data-centric analytic processing engine for large data warehouses

In DataPath, queries do not request data, and data are automatically pushed onto processors, where they are then processed by any interested computation, making for a very lean and fast database system.

GLADE: a scalable framework for efficient analytics

GLADE consists of a simple user-interface to define Generalized Linear Aggregates (GLA), the fundamental abstraction at the core of GLADE, and a distributed runtime environment that executes GLAs by using parallelism extensively.
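The summary does not spell out the GLA interface. The sketch below follows the Init/Accumulate/Merge/Terminate pattern commonly used for user-defined aggregates; the class and method names are assumptions, and the "nodes" are simulated as in-memory partitions rather than GLADE's distributed runtime.

```python
from typing import Iterable, List, Optional


class AverageGLA:
    """Hypothetical average aggregate with Init/Accumulate/Merge/Terminate steps."""

    def __init__(self) -> None:
        # Init: start from an empty state.
        self.total = 0.0
        self.count = 0

    def accumulate(self, value: float) -> None:
        # Accumulate: fold one tuple into the local state.
        self.total += value
        self.count += 1

    def merge(self, other: "AverageGLA") -> None:
        # Merge: combine a partial state computed elsewhere (e.g. another node).
        self.total += other.total
        self.count += other.count

    def terminate(self) -> Optional[float]:
        # Terminate: produce the final answer from the merged state.
        return self.total / self.count if self.count else None


def run_partitioned(partitions: List[Iterable[float]]) -> Optional[float]:
    """Accumulate each partition independently, then merge the partial states."""
    partials = []
    for part in partitions:
        gla = AverageGLA()
        for v in part:
            gla.accumulate(v)
        partials.append(gla)

    final = AverageGLA()
    for gla in partials:
        final.merge(gla)
    return final.terminate()


print(run_partitioned([[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]))  # → 3.5
```

Because `merge` only adds sums and counts, partial states can be combined in any order, which is what allows the per-partition work to run in parallel.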

Selectivity Estimation and Query Optimization in Large Databases with Highly Skewed Distribution of Column Values

A series of new selectivity estimation methods that work well with highly skewed value distributions are defined and then compared to currently used methods such as uniform approximation and histograms.
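The specific estimators from that paper are not reproduced here. To illustrate why skew breaks the uniform assumption, the sketch contrasts a uniform estimate (each of the d distinct values gets selectivity 1/d) with a most-common-values list, a widely used alternative that stores exact frequencies for the top k values and assumes uniformity only over the remainder. The function names and the default k are illustrative choices.

```python
from collections import Counter
from typing import Hashable, List


def uniform_estimate(column: List[Hashable], value: Hashable) -> float:
    # Uniform approximation: every distinct value is assumed equally frequent,
    # so `value` itself does not affect the estimate.
    distinct = len(set(column))
    return 1.0 / distinct if distinct else 0.0


def mcv_estimate(column: List[Hashable], value: Hashable, k: int = 2) -> float:
    # Keep exact frequencies for the k most common values ...
    n = len(column)
    if n == 0:
        return 0.0
    counts = Counter(column)
    top = dict(counts.most_common(k))
    if value in top:
        return top[value] / n
    # ... and spread the remaining rows evenly over the remaining distinct values.
    rest_rows = n - sum(top.values())
    rest_distinct = len(counts) - len(top)
    return (rest_rows / n) / rest_distinct if rest_distinct else 0.0


# Highly skewed column: "a" covers 90% of the rows.
col = ["a"] * 90 + ["b"] * 5 + ["c"] * 3 + ["d"] * 2
print(uniform_estimate(col, "a"))  # → 0.25 (true selectivity is 0.9)
print(mcv_estimate(col, "a"))      # → 0.9
print(mcv_estimate(col, "c"))      # → 0.025 (true selectivity is 0.03)
```

The uniform estimate is off by more than a factor of three for the dominant value, which is exactly the kind of error that leads an optimizer to pick a poor plan on skewed data.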

GLADE: big data analytics made easy

We present GLADE, a scalable distributed system for large scale data analytics. GLADE takes analytical functions expressed through the User-Defined Aggregate (UDA) interface and executes them