• Corpus ID: 936038

An Overview of Cost-based Optimization of Queries with Aggregates

Surajit Chaudhuri and Kyuseok Shim. IEEE Data Eng. Bull.
Many current database systems use some form of histograms to approximate the frequency distribution of values in the attributes of relations and based on them estimate some query result sizes and access plan costs. In this paper, we overview the line of research on histograms that we have followed at the Univ. of Wisconsin. Our goal has been to identify classes of histograms that combine three features in most realistic cases: (i) they produce estimates with small errors, (ii) they are… 

An overview of query optimization in relational systems
The goal of this article is not to be comprehensive, but rather to explain the foundations and present samplings of significant work in this area of query optimization.
Divide and aggregate: caching multidimensional objects
This work proposes a query based aggregate cache of ‘multidimensional objects’ which allows the combination of several aggregates to derive a single query using ‘setderivability’ and shows average cost reductions of over 50% while spending only 10% of additional storage for summary data.
Fighting Redundancy in SQL: the For-Loop Approach
It is shown that traditional optimization techniques need more than one pass over the base relations in the database to compute the answer for such queries; however, this is not strictly necessary.
Techniques for improving efficiency and scalability for the integration of information retrieval and databases
Experimental results showed that the efficiency and scalability of an IR+DB prototype have been improved, while the system can handle queries efficiently on considerably large data sets for a number of IR tasks.
Congressional samples for approximate answering of group-by queries
A one pass algorithm for constructing a congressional sample is presented and this technique is used to also incrementally maintain the sample up-to-date without accessing the base relation, which demonstrates the efficacy of the techniques proposed.
SPROUT: Lazy vs. Eager Query Plans for Tuple-Independent Probabilistic Databases
An efficient secondary-storage operator for exact computation of queries on tuple-independent probabilistic databases, which is semantically equivalent to a sequence of aggregations and can be naturally integrated into existing relational query plans.
A Redundancy-Based Optimization Approach for Aggregation in Multidimensional Scientific and Statistical Databases
An optimization approach in the SSDB domain which is based on the re-use of materialized results of former queries to process aggregate queries along a classification hierarchy is described.
Query Optimization in Heterogeneous Distributed Databases
Heterogeneous Distributed database management systems (DDBMS) are amongst the most important and successful software developments in this decade because a large number of parameters affect the performance of distributed queries.
Analysis of Query Optimization Techniques in Databases
Abdullah Dilsat: Query Optimization in Distributed Databases. Report, Middle East Technical University, December 2003. Aho, A. V., Sagiv, Y. and J. D. Ullman: Efficient optimization of a class of
Tuning the SQL Query in order to Reduce Time Consumption
This query optimization yields high system performance, puts less stress on the database when data transmission occurs, uses the database engine efficiently, and consumes less memory.


Balancing histogram optimality and practicality for query result size estimation
The overall conclusion is that the most effective approach is to focus on the class of histograms that accurately maintain the frequencies of a few attribute values and assume the uniform distribution for the rest, and choose for each relation the histogram in that class that is optimal for a self-join query.
Optimal histograms for limiting worst-case error propagation in the size of join results
It is proved that for t-clique queries with a very large number of joins, high-biased histograms are always optimal, and to construct a histogram for the join attribute of a relation, the values in the attribute must first be sorted based on their frequency and then assigned into buckets according to the optimality results.
Universality of Serial Histograms
Serial histograms are identified and shown to be optimal for arbitrary tree equality-join queries when the query result size is maximized, whether or not the attribute independence assumption holds, and when the query result size is minimized and the attribute independence assumption holds.
On the propagation of errors in the size of join results
This work presents a formal framework based on which the principles of this error propagation can be studied and obtains several analytic results on how the error propagates in general, as well as in the extreme and average cases.
Including Group-By in Query Optimization
It is shown that the extent of improvement in the quality of plans is significant with only a modest increase in optimization cost, and the technique also applies to optimization of Select Distinct queries by pushing down duplicate elimination in a cost-based fashion.
The Optimization of Queries in Relational Databases
A fully implemented system for optimizing and executing queries for relational databases is described. The system optimizes n-table, equi-join queries written in QUEL, the query language supported by
Measuring the Complexity of Join Enumeration in Query Optimization
This paper describes and measures the performance of the Starburst join enumerator, which can parametrically adjust for each query the space of join sequences that are evaluated by the optimizer to allow or disallow composite tables as the inner operand of a join.
Accurate estimation of the number of tuples satisfying a condition
A new method for estimating the number of tuples satisfying a condition of the type attribute rel constant, where rel is one of "=", ">", "<", "≥", "≤", which gives highly accurate, yet easy to compute, estimates.
The Set Query Benchmark
P. O'Neil. The Benchmark Handbook, 1991.
The Set Query benchmark chooses a list of "basic" set queries from a review of three major types of strategic data applications: document search, direct marketing, and decision support, and results are presented for two leading database products used in large scale operations.
On the optimal nesting order for computing N-relational joins
This paper proposes a data structure whereby the number of page fetches required for query evaluation is substantially reduced, derives a formula for the expected number of page fetches, and presents an efficient algorithm for finding an optimal nesting order.