Universality of Serial Histograms
@inproceedings{Ioannidis1993UniversalityOS, title={Universality of Serial Histograms}, author={Yannis E. Ioannidis}, booktitle={VLDB}, year={1993} }
Many current relational database systems use some form of histograms to approximate the frequency distribution of values in the attributes of relations and based on them estimate query result sizes and access plan costs. The errors that exist in the histogram approximations directly or transitively affect many estimates derived by the database system. We identify the class of serial histograms and demonstrate that they are optimal for reducing the query result size error for several classes of…
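As a rough illustration of the idea, a serial histogram groups attribute values so that each bucket holds values whose frequencies are adjacent once the values are sorted by frequency. The following is a minimal Python sketch under that reading, not the paper's algorithm: the function names are hypothetical, the near-equal split of the frequency ranking is an arbitrary choice rather than an optimal bucketing, and the self-join estimate assumes every value in a bucket occurs with the bucket's average frequency.

```python
from collections import Counter


def serial_histogram(values, num_buckets):
    """Bucket distinct values by contiguous frequency rank (hypothetical helper).

    Returns a list of (values_in_bucket, average_frequency) pairs.
    """
    freq = Counter(values)
    # Sort distinct values by ascending frequency (ties broken by value).
    ranked = sorted(freq.items(), key=lambda kv: (kv[1], kv[0]))
    n = len(ranked)
    buckets = []
    for b in range(num_buckets):
        lo, hi = b * n // num_buckets, (b + 1) * n // num_buckets
        group = ranked[lo:hi]
        if group:  # skip empty slices when there are more buckets than values
            avg = sum(f for _, f in group) / len(group)
            buckets.append(([v for v, _ in group], avg))
    return buckets


def estimate_self_join_size(buckets):
    """Estimate the self-join size, treating each value as having its bucket's average frequency."""
    return sum(len(vals) * avg * avg for vals, avg in buckets)


def exact_self_join_size(values):
    """Exact self-join size: the sum of squared frequencies."""
    return sum(f * f for f in Counter(values).values())
```

Comparing `estimate_self_join_size(serial_histogram(data, k))` against `exact_self_join_size(data)` for a few values of `k` gives a feel for how grouping similar frequencies together keeps the estimation error small.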
128 Citations
Balancing histogram optimality and practicality for query result size estimation
- Computer Science
- SIGMOD '95
- 1995
The overall conclusion is that the most effective approach is to focus on the class of histograms that accurately maintain the frequencies of a few attribute values and assume the uniform distribution for the rest, and choose for each relation the histogram in that class that is optimal for a self-join query.
An Overview of Cost-based Optimization of Queries with Aggregates
- Computer Science
- IEEE Data Eng. Bull.
- 1995
This paper overviews the line of research on histograms at the Univ. of Wisconsin and presents several results, which eventually point towards a class of histograms that are practical, close to optimal, and effective in estimating sizes of query results, frequency distributions of attribute values in query results, and even costs of accesses using secondary indices.
Improving Range Query Result Size Estimation Based on a New Optimal Histogram
- Computer Science
- FQAS
- 2013
This paper proposes an efficient algorithm, called Compressed-V2, for accurate histogram construction. It is intended to contribute significantly to solving the Multi-Query Optimization (MQO) problem that results from query interactions, especially in Relational Data Warehouses (RDW), which are the typical environment in which complex OLAP queries interact with each other.
Improved histograms for selectivity estimation of range predicates
- Computer Science
- SIGMOD '96
- 1996
A taxonomy of histograms is provided that captures all previously proposed histogram types and indicates many new possibilities. The paper introduces novel choices for several of the taxonomy dimensions and derives new histogram types by combining choices in effective ways.
Query result size estimation using a novel histogram-like technique: the rectangular attribute cardinality map
- Computer Science
- Proceedings. IDEAS'99. International Database Engineering and Applications Symposium (Cat. No.PR00265)
- 1999
This work introduces a new histogram-like approximation strategy called the Rectangular Attribute Cardinality Map (R-ACM), that aims to approximate the density of the underlying attribute values using the philosophies of numerical integration.
Optimal Histograms with Quality Guarantees
- Computer Science
- VLDB
- 1998
This paper presents algorithms for computing optimal bucket boundaries in time proportional to the square of the number of distinct data values, for a broad class of optimality metrics, along with an enhancement to traditional histograms that provides quality guarantees on individual selectivity estimates.
Structure choices for two-dimensional histogram construction
- Computer Science
- CASCON
- 2004
This work experimentally shows that the proposed methods for dealing with histogram structure choices lead to good quality histograms for a variety of histogram partitioning techniques and various types of data distributions.
Efficiently adapting graphical models for selectivity estimation
- Computer Science
- The VLDB Journal
- 2012
By carefully using concepts from the field of graphical models, this work is able to factor the joint probability distribution over all the attributes in the database into small, usually two-dimensional distributions, without a significant loss in estimation accuracy.
References
Optimal histograms for limiting worst-case error propagation in the size of join results
- Computer Science
- TODS
- 1993
It is proved that for t-clique queries with a very large number of joins, high-biased histograms are always optimal, and to construct a histogram for the join attribute of a relation, the values in the attribute must first be sorted based on their frequency and then assigned into buckets according to the optimality results.
The Optimization of Queries in Relational Databases
- Computer Science
- 1980
A fully implemented system for optimizing and executing queries for relational databases is described. The system optimizes n-table, equi-join queries written in QUEL, the query language supported by…
A detailed statistical model for relational query optimization
- Computer Science
- ACM '85
- 1985
An approach to estimate the cardinality of results of relational operations of select, project, and semijoin, using detailed database statistics computed from the instances of a relational database.
Accurate estimation of the number of tuples satisfying a condition
- Computer Science
- SIGMOD '84
- 1984
A new method for estimating the number of tuples satisfying a condition of the type attribute rel constant, where rel is one of "=", ">", "<", "≥", "≤", which gives highly accurate, yet easy to compute, estimates.
Equi-depth multidimensional histograms
- Computer Science
- SIGMOD '88
- 1988
This paper presents an algorithm for generating equi-depth, multi-dimensional histograms and presents a main memory data structure for storing the histograms, and discusses two schemes for estimating the number of tuples that will be retrieved by a given query.
Statistical profile estimation in database systems
- Computer Science
- CSUR
- 1988
This paper describes a model of a database profile, relates this model to estimating the cost of database operations, and surveys methods of estimating profiles.
Implications of certain assumptions in database performance evaluation
- Computer Science
- TODS
- 1984
This paper shows that assumptions of uniformity and independence of attribute values in a file, uniformity of queries, constant number of records per block, and random placement of qualifying records among the blocks of a file often result in predicting only an upper bound of the expected system cost.
Estimating block transfers and join sizes
- Computer Science, Mathematics
- SIGMOD '83
- 1983
Estimates of the number of sequential and random block accesses required for retrieving a number of records of a file when the distribution of records in blocks of secondary storage is not uniform are provided.
Access path selection in a relational database management system
- Computer Science
- SIGMOD '79
- 1979
This paper describes how System R chooses access paths for both simple (single relation) and complex queries (such as joins) given a user specification of desired data as a boolean expression of predicates.
Distribution Models Of Relations
- Mathematics
- Fifth International Conference on Very Large Data Bases, 1979.
- 1979
It is shown how relations can be modelled in fast memory by a distribution of tuples in a multidimensional space using the result for the natural join to optimize the evaluation of an expression involving two joins.