• Corpus ID: 12556950

Universality of Serial Histograms

@inproceedings{Ioannidis1993UniversalityOS,
  title={Universality of Serial Histograms},
  author={Yannis E. Ioannidis},
  booktitle={VLDB},
  year={1993}
}
  • Y. Ioannidis
  • Published in VLDB 24 August 1993
  • Computer Science
Many current relational database systems use some form of histograms to approximate the frequency distribution of values in the attributes of relations and based on them estimate query result sizes and access plan costs. The errors that exist in the histogram approximations directly or transitively affect many estimates derived by the database system. We identify the class of serial histograms and demonstrate that they are optimal for reducing the query result size error for several classes of… 
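As a rough illustration of the idea in the abstract, the sketch below builds a histogram whose buckets group attribute values by contiguous frequency rank, then uses it to estimate a self-join result size. It assumes that "serial" means no bucket's frequencies interleave with another's. The function names, the equal-length bucket split, and the sample data are hypothetical choices for this example, not the paper's construction.

```python
from collections import Counter


def serial_histogram(values, num_buckets):
    """Group distinct values into buckets that are contiguous in frequency order.

    Assumption: buckets are runs of near-equal length over the values sorted
    by frequency. Choosing the optimal split points is not attempted here.
    """
    freq = Counter(values)
    # Sort distinct values by descending frequency; break ties deterministically.
    ordered = sorted(freq.items(), key=lambda kv: (-kv[1], repr(kv[0])))
    n = len(ordered)
    k = min(num_buckets, n)
    buckets = []
    start = 0
    for i in range(k):
        # Spread any remainder over the first few buckets.
        end = start + n // k + (1 if i < n % k else 0)
        run = ordered[start:end]
        avg = sum(f for _, f in run) / len(run)
        # Each bucket keeps its set of values and their average frequency.
        buckets.append(({v for v, _ in run}, avg))
        start = end
    return buckets


def estimate_self_join_size(buckets):
    """Estimate the self-join size, treating each value as having its bucket's average frequency."""
    return sum(len(vals) * avg * avg for vals, avg in buckets)


def exact_self_join_size(values):
    """Exact self-join size: the sum of squared frequencies."""
    return sum(f * f for f in Counter(values).values())


sample = ["a"] * 6 + ["b"] * 4 + ["c"] * 2 + ["d"] * 2
two_bucket_estimate = estimate_self_join_size(serial_histogram(sample, 2))
exact = exact_self_join_size(sample)
```

On this sample the two-bucket estimate is 58.0 against an exact size of 60. With one bucket per distinct value the estimate matches the exact size.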

Balancing histogram optimality and practicality for query result size estimation
TLDR
The overall conclusion is that the most effective approach is to focus on the class of histograms that accurately maintain the frequencies of a few attribute values and assume the uniform distribution for the rest, and choose for each relation the histogram in that class that is optimal for a self-join query.
An Overview of Cost-based Optimization of Queries with Aggregates
TLDR
This paper overviews the line of research on histograms at the Univ. of Wisconsin and presents several results, which eventually point towards a class of histograms that are practical, close to optimal, and effective in estimating sizes of query results, frequency distributions of attribute values in query results, and even costs of accesses using secondary indices.
Improving Range Query Result Size Estimation Based on a New Optimal Histogram
TLDR
This paper proposes an efficient algorithm, called Compressed-V2, for accurate histogram construction, which should help address the Multi-Query Optimization (MQO) problem that arises from query interactions, especially in Relational Data Warehouses (RDW), the ideal environment in which complex OLAP queries interact with each other.
Improved histograms for selectivity estimation of range predicates
TLDR
A taxonomy of histograms that captures all previously proposed histogram types and indicates many new possibilities is provided; novel choices are introduced for several of the taxonomy dimensions, and new histogram types are derived by combining choices in effective ways.
Query result size estimation using a novel histogram-like technique: the rectangular attribute cardinality map
  • B. Oommen, M. Thiyagarajah
  • Computer Science
    Proceedings. IDEAS'99. International Database Engineering and Applications Symposium (Cat. No.PR00265)
  • 1999
TLDR
This work introduces a new histogram-like approximation strategy called the Rectangular Attribute Cardinality Map (R-ACM), that aims to approximate the density of the underlying attribute values using the philosophies of numerical integration.
Optimal Histograms with Quality Guarantees
TLDR
Algorithms for computing optimal bucket boundaries in time proportional to the square of the number of distinct data values, for a broad class of optimality metrics and an enhancement to traditional histograms that allows us to provide quality guarantees on individual selectivity estimates are presented.
Structure choices for two-dimensional histogram construction
TLDR
This work experimentally shows that the proposed methods for dealing with histogram structure choices lead to good quality histograms for a variety of histogram partitioning techniques and various types of data distributions.
Efficiently adapting graphical models for selectivity estimation
TLDR
By carefully using concepts from the field of graphical models, this work is able to factor the joint probability distribution over all the attributes in the database into small, usually two-dimensional distributions, without a significant loss in estimation accuracy.

References

SHOWING 1-10 OF 18 REFERENCES
Optimal histograms for limiting worst-case error propagation in the size of join results
TLDR
It is proved that for t-clique queries with a very large number of joins, high-biased histograms are always optimal, and that to construct a histogram for the join attribute of a relation, the values in the attribute must first be sorted based on their frequency and then assigned into buckets according to the optimality results.
The Optimization of Queries in Relational Databases
A fully implemented system for optimizing and executing queries for relational databases is described. The system optimizes n-table, equi-join queries written in QUEL, the query language supported by
A detailed statistical model for relational query optimization
TLDR
An approach to estimate the cardinality of results of relational operations of select, project, and semijoin, using detailed database statistics computed from the instances of a relational database.
Accurate estimation of the number of tuples satisfying a condition
TLDR
A new method for estimating the number of tuples satisfying a condition of the type attribute rel constant, where rel is one of "=", ">", "<", "≥", "≤", which gives highly accurate, yet easy to compute, estimates.
Equi-depth multidimensional histograms
TLDR
This paper presents an algorithm for generating equi-depth, multi-dimensional histograms and presents a main memory data structure for storing the histograms, and discusses two schemes for estimating the number of tuples that will be retrieved by a given query.
Statistical profile estimation in database systems
TLDR
This paper describes a model of a database profile, relates this model to estimating the cost of database operations, and surveys methods of estimating profiles.
Implications of certain assumptions in database performance evaluation
TLDR
This paper shows that assumptions of uniformity and independence of attribute values in a file, uniformity of queries, constant number of records per block, and random placement of qualifying records among the blocks of a file often result in predicting only an upper bound of the expected system cost.
Estimating block transfers and join sizes
TLDR
Estimates of the number of sequential and random block accesses required for retrieving a number of records of a file when the distribution of records in blocks of secondary storage is not uniform are provided.
Access path selection in a relational database management system
TLDR
This paper describes how System R chooses access paths for both simple (single relation) and complex queries (such as joins) given a user specification of desired data as a boolean expression of predicates.
Distribution Models Of Relations
  • T. Merrett, E. Otoo
  • Mathematics
    Fifth International Conference on Very Large Data Bases, 1979.
  • 1979
TLDR
It is shown how relations can be modelled in fast memory by a distribution of tuples in a multidimensional space using the result for the natural join to optimize the evaluation of an expression involving two joins.