Automated statistics collection in action


If presented with inaccurate statistics, even the most sophisticated query optimizers make mistakes. They may wrongly estimate the output cardinality of an operation and thus make sub-optimal plan choices based on that cardinality. Maintaining accurate statistics is hard, both because each table may need a specifically parameterized set of statistics and because statistics become outdated as the database changes. Automated Statistics Collection (ASC) is a new component in IBM DB2 UDB that, without any DBA intervention, observes and analyzes the effects of faulty statistics and, in response, triggers actions that continuously repair them. In this demonstration, we will show how ASC relieves the DBA of the task of maintaining fresh, accurate statistics in several challenging scenarios. ASC is able to reconfigure the statistics collection parameters (e.g., the number of frequent values for a column, or correlations between certain column pairs) on a per-table basis. ASC can also detect and guard against outdated statistics caused by high update/insert/delete rates in volatile, dynamic databases. We will also show how ASC works from the inside: from how cardinality mis-estimations are introduced in different kinds of operators, to how these errors propagate to later operations in the plan, to how they influence plan choices inside the optimizer.
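
The kind of mis-estimation described above can be illustrated with a toy example. The following is a minimal sketch, not ASC's or DB2's actual algorithm: it assumes an optimizer that multiplies per-column selectivities (the independence assumption) and a hypothetical feedback check, `needs_column_group_stats`, that compares the estimate with the observed cardinality. The table contents, function names, and the 1.5 threshold are all assumptions made for illustration.

```python
def estimate_independent(rows, predicates):
    """Estimate matching rows by multiplying per-column selectivities.

    This mimics an optimizer that only has single-column statistics and
    assumes the columns are statistically independent.
    """
    n = len(rows)
    estimate = float(n)
    for column, value in predicates.items():
        matches = sum(1 for row in rows if row[column] == value)
        estimate *= matches / n
    return estimate


def actual_cardinality(rows, predicates):
    """Count the rows that really satisfy every equality predicate."""
    return sum(
        1 for row in rows
        if all(row[column] == value for column, value in predicates.items())
    )


def needs_column_group_stats(estimated, actual, threshold=1.5):
    """Hypothetical feedback rule: flag the column pair when the estimate
    is off by more than `threshold` in either direction."""
    low, high = sorted((max(estimated, 1.0), max(actual, 1.0)))
    return high / low > threshold


# Toy table in which "model" fully determines "make" (perfect correlation).
rows = (
    [{"make": "Honda", "model": "Civic"}] * 50
    + [{"make": "Toyota", "model": "Corolla"}] * 50
)
predicates = {"make": "Honda", "model": "Civic"}

estimated = estimate_independent(rows, predicates)  # 100 * 0.5 * 0.5
actual = actual_cardinality(rows, predicates)
flag = needs_column_group_stats(estimated, actual)

print(f"estimated={estimated}, actual={actual}, collect pair stats={flag}")
```

In this toy table the independence assumption underestimates the result by a factor of two. An error of this kind, multiplied across later joins in a plan, is what a feedback component would try to repair by collecting statistics on the correlated column pair.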

DOI: 10.1145/1066157.1066293

Cite this paper

@inproceedings{Haas2005AutomatedSC, title={Automated statistics collection in action}, author={Peter J. Haas and Mokhtar Kandil and Alberto Lerner and Volker Markl and Ivan Popivanov and Vijayshankar Raman and Daniel C. Zilio}, booktitle={SIGMOD Conference}, year={2005} }