Query Sampling in DB2 Universal Database


Executing ad hoc queries against large databases can be prohibitively expensive. Exploratory data analysis, however, may not require exact answers: results based on a sample of the data are often satisfactory. Supporting sampling as a primitive SQL operator turns out to be difficult because sampling does not commute with many SQL operators. In this paper, we describe an implementation in IBM® DB2® Universal Database (UDB) of a sampling operator that commutes with some SQL operators. As a result, a query containing the sampling operator always returns a random sample of the answers and in many cases runs faster than it would without the operator.
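
The claim that sampling commutes with some SQL operators but not others can be seen on toy data. The Python sketch below is a hypothetical illustration, not the DB2 operator described in the paper. It assumes row-level Bernoulli sampling in which each row's keep/drop decision is derived from a seed and the row's id, and the helpers `keep`, `sample`, `select`, and `join` are invented for this example. It shows that such sampling yields identical rows whether applied before or after a selection. Pushing it below an equi-join, by contrast, keeps each join tuple with probability p·p instead of p, and join tuples that share a parent row are not kept independently.

```python
import random


def keep(row_id, p, seed):
    # Hypothetical per-row coin flip seeded from (seed, row_id), so a row's
    # fate does not depend on which other rows reach the sampling operator.
    return random.Random(f"{seed}:{row_id}").random() < p


def sample(rows, p, seed):
    """Row-level Bernoulli sampling: keep each row with probability p."""
    return [r for r in rows if keep(r["id"], p, seed)]


def select(rows, pred):
    """Relational selection (an SQL WHERE clause)."""
    return [r for r in rows if pred(r)]


def join(left, right):
    """Equi-join on right.fk = left.id."""
    by_id = {row["id"]: row for row in left}
    return [(by_id[r["fk"]], r) for r in right if r["fk"] in by_id]


def small_v(row):
    return row["v"] < 3


# Toy tables: 200 parent rows, each referenced by 10 child rows.
parents = [{"id": i, "v": i % 7} for i in range(200)]
children = [{"id": j, "fk": j // 10} for j in range(2000)]

# 1) Selection: sampling before or after the WHERE clause gives the same rows.
a = select(sample(parents, 0.5, seed=1), small_v)
b = sample(select(parents, small_v), 0.5, seed=1)
print(a == b)  # True

# 2) Join: sampling both inputs at p keeps each join tuple with probability
# p * p, and if a parent row is dropped, all 10 of its join tuples vanish together.
p = 0.5
full = join(parents, children)
pushed_down = join(sample(parents, p, seed=1), sample(children, p, seed=2))
frac = len(pushed_down) / len(full)
print(round(frac, 2))  # expected near p * p = 0.25, not p = 0.5
```

Selection commutes here because both orders keep a row exactly when the predicate and the row's coin flip both hold. The join result is neither at the requested rate nor an independent per-tuple sample, which is the kind of non-commutativity the abstract refers to.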

DOI: 10.1145/1007568.1007664

Cite this paper

@inproceedings{Gryz2004QuerySI,
  title={Query Sampling in DB2 Universal Database},
  author={Jarek Gryz and Junjie Guo and Linqi Liu and Calisto Zuzarte},
  booktitle={SIGMOD Conference},
  year={2004}
}