# Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals

@article{Gray2004DataCA, title={Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals}, author={Jim Gray and Surajit Chaudhuri and Adam Bosworth and Andrew Layman and Don Reichart and Murali Venkatrao and Frank Pellow and Hamid Pirahesh}, journal={Data Mining and Knowledge Discovery}, year={2004}, volume={1}, pages={29-53} }

Data analysis applications typically aggregate data across many dimensions looking for anomalies or unusual patterns. The SQL aggregate functions and the GROUP BY operator produce zero-dimensional or one-dimensional aggregates. Applications need the N-dimensional generalization of these operators. This paper defines that operator, called the data cube or simply cube. The cube operator generalizes the histogram, cross-tabulation, roll-up, drill-down, and sub-total constructs found in most report…
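To make the N-dimensional generalization concrete, here is a minimal sketch that computes a cube as the union of all 2^N group-bys over the chosen dimensions. It assumes in-memory rows stored as dicts and SUM as the aggregate. The string `"ALL"` stands in for the paper's ALL value in columns that have been aggregated away. The function name `cube` and the sample rows are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"  # placeholder for the ALL value in rolled-up (aggregated-away) columns


def cube(rows, dims, measure):
    """Return {cell: SUM(measure)} for every subset of dims (2**len(dims) group-bys)."""
    result = {}
    for k in range(len(dims) + 1):
        for grouped in combinations(dims, k):
            totals = defaultdict(int)
            for row in rows:
                # Keep the grouped dimensions and replace the others with ALL.
                cell = tuple(row[d] if d in grouped else ALL for d in dims)
                totals[cell] += row[measure]
            result.update(totals)
    return result


# Hypothetical sales rows.
sales = [
    {"model": "Chevy", "year": 1994, "units": 5},
    {"model": "Chevy", "year": 1995, "units": 7},
    {"model": "Ford", "year": 1994, "units": 3},
]

for cell, total in sorted(cube(sales, ["model", "year"], "units").items(), key=str):
    print(cell, total)
```

In the output, cells containing ALL are the sub-totals, and `("ALL", "ALL")` is the grand total. The sketch rescans the data once per group-by. That keeps it readable, but it is far from an efficient cube algorithm.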


## 2,158 Citations

Rolling Up Random Variables in Data Cubes

- Computer Science
- 2013

A novel technique is described for realizing a hierarchical structure in a data cube containing discrete random variables that construes roll-ups as parsimonious approximations to the joint distribution of the variables in terms of the aggregation structure of the cube.

theta-Constrained multi-dimensional aggregation

- Computer Science
- Inf. Syst.
- 2011

This paper introduces θ-constrained multi-dimensional aggregation (θ-MDA), which supports multi-dimensional OLAP queries with aggregation groups defined by inequalities. It also presents algebraic transformation rules that demonstrate how θ-MDA interacts with other operators of a multi-set algebra.
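The following rough illustration shows the idea of aggregation groups defined by inequalities. It is not the θ-MDA operator or algebra from the cited paper. Each group is defined by an arbitrary predicate `theta` between a group value and a row, so groups may overlap, unlike the partitions produced by GROUP BY. A cumulative sum and a two-day sliding window both fall out as inequality predicates. The function names and data are hypothetical.

```python
def theta_aggregate(groups, rows, theta, agg):
    """For each group value g, aggregate the rows r for which theta(g, r) holds.

    Unlike GROUP BY, the groups need not partition the rows: a row may
    contribute to several groups, or to none. This naive version is a
    nested loop over groups and rows.
    """
    return {g: agg([r for r in rows if theta(g, r)]) for g in groups}


def total_amount(rs):
    """SUM over the amount column of (day, amount) rows."""
    return sum(amount for _, amount in rs)


sales = [(1, 10), (2, 20), (3, 5), (4, 40)]  # hypothetical (day, amount) rows
days = [1, 2, 3, 4]

# Cumulative sum: the group for day d holds every sale with day <= d.
cumulative = theta_aggregate(days, sales, lambda d, r: r[0] <= d, total_amount)
# Sliding window: the group for day d holds sales with d - 1 <= day <= d.
window = theta_aggregate(days, sales, lambda d, r: d - 1 <= r[0] <= d, total_amount)
print(cumulative)
print(window)
```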

Optimization of Percentage Cube Queries

- Computer Science
- EDBT/ICDT Workshops
- 2017

This work introduces the percentage cube, a generalized data cube that takes percentages as the target aggregated measure. It also proposes an SQL extension that is more abstract, more intuitive, and faster than existing SQL functions for computing percentages on the cube.
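The sketch below computes one percentage aggregate: each city's share of its state's total. It makes a single pass over the rows. It is not the SQL extension the paper proposes. The function `percentages` and the column names are hypothetical, and the sketch assumes every group total is non-zero.

```python
from collections import defaultdict


def percentages(rows, total_by, break_down_by, measure):
    """Percentage of each (total_by + break_down_by) sum within its total_by group."""
    totals = defaultdict(float)  # SUM(measure) per total_by group
    parts = defaultdict(float)   # SUM(measure) per (total_by, break_down_by) group
    for row in rows:
        outer = tuple(row[d] for d in total_by)
        inner = tuple(row[d] for d in break_down_by)
        totals[outer] += row[measure]
        parts[(outer, inner)] += row[measure]
    # Assumes no total_by group sums to zero.
    return {key: 100.0 * value / totals[key[0]] for key, value in parts.items()}


# Hypothetical sales rows.
sales = [
    {"state": "CA", "city": "LA", "amount": 30},
    {"state": "CA", "city": "SF", "amount": 10},
    {"state": "TX", "city": "Austin", "amount": 20},
]
print(percentages(sales, ["state"], ["city"], "amount"))
```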

y-Constrained multi-dimensional aggregation

- 2010

The SQL:2003 standard introduced window functions to enhance the analytical processing capabilities of SQL. The key concept of window functions is to sort the input relation and to ordering does not…

Novel Materialized View Selection in a Multidimensional Database

- Computer Science
- 2009

The proposed technique not only reduces the solution space by considering only the relevant elements of the multidimensional lattice, but also supports multiple query execution, thereby reducing the time of result generation. The technique is also scalable.

Aggregations in SQL Using Data Sets For Data Mining Analysis

- Computer Science
- 2012

This work proposes simple, yet powerful, methods to generate SQL code to return aggregated columns in a horizontal tabular layout, returning a set of numbers instead of one number per row.
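A vertical GROUP BY returns one number per row. A horizontal aggregation instead returns one row per group with several aggregated columns. The sketch below is a hypothetical in-memory pivot, not the SQL code generation the paper proposes. It sums `amount` per store, with one column per quarter.

```python
from collections import defaultdict


def horizontal_sum(rows, row_key, pivot_col, measure):
    """One output row per row_key value, one column per distinct pivot_col value."""
    pivot_values = sorted({r[pivot_col] for r in rows})
    # Missing combinations default to 0 (an SQL formulation might yield NULL).
    table = defaultdict(lambda: {v: 0 for v in pivot_values})
    for r in rows:
        table[r[row_key]][r[pivot_col]] += r[measure]
    return pivot_values, dict(table)


# Hypothetical sales rows.
sales = [
    {"store": "S1", "quarter": "Q1", "amount": 100},
    {"store": "S1", "quarter": "Q2", "amount": 150},
    {"store": "S2", "quarter": "Q1", "amount": 80},
]
columns, table = horizontal_sum(sales, "store", "quarter", "amount")
print(["store"] + columns)
for store, cells in sorted(table.items()):
    print([store] + [cells[c] for c in columns])
```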

Efficient incremental maintenance of data cubes

- Computer Science
- VLDB
- 2006

This paper proposes an incremental maintenance method that can maintain a data cube using only $\binom{n}{\lceil n/2 \rceil}$ delta cuboids, and shows its performance advantages over previous methods.
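The short calculation below gives some intuition for the $\binom{n}{\lceil n/2 \rceil}$ bound. It compares that bound with the $2^n$ cuboids of a full n-dimensional cube without hierarchies. It only evaluates the counts and does not implement the maintenance method itself. The helper name `delta_cuboids` is hypothetical.

```python
from math import ceil, comb


def delta_cuboids(n: int) -> int:
    """Delta-cuboid count stated in the summary: C(n, ceil(n/2))."""
    return comb(n, ceil(n / 2))


# Columns: dimensions, cuboids in a full cube, delta cuboids.
for n in range(1, 9):
    print(n, 2 ** n, delta_cuboids(n))
```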

Data Cube Materialization with MR Cube and CM Sketch Approach

- 2015

Data cube computation plays an important role in data warehouse systems. Applications performing multidimensional data analysis look for unusual patterns. Here aggregation of data is done across…

Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis

- Computer Science
- IEEE Transactions on Knowledge and Data Engineering
- 2012

This work proposes simple, yet powerful, methods to generate SQL code to return aggregated columns in a horizontal tabular layout, returning a set of numbers instead of one number per row.

ASSET Queries: A Set-Oriented and Column-Wise Approach to Modern OLAP

- Computer Science
- BIRTE
- 2009

The associated set (ASSET) concept is reviewed, and its applicability in both continuous and traditional data settings is discussed. The work also argues for the analytical abilities and optimization opportunities of associated sets.

## References

Showing 10 of 60 references.

On the Computation of Multidimensional Aggregates

- Computer Science
- VLDB
- 1996

This paper presents fast algorithms for computing a collection of group-bys using sort-based and hash-based grouping methods. These methods include several optimizations, such as combining common operations across multiple group-bys, caching, and using pre-computed group-bys to compute other group-bys.
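The idea of computing one group-by from an already-computed one can be sketched as follows. The sketch assumes SUM as the aggregate and uses in-memory dictionaries in place of the sort- and hash-based methods the paper studies. The base data is scanned once to build the finest group-by. Each coarser group-by is then derived from its smallest already-computed parent, meaning a group-by with one extra attribute. The function `cube_from_parents` is hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cube_from_parents(rows, dims, measure):
    """Compute all group-bys of SUM(measure), deriving each coarser group-by
    from the smallest already-computed parent (one extra dimension)."""
    n = len(dims)
    # Finest group-by: the only pass over the base data.
    finest = defaultdict(int)
    for r in rows:
        finest[tuple(r[d] for d in dims)] += r[measure]
    groupbys = {tuple(range(n)): dict(finest)}  # keyed by attribute indices
    # Work downward level by level, so every parent already exists.
    for k in range(n - 1, -1, -1):
        for attrs in combinations(range(n), k):
            parents = [tuple(sorted(attrs + (extra,)))
                       for extra in range(n) if extra not in attrs]
            parent = min(parents, key=lambda p: len(groupbys[p]))
            child = defaultdict(int)
            for key, total in groupbys[parent].items():
                # Project the parent's key onto the child's attributes.
                sub = tuple(v for i, v in zip(parent, key) if i in attrs)
                child[sub] += total
            groupbys[attrs] = dict(child)
    return {tuple(dims[i] for i in attrs): g for attrs, g in groupbys.items()}


# Hypothetical sales rows.
sales = [
    {"model": "Chevy", "year": 1994, "color": "red", "units": 5},
    {"model": "Chevy", "year": 1995, "color": "blue", "units": 7},
    {"model": "Ford", "year": 1994, "color": "red", "units": 3},
]
cube = cube_from_parents(sales, ["model", "year", "color"], "units")
print(cube[("model",)])
print(cube[()])
```

Re-aggregating partial sums like this is only valid for distributive aggregates such as SUM, COUNT, MIN, and MAX. An AVG would need to carry (sum, count) pairs instead.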

Implementing data cubes efficiently

- Computer Science
- SIGMOD '96
- 1996

This paper investigates which cells (views) to materialize when it is too expensive to materialize all views. It presents greedy algorithms that work off a lattice of views to determine a good set of views to materialize.
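The greedy idea can be sketched as follows under a linear cost model. The sketch assumes each view's size is known. It also assumes that answering view w costs the size of the smallest materialized view from which w can be computed. The top view is always materialized, and each round adds the view whose materialization reduces total cost the most. The function name, lattice, and sizes below are made up for illustration.

```python
def greedy_select(sizes, ancestors_of, top, k):
    """Greedily pick k views to materialize in addition to the top view.

    sizes: view -> estimated number of rows.
    ancestors_of: view -> set of views it can be computed from (itself included).
    """
    chosen = [top]

    def cost(w):
        # Answering w costs the size of its smallest materialized ancestor.
        return min(sizes[v] for v in ancestors_of[w] if v in chosen)

    def benefit(v):
        # Total cost reduction over every view w computable from v.
        return sum(max(0, cost(w) - sizes[v]) for w in sizes if v in ancestors_of[w])

    for _ in range(k):
        candidates = [v for v in sizes if v not in chosen]
        if not candidates:
            break
        chosen.append(max(candidates, key=benefit))
    return chosen


# Illustrative lattice: a is the top view; an edge x -> y means y is computable from x.
sizes = {"a": 100, "b": 50, "c": 75, "d": 20, "e": 30, "f": 40, "g": 1, "h": 10}
ancestors_of = {
    "a": {"a"},
    "b": {"a", "b"},
    "c": {"a", "c"},
    "d": {"a", "b", "d"},
    "e": {"a", "b", "c", "e"},
    "f": {"a", "c", "f"},
    "g": {"a", "b", "c", "d", "e", "g"},
    "h": {"a", "b", "c", "e", "f", "h"},
}
print(greedy_select(sizes, ancestors_of, "a", 3))
```

Here `b` is picked first because it cuts the cost of five views from 100 to 50. After that, `f` and `d` offer the largest remaining savings.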

Storage Estimation for Multidimensional Aggregates in the Presence of Hierarchies

- Computer Science
- VLDB
- 1996

Three strategies are proposed for estimating the storage blowup that will result from a proposed set of precomputations without actually computing them. The first is based on sampling, the second on mathematical approximation, and the third on probabilistic counting.
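Probabilistic counting can estimate the number of distinct group-by cells, and hence the size of a precomputed aggregate, in one pass with fixed memory. The sketch below uses Flajolet-Martin probabilistic counting with stochastic averaging (PCSA) and assumes SHA-1 as the hash function. It illustrates the general technique and is not the estimator from the cited paper. `estimate_distinct` and the sample keys are hypothetical.

```python
import hashlib

PHI = 0.77351  # Flajolet-Martin correction constant


def _hash64(value) -> int:
    """64-bit hash of a key (assumption: SHA-1 prefix of its repr)."""
    digest = hashlib.sha1(repr(value).encode()).digest()
    return int.from_bytes(digest[:8], "big")


def estimate_distinct(keys, num_bitmaps=128):
    """Estimate the number of distinct keys with one pass and fixed memory."""
    bitmaps = [0] * num_bitmaps
    for key in keys:
        h = _hash64(key)
        # Stochastic averaging: low part picks a bitmap, the rest feeds it.
        bucket, rest = h % num_bitmaps, h // num_bitmaps
        # rho = 0-based position of the lowest set bit of rest.
        rho = (rest & -rest).bit_length() - 1 if rest else 56
        bitmaps[bucket] |= 1 << rho
    # R = position of the lowest zero bit in each bitmap, averaged over bitmaps.
    total_r = 0
    for bm in bitmaps:
        r = 0
        while bm & (1 << r):
            r += 1
        total_r += r
    return num_bitmaps / PHI * 2 ** (total_r / num_bitmaps)


# Hypothetical group-by keys: 200 customers x 100 days = 20,000 distinct cells,
# each appearing twice.
rows = [(c, d) for c in range(200) for d in range(100)] * 2
print(round(estimate_distinct(rows)))
```

The standard error is roughly 0.78/sqrt(num_bitmaps). Duplicates only re-set bits that are already set, so repeated keys do not change the estimate. The estimate is unreliable when the true count is not much larger than `num_bitmaps`.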

Using the New DB2: IBM's Object-Relational Database System

- Computer Science
- 1996

Using the New DB2 presents an overview of the basic features of DB2 Version 2, including historical notes on the development of SQL, and offers a comprehensive explanation of the advanced features of the system, including recursive queries, constraints, triggers, user-defined types and functions, stored procedures, and client-server applications.

Query evaluation techniques for large databases

- Computer Science
- CSUR
- 1993

This survey describes a wide array of practical query evaluation techniques for both relational and postrelational database systems, including iterative execution of complex query evaluation plans, the duality of sort- and hash-based set-matching algorithms, types of parallel query execution and their implementation, and special operators for emerging database application domains.

Understanding the New SQL: A Complete Guide

- Computer Science
- 1993

This chapter discusses the design of SQL-92 databases, the SQL standardization process, table creation, and data manipulation.

The benchmark handbook for database and transaction processing systems

- Computer Science
- 1991

The Transaction Processing Performance Council (TPC) is a non-profit organization formed to define transaction processing and database benchmarks and to disseminate … TPC benchmarks are used in evaluating the performance of…

Benchmark Handbook: For Database and Transaction Processing Systems

- Computer Science
- 1992

The handbook provides the tools to evaluate different systems, different software products on a single machine, and different machines within a single product family.

An Introduction to Database Systems

- Computer Science
- 1975

Readers of this book will gain a strong working knowledge of the overall structure, concepts, and objectives of database systems and will become familiar with the theoretical principles underlying the construction of such systems.

Introduction to Database Systems

- Computer Science
- 2005

This chapter explains the development of a relational model for SQL. Its examples show how the model changed over time, from simple to complex to elegant and efficient.