Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals

@article{Gray2004DataCA,
  title={Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals},
  author={Jim Gray and Surajit Chaudhuri and Adam Bosworth and Andrew Layman and Don Reichart and Murali Venkatrao and Frank Pellow and Hamid Pirahesh},
  journal={Data Mining and Knowledge Discovery},
  year={1997},
  volume={1},
  pages={29-53}
}
Data analysis applications typically aggregate data across many dimensions looking for anomalies or unusual patterns. The SQL aggregate functions and the GROUP BY operator produce zero-dimensional or one-dimensional aggregates. Applications need the N-dimensional generalization of these operators. This paper defines that operator, called the data cube or simply cube. The cube operator generalizes the histogram, cross-tabulation, roll-up, drill-down, and sub-total constructs found in most report…
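The idea of an N-dimensional generalization of GROUP BY can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function name `data_cube`, the sample `sales` rows, and the choice of SUM as the aggregate are all assumptions made for illustration. The sketch computes one group-by for every subset of the dimensions and marks each aggregated-away dimension with a placeholder value `ALL`.

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"  # placeholder for a dimension that has been aggregated away


def data_cube(rows, dims, measure):
    """Sum `measure` for every subset of `dims` (2**N group-bys in total).

    Hypothetical helper for illustration only; a naive sketch that
    rescans the rows once per subset of dimensions.
    """
    cube = defaultdict(int)
    for k in range(len(dims) + 1):
        for kept in combinations(dims, k):
            for row in rows:
                # Keep the value of each retained dimension, replace the rest by ALL.
                key = tuple(row[d] if d in kept else ALL for d in dims)
                cube[key] += row[measure]
    return dict(cube)


# Hypothetical sales data: two dimensions (model, year) and one measure (units).
sales = [
    {"model": "Chevy", "year": 1994, "units": 50},
    {"model": "Chevy", "year": 1995, "units": 85},
    {"model": "Ford", "year": 1994, "units": 60},
    {"model": "Ford", "year": 1995, "units": 75},
]

cube = data_cube(sales, ["model", "year"], "units")
```

With two dimensions the result holds four kinds of cells: the plain two-dimensional group-by, the sub-totals per model such as `cube[("Chevy", ALL)]`, the sub-totals per year such as `cube[(ALL, 1995)]`, and the grand total `cube[(ALL, ALL)]`. Together these correspond to a cross-tabulation with its row, column, and overall totals.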
Rolling Up Random Variables in Data Cubes
TLDR
A novel technique is described for realizing a hierarchical structure in a data cube containing discrete random variables that construes roll-ups as parsimonious approximations to the joint distribution of the variables in terms of the aggregation structure of the cube.
theta-Constrained multi-dimensional aggregation
TLDR
This paper introduces θ-constrained multi-dimensional aggregation (θ-MDA), which supports multi-dimensional OLAP queries with aggregation groups defined by inequalities, and presents algebraic transformation rules that demonstrate how the θ-MDA interacts with other operators of a multi-set algebra.
Optimization of Percentage Cube Queries
TLDR
This work introduces the percentage cube, a generalized data cube that takes percentages as the target aggregated measure, and proposes an SQL extension that is more abstract, more intuitive, and faster than existing SQL functions for computing percentages on the cube.
y-Constrained multi-dimensional aggregation
The SQL:2003 standard introduced window functions to enhance the analytical processing capabilities of SQL. The key concept of window functions is to sort the input relation and to ordering does not
Novel Materialized View Selection in a Multidimensional Database
TLDR
The proposed technique not only reduces the solution space by considering only the relevant elements of the multidimensional lattice but also supports multiple query execution, thereby reducing the time of result generation, and is scalable.
Aggregations in SQL Using Data Sets For Data Mining Analysis
TLDR
This work proposes simple, yet powerful, methods to generate SQL code to return aggregated columns in a horizontal tabular layout, returning a set of numbers instead of one number per row.
Efficient incremental maintenance of data cubes
TLDR
This paper proposes an incremental maintenance method for data cubes that can maintain a data cube by using only $\binom{n}{\lceil n/2 \rceil}$ delta cuboids and shows the performance advantages of the method over the previous methods.
Data Cube Materialization with MR Cube and CM Sketch Approach
Data cube computation plays an important role in data warehouse systems. Applications with multidimensional data analysis are looking for unusual patterns. Here aggregation of data is done across
Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis
TLDR
This work proposes simple, yet powerful, methods to generate SQL code to return aggregated columns in a horizontal tabular layout, returning a set of numbers instead of one number per row.
ASSET Queries: A Set-Oriented and Column-Wise Approach to Modern OLAP
TLDR
The associated set (ASSET) concept is reviewed, its applicability in both continuous and traditional data settings is discussed, and arguments for associated sets' analytical abilities and optimization opportunities are presented.

References

On the Computation of Multidimensional Aggregates
TLDR
This paper presents fast algorithms for computing a collection of group-bys using sort-based and hash-based grouping methods with several optimizations, like combining common operations across multiple group-bys, caching, and using pre-computed group-bys for computing other group-bys.
Implementing data cubes efficiently
TLDR
This paper investigates the issue of which cells (views) to materialize when it is too expensive to materialize all views, and presents greedy algorithms that work off this lattice and determine a good set of views to materialize.
Storage Estimation for Multidimensional Aggregates in the Presence of Hierarchies
TLDR
Three strategies for estimating the storage blowup that will result from a proposed set of precomputations without actually computing them are proposed: one based on sampling, one based on mathematical approximation, and one based on probabilistic counting.
Using the New DB2: IBM's Object-Relational Database System
TLDR
Using the New DB2 presents an overview of the basic features of DB2 Version 2, including historical notes on the development of SQL, and offers a comprehensive explanation of the advanced features of the system, including recursive queries, constraints, triggers, user-defined types and functions, stored procedures, and client-server applications.
Query evaluation techniques for large databases
TLDR
This survey describes a wide array of practical query evaluation techniques for both relational and postrelational database systems, including iterative execution of complex query evaluation plans, the duality of sort- and hash-based set-matching algorithms, types of parallel query execution and their implementation, and special operators for emerging database application domains.
Understanding the New SQL: A Complete Guide
TLDR
This chapter discusses the design of SQL-92 databases, the SQL standardization process, table creation, and data manipulation.
The benchmark handbook for database and transaction processing systems
Transaction Processing Performance Council (TPC) is a non-profit to define transaction processing and database benchmarks and to disseminate TPC benchmarks are used in evaluating the performance of
Benchmark Handbook: For Database and Transaction Processing Systems
TLDR
The handbook provides the tools to evaluate different systems, different software products on a single machine, and different machines within a single product family.
An Introduction to Database Systems
TLDR
Readers of this book will gain a strong working knowledge of the overall structure, concepts, and objectives of database systems and will become familiar with the theoretical principles underlying the construction of such systems.
Introduction to Database Systems
TLDR
This chapter explains the development of a Relational Model for SQL and some examples show how the model changed over time from simple to complex to elegant and efficient.