Implementing data cubes efficiently

@inproceedings{Harinarayan1996ImplementingDC,
  title={Implementing data cubes efficiently},
  author={Venky Harinarayan and Anand Rajaraman and Jeffrey D. Ullman},
  booktitle={SIGMOD '96},
  year={1996}
}
Decision support applications involve complex queries on very large databases. Since response times should be small, query optimization is critical. Users typically view the data as multidimensional data cubes. Each cell of the data cube is a view consisting of an aggregation of interest, like total sales. The values of many of these cells are dependent on the values of other cells in the data cube. A common and powerful query optimization technique is to materialize some or all of these cells… 
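The abstract breaks off before it says how cells are chosen for materialization. As a hedged illustration of the dependence it describes, the sketch below assumes a lattice in which each view is a set of group-by attributes. A view w can be computed from any materialized view v with w ⊆ v, and answering w is assumed to cost the row count of its smallest materialized ancestor (a linear cost model). The sketch then greedily adds the view whose materialization lowers total query cost the most. The function name `greedy_select` and the row counts are hypothetical and are not taken from the paper.

```python
def greedy_select(sizes, k):
    """Greedily add up to k views to the always-materialized top view.

    sizes maps a frozenset of group-by attributes to that view's row count.
    Assumed linear cost model: answering view w costs the row count of the
    smallest materialized view v with w <= v (w is computable from v).
    """
    top = max(sizes, key=len)  # the view that groups on every dimension
    chosen = [top]

    def cost(w):
        return min(sizes[v] for v in chosen if w <= v)

    for _ in range(k):
        best, best_benefit = None, 0
        for v in sizes:
            if v in chosen:
                continue
            # Benefit of v: total cost reduction over every view v can answer.
            benefit = sum(max(0, cost(w) - sizes[v]) for w in sizes if w <= v)
            if benefit > best_benefit:
                best, best_benefit = v, benefit
        if best is None:
            break  # no remaining view lowers any query's cost
        chosen.append(best)
    return chosen


# Hypothetical row counts; each letter stands for one dimension (p, s, c).
SIZES = {
    frozenset("psc"): 6_000_000,
    frozenset("pc"): 6_000_000,
    frozenset("ps"): 800_000,
    frozenset("sc"): 6_000_000,
    frozenset("p"): 200_000,
    frozenset("s"): 10_000,
    frozenset("c"): 100_000,
    frozenset(): 1,
}

print(["".join(sorted(v)) for v in greedy_select(SIZES, 2)])
```

With these sizes the sketch keeps the top view, then adds ps (it speeds up four views at once) and then c. The greedy rule is myopic: it only ever looks at the next single view, which keeps each round cheap.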
Compressed data cubes for OLAP aggregate query approximation on continuous dimensions
TLDR
A new compressed representation of the data cube is proposed that drastically reduces storage requirements, does not require the discretization hierarchy along each query dimension to be fixed beforehand and treats each dimension as a potential target measure and supports multiple aggregation functions without additional storage costs.
Optimizing multiple dimensional queries simultaneously in multidimensional databases
TLDR
This paper considers in detail two cases of the problem, in which all the queries are either hash-based star joins only or index-based star joins only, and presents what it describes as the only development of polynomial algorithms for the first two cases that can deliver plans with deterministic performance guarantees on the quality of the plans generated.
Answering multidimensional queries on cubes using other cubes
  • D. Theodoratos, T. Sellis
  • Computer Science
    Proceedings. 12th International Conference on Scientific and Statistical Database Management
  • 2000
TLDR
This paper provides a simple data model for MD databases, and a simple algebraic MD query language that permit the modeling of the principal OLAP operations, and provides instance independent expressions that compute an MD query on a cube from derived cubes.
Range queries in dynamic OLAP data cubes
TLDR
A new algorithm is provided that achieves constant time per range-sum query while keeping each update cost within O(n^{d/2}), where d is the number of dimensions of the data cube and n is the number of distinct values of the domain in each dimension.
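The TLDR does not describe the new algorithm itself. For context, here is a minimal sketch of the classic static prefix-sum technique for range-sum queries on a data cube, shown for d = 2. A range sum needs only 2^d lookups, but changing one cell can alter every prefix entry that covers it, up to O(n^d) entries. That is the update cost the summarized algorithm reports bounding by O(n^{d/2}). The function names are illustrative.

```python
def build_prefix(a):
    """Return P where P[i][j] is the sum of a[0..i-1][0..j-1] (row/column 0 are zeros)."""
    rows, cols = len(a), len(a[0])
    p = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            p[i + 1][j + 1] = a[i][j] + p[i][j + 1] + p[i + 1][j] - p[i][j]
    return p


def range_sum(p, r1, c1, r2, c2):
    """Sum of a[r1..r2][c1..c2], inclusive: four lookups whatever the range size."""
    return p[r2 + 1][c2 + 1] - p[r1][c2 + 1] - p[r2 + 1][c1] + p[r1][c1]


def update(a, p, i, j, new_value):
    """Set a[i][j], patch every prefix entry that covers it, and return how many changed."""
    delta = new_value - a[i][j]
    a[i][j] = new_value
    touched = 0
    for x in range(i + 1, len(p)):
        for y in range(j + 1, len(p[0])):
            p[x][y] += delta
            touched += 1
    return touched


cube = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
prefix = build_prefix(cube)
print(range_sum(prefix, 1, 1, 2, 2))   # 5 + 6 + 8 + 9 = 28
print(update(cube, prefix, 0, 0, 10))  # all 9 prefix entries change
print(range_sum(prefix, 0, 0, 2, 2))   # 45 - 1 + 10 = 54
```

Updating the corner cell (0, 0) is the worst case, because every prefix entry covers it.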
Efficient Materialized View Selection for Multi-Dimensional Data Cube Models
TLDR
The authors present a refined greedy selection approach that uses forward references to improve materialized view selection; it works on a lattice framework of the data, which is able to capture the interdependencies among the data.
An Optimization Problem in Data Cube System Design
TLDR
Two approximation algorithms, Greedy Removing and 2-Greedy Merging, are proposed, and the authors show that their approach is both effective and efficient for data cube system design.
Efficient Evaluation of Sparse Data Cubes
TLDR
The paper introduces a new dynamic data structure called SST (Sparse Statistics Trees) and a novel, interactive, and fast cube evaluation algorithm called CUPS (Cubing by Pruning SST), which is especially well suited to computing aggregates in cubes whose data sets are sparse.
Cost effective storage space for data cubes
TLDR
The relation between the number of data cube views and the space limit, expressed both as a percentage of the fully materialized data cube size and as a multiple of the base view size, is analysed, and it is found that allocating a large space for view materialization is not cost effective.
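The TLDR does not give the paper's formulas. Here is a hedged sketch of the quantities it relates, assuming d flat dimensions with no hierarchies, so the cube has 2^d views. Each view's size is bounded by the smaller of two values: the product of its dimensions' cardinalities, and the base view size. This is a common upper bound, not necessarily the authors' size model. The cardinalities, base size, and budget below are hypothetical.

```python
from itertools import combinations
from math import prod


def view_size_bounds(cardinalities, base_rows):
    """Upper-bound every group-by view's row count (flat dimensions assumed).

    A view grouping on dimension subset S has at most prod(|D_i| for i in S)
    rows and never more rows than the base view.
    """
    bounds = {}
    for r in range(len(cardinalities) + 1):
        for subset in combinations(range(len(cardinalities)), r):
            dense = prod(cardinalities[i] for i in subset)  # empty product is 1
            bounds[subset] = min(dense, base_rows)
    return bounds


bounds = view_size_bounds([100, 50, 20], base_rows=10_000)
full_cube = sum(bounds.values())
budget = 1.5 * 10_000                      # space limit as a multiple of the base view
print(len(bounds), full_cube)              # 8 views, 18171 rows
print(f"{budget / full_cube:.1%} of the fully materialized cube")
```

Here a budget of 1.5 times the base view already covers about 82.5% of the bounded full cube, because the small views contribute little space.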
Optimization in Data Cube System Design
The design of an OLAP system for supporting real-time queries is one of the major research issues. One approach is to use data cubes, which are materialized precomputed multidimensional views of data

References

SHOWING 1-10 OF 17 REFERENCES
Index selection for OLAP
TLDR
The authors give algorithms that automate the selection of summary tables and indexes, and present a family of algorithms of increasing time complexities, and prove strong performance bounds for them.
Including Group-By in Query Optimization
TLDR
It is shown that the extent of improvement in the quality of plans is significant with only a modest increase in optimization cost, and the technique also applies to optimization of Select Distinct queries by pushing down duplicate elimination in a cost-based fashion.
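The TLDR does not spell out the transformation. Here is a hedged sketch of the basic idea of evaluating a group-by below a join (often called eager aggregation), on hypothetical data. store_id is a key of the store dimension, and SUM can be split into partial sums. Because of both facts, pre-aggregating the fact rows by store_id and re-aggregating after the join gives the same per-region totals as joining first.

```python
from collections import defaultdict

# Hypothetical star-schema fragment: fact rows are (store_id, amount),
# and the store dimension maps its key store_id to a region.
SALES = [(1, 10.0), (1, 5.0), (2, 7.0), (3, 4.0), (3, 6.0)]
STORE_REGION = {1: "east", 2: "west", 3: "east"}


def join_then_group(sales, store_region):
    """Plan A: join each fact row to its store, then SUM(amount) per region."""
    totals = defaultdict(float)
    for store_id, amount in sales:
        totals[store_region[store_id]] += amount
    return dict(totals)


def group_then_join(sales, store_region):
    """Plan B: pre-aggregate by the join key, join the smaller result, re-aggregate."""
    per_store = defaultdict(float)
    for store_id, amount in sales:
        per_store[store_id] += amount  # group-by evaluated below the join
    totals = defaultdict(float)
    for store_id, subtotal in per_store.items():
        totals[store_region[store_id]] += subtotal  # join on the key, final group-by
    return dict(totals)


print(join_then_group(SALES, STORE_REGION))  # {'east': 25.0, 'west': 7.0}
print(group_then_join(SALES, STORE_REGION))  # same totals from 3 joined rows instead of 5
```

Plan B pays off when many fact rows share a join key, since the join then sees far fewer rows. Deciding when that is true is the cost-based choice an optimizer has to make.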
Query evaluation techniques for large databases
TLDR
This survey describes a wide array of practical query evaluation techniques for both relational and postrelational database systems, including iterative execution of complex query evaluation plans, the duality of sort- and hash-based set-matching algorithms, types of parallel query execution and their implementation, and special operators for emerging database application domains.
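To illustrate the sort/hash duality the survey discusses, here is a minimal sketch that computes the same grouped SUM both ways on hypothetical (key, value) rows. The two plans must produce identical groups; they differ only in cost and output order.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

ROWS = [("b", 2), ("a", 1), ("b", 3), ("c", 5), ("a", 4)]  # hypothetical (key, value) pairs


def hash_aggregate(rows):
    """Hash-based grouping: one pass, with a hash table keyed on the group value."""
    sums = defaultdict(int)
    for key, value in rows:
        sums[key] += value
    return dict(sums)


def sort_aggregate(rows):
    """Sort-based grouping: sort so equal keys are adjacent, then sum each run."""
    ordered = sorted(rows, key=itemgetter(0))
    return {key: sum(value for _, value in run)
            for key, run in groupby(ordered, key=itemgetter(0))}


print(hash_aggregate(ROWS))  # {'b': 5, 'a': 5, 'c': 5}  (first-seen order)
print(sort_aggregate(ROWS))  # {'a': 5, 'b': 5, 'c': 5}  (key order)
```

The sort-based version emits groups in key order, which a downstream merge-based operator can exploit. The hash-based version avoids sorting the whole input.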
Aggregate-Query Processing in Data Warehousing Environments
TLDR
Generalized projections are introduced that capture aggregations, group-bys, duplicate-eliminating projections (distinct), and duplicate-preserving projections in a common unified framework, and powerful query rewrite rules for aggregate queries are developed that unify and extend rewrite rules previously known in the literature.
Multi-table joins through bitmapped join indices
TLDR
This technical note shows how to combine some well-known techniques to create a method that will efficiently execute common multi-table joins, and outlines realistic examples where the combination of these techniques yields substantial performance improvements over alternative, more traditional query evaluation plans.
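The note's exact combination of techniques is not given in the TLDR. Here is a hedged sketch of the core mechanism behind bitmapped join indices. Each foreign-key value in the fact table maps to a bitmap of the fact rows that reference it. A star join's dimension predicates then become bitmap ORs (within one dimension) and ANDs (across dimensions), with no fact-to-dimension join. Python integers serve as bitmaps, dimension predicates are assumed to be already resolved to key sets, and all names and data are hypothetical.

```python
# Hypothetical fact table: row i is (product_key, store_key, amount).
FACTS = [(1, 10, 5), (2, 10, 3), (1, 20, 7), (3, 20, 2), (1, 10, 4)]


def build_join_index(facts, column):
    """Map each foreign-key value to a bitmap (a Python int) of the fact rows holding it."""
    index = {}
    for row_id, row in enumerate(facts):
        index[row[column]] = index.get(row[column], 0) | (1 << row_id)
    return index


def rows_for_keys(index, keys):
    """OR together the bitmaps of all dimension keys that satisfy one predicate."""
    bitmap = 0
    for key in keys:
        bitmap |= index.get(key, 0)
    return bitmap


def row_ids(bitmap):
    """Decode a bitmap into ascending fact-row ids."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


by_product = build_join_index(FACTS, 0)
by_store = build_join_index(FACTS, 1)
# Assume each dimension predicate was already resolved to a set of keys.
hits = rows_for_keys(by_product, {1}) & rows_for_keys(by_store, {10})
print(row_ids(hits), sum(FACTS[i][2] for i in row_ids(hits)))  # [0, 4] 9
```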
Sampling-Based Estimation of the Number of Distinct Values of an Attribute
TLDR
This appears to be the first extensive comparison of distinct-value estimators in either the database or the statistical literature, and is certainly the first to use highly skewed data of the sort frequently encountered in database applications.
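The TLDR names no specific estimator. As one hedged example of a sampling-based distinct-value estimator, the sketch implements Shlosser's estimator from the statistics literature, as it is usually stated, under the assumption that each row was sampled independently with probability q = n/N. The function name is hypothetical, and this is not presented as the paper's recommended method.

```python
from collections import Counter


def shlosser_estimate(sample, table_rows):
    """Estimate a column's distinct-value count from a sample of its rows.

    Shlosser's estimator (assumes each row was sampled independently with
    probability q = n / N):
        D = d + f1 * sum_i (1 - q)^i f_i / sum_i i q (1 - q)^(i - 1) f_i
    where d is the number of distinct values in the sample and f_i is the
    number of values that occur exactly i times in it.
    """
    n = len(sample)
    if n == 0:
        return 0.0
    q = n / table_rows
    f = Counter(Counter(sample).values())  # f[i] = number of values seen exactly i times
    d = sum(f.values())
    f1 = f.get(1, 0)
    if f1 == 0:
        return float(d)  # no singletons: the correction term vanishes
    numerator = sum((1 - q) ** i * fi for i, fi in f.items())
    denominator = sum(i * q * (1 - q) ** (i - 1) * fi for i, fi in f.items())
    return d + f1 * numerator / denominator


print(shlosser_estimate(["a", "a", "b", "b"], table_rows=100))  # 2.0
print(shlosser_estimate(["a", "b", "c", "d"], table_rows=100))  # close to 100
```

The estimate is driven by the singletons (f1): a sample with no repeats suggests many unseen values, while a sample in which every value repeats suggests the sample has already seen most of them.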
A threshold of ln n for approximating set cover (preliminary version)
  • U. Feige
  • Mathematics, Computer Science
    STOC '96
  • 1996
We prove that (1 - o(1)) ln n is a threshold below which set cover cannot be approximated efficiently, unless NP has slightly superpolynomial time algorithms. This closes the gap (up to low order
A threshold of ln n for approximating set cover
  • U. Feige
  • Mathematics, Computer Science
    JACM
  • 1998
TLDR
It is proved that (1 - o(1)) ln n is a threshold below which set cover cannot be approximated efficiently, unless NP has slightly superpolynomial time algorithms.
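The ln n threshold is usually measured against the classical greedy algorithm for set cover. That algorithm repeatedly takes the subset covering the most uncovered elements, and it is within a factor H(n) ≤ ln n + 1 of the optimum. Feige's result implies that, under the stated complexity assumption, no efficient algorithm can do substantially better. Here is a minimal sketch of that greedy rule; the subset names and data are hypothetical.

```python
def greedy_set_cover(universe, subsets):
    """Repeatedly take the subset that covers the most still-uncovered elements.

    The classical analysis bounds the result by H(n) <= ln n + 1 times the
    optimum, where n = |universe|.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(subsets, key=lambda name: len(subsets[name] & uncovered))
        gain = subsets[best] & uncovered
        if not gain:
            raise ValueError("the subsets do not cover the universe")
        chosen.append(best)
        uncovered -= gain
    return chosen


SUBSETS = {"A": {1, 2, 3, 4}, "B": {4, 5}, "C": {5, 6}, "D": {1, 6}}
print(greedy_set_cover(range(1, 7), SUBSETS))  # ['A', 'C']
```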
Cheklu-i
  • Cheklu-i
  • 1996
Ullman. Index Selection for OLAP. Submitted for publication, at http://db.stanford.edu/pub/hgupta/1996/CubeIndex.ps
  • 1996