Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals
- J. Gray, S. Chaudhuri, H. Pirahesh
- Computer Science
- Proceedings of the Twelfth International…
- 26 February 1996
This paper explains the cube and roll-up operators, shows how they fit in SQL, explains how users can define new aggregate functions for cubes, and discusses efficient techniques to compute the cube.
An overview of data warehousing and OLAP technology
An overview of data warehousing and OLAP technologies, with an emphasis on their new requirements, is provided, based on a tutorial presented at the VLDB Conference, 1996.
A Primitive Operator for Similarity Joins in Data Cleaning
- S. Chaudhuri, Venkatesh Ganti, R. Kaushik
- Computer Science
- 22nd International Conference on Data Engineering…
- 3 April 2006
This paper proposes a new primitive operator which can be used as a foundation to implement similarity joins according to a variety of popular string similarity functions, and notions of similarity which go beyond textual similarity.
DBXplorer: a system for keyword-based search over relational databases
- S. Agrawal, S. Chaudhuri, Gautam Das
- Computer Science, Economics
- Proceedings 18th International Conference on Data…
- 7 August 2002
This paper discusses DBXplorer, a system that enables keyword-based search over relational databases using a commercial relational database and Web server, and that lets users interact through a browser front-end.
An overview of business intelligence technology
BI technologies are essential to running today's businesses, and these technologies are going through sea changes.
Automated Selection of Materialized Views and Indexes in SQL Databases
This paper presents an end-to-end solution to the problem of selecting materialized views and indexes for SQL databases, and describes results of extensive experimental evaluation that demonstrate the effectiveness of the techniques.
An Efficient Cost-Driven Index Selection Tool for Microsoft SQL Server
This paper describes novel techniques that make it possible to build an industrial-strength tool for automating the choice of indexes in the physical design of a SQL database, along with an iterative approach to handle the complexity arising from multicolumn indexes.
STHoles: a multidimensional workload-aware histogram
This paper introduces STHoles, a “workload-aware” histogram that allows bucket nesting to capture data regions with reasonably uniform tuple density, and that outperforms the best multidimensional histogram techniques that require access to and processing of the full data sets during histogram construction.
Self-tuning histograms: building histograms without looking at data
The experimental results show that self-tuning histograms provide a low-cost alternative to traditional multi-dimensional histograms with little loss of accuracy for data distributions with low to moderate skew.
Robust and efficient fuzzy match for online data cleaning
A new similarity function is proposed which overcomes limitations of commonly used similarity functions, and an efficient fuzzy match algorithm is developed which can effectively clean an incoming tuple if it fails to match exactly with any tuple in the reference relation.