Publications
Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals
This paper explains the cube and roll-up operators, shows how they fit in SQL, explains how users can define new aggregate functions for cubes, and discusses efficient techniques to compute the cube.
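As a rough illustration of what the two operators compute (not the paper's algorithm or its SQL syntax), the sketch below sums a measure over every subset of the grouping columns for cube, and over every prefix of them for roll-up. The function names, the `ALL` placeholder string, and the sample sales rows are hypothetical, chosen only for this example.

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"  # hypothetical placeholder for a dimension that was aggregated away


def cube(rows, dims, measure):
    """Sum `measure` over every subset of `dims` (2**len(dims) groupings)."""
    totals = defaultdict(int)
    for size in range(len(dims) + 1):
        for kept in combinations(dims, size):
            for row in rows:
                key = tuple(row[d] if d in kept else ALL for d in dims)
                totals[key] += row[measure]
    return dict(totals)


def rollup(rows, dims, measure):
    """Sum `measure` over each prefix of `dims` (len(dims) + 1 groupings)."""
    totals = defaultdict(int)
    for size in range(len(dims) + 1):
        kept = dims[:size]
        for row in rows:
            key = tuple(row[d] if d in kept else ALL for d in dims)
            totals[key] += row[measure]
    return dict(totals)


# Made-up sample data, for illustration only.
sales = [
    {"model": "Chevy", "year": 1994, "units": 10},
    {"model": "Chevy", "year": 1995, "units": 20},
    {"model": "Ford", "year": 1994, "units": 5},
]

cube_result = cube(sales, ["model", "year"], "units")
rollup_result = rollup(sales, ["model", "year"], "units")
```

The cube result contains per-year subtotals such as `("ALL", 1994)`, while the roll-up result does not, because roll-up only drops dimensions from the right. This naive version rescans the rows once per grouping; it is meant to show the output shape, not an efficient computation.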
An overview of data warehousing and OLAP technology
An overview of data warehousing and OLAP technologies, with an emphasis on their new requirements, is provided, based on a tutorial presented at the VLDB Conference, 1996.
A Primitive Operator for Similarity Joins in Data Cleaning
This paper proposes a new primitive operator which can be used as a foundation to implement similarity joins according to a variety of popular string similarity functions, and notions of similarity which go beyond textual similarity.
DBXplorer: a system for keyword-based search over relational databases
This paper discusses DBXplorer, a system that enables keyword-based search over relational databases using a commercial relational database and Web server, and that lets users interact through a browser front end.
An overview of business intelligence technology
BI technologies are essential to running today's businesses, and the technology is going through sea changes.
Automated Selection of Materialized Views and Indexes in SQL Databases
This paper presents an end-to-end solution to the problem of selecting materialized views and indexes for SQL databases, and describes results of extensive experimental evaluation that demonstrate the effectiveness of the techniques.
An Efficient Cost-Driven Index Selection Tool for Microsoft SQL Server
This paper describes novel techniques that make it possible to build an industrial-strength tool for automating the choice of indexes in the physical design of a SQL database, along with an iterative approach to handle the complexity arising from multicolumn indexes.
STHoles: a multidimensional workload-aware histogram
This paper introduces STHoles, a “workload-aware” histogram that allows bucket nesting to capture data regions with reasonably uniform tuple density, and that outperforms the best multidimensional histogram techniques that require access to and processing of the full data sets during histogram construction.
Self-tuning histograms: building histograms without looking at data
The experimental results show that self-tuning histograms provide a low-cost alternative to traditional multi-dimensional histograms with little loss of accuracy for data distributions with low to moderate skew.
Robust and efficient fuzzy match for online data cleaning
A new similarity function is proposed which overcomes limitations of commonly used similarity functions, and an efficient fuzzy match algorithm is developed which can effectively clean an incoming tuple if it fails to match exactly with any tuple in the reference relation.