BIRCH: an efficient data clustering method for very large databases
tl;dr
This paper presents a data clustering method named BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies), and demonstrates that it is especially suitable for very large databases.
  • 4,426
  • 396
Mondrian Multidimensional K-Anonymity
tl;dr
We introduce a new multidimensional recoding model for k-anonymization, which provides an additional degree of flexibility not seen in previous (single-dimensional) approaches.
  • 1,067
  • 167
Incognito: efficient full-domain K-anonymity
tl;dr
A number of organizations publish microdata for purposes such as public health and demographic research.
  • 1,146
  • 131
PNUTS: Yahoo!'s hosted data serving platform
tl;dr
We describe PNUTS, a massively parallel and geographically distributed database system for Yahoo!'s web applications, and present experimental results.
  • 1,066
  • 110
Bottom-up computation of sparse and Iceberg CUBE
tl;dr
We introduce the Iceberg-CUBE problem as a reformulation of the datacube (CUBE) problem.
  • 534
  • 93
Database Management Systems
From the Publisher: Database Management Systems (DBMS) is a must for any course in database systems or file organization. DBMS provides a hands-on approach to relational database systems, with an...
  • 1,783
  • 78
BIRCH: A New Data Clustering Algorithm and Its Applications
tl;dr
In this paper, an efficient and scalable data clustering method is proposed, based on a new in-memory data structure called CF-tree, which serves as an in-memory summary of the data distribution.
  • 602
  • 42
CACTUS—clustering categorical data using summaries
tl;dr
CACTUS is a fast summarization-based algorithm that exploits the small domain sizes of categorical attributes for clustering categorical data.
  • 557
  • 36
On the power of magic
tl;dr
This paper considers the efficient evaluation of recursive queries expressed using Horn Clauses by rewriting a program and evaluating the rewritten program bottom-up.
  • 482
  • 36
Big data and its technical challenges
tl;dr
Exploring the inherent technical challenges in realizing the potential of Big Data.
  • 628
  • 33