Raghu Ramakrishnan

Learn More
While the use of MapReduce systems (such as Hadoop) for large scale data analysis has been widely recognized and studied, we have recently seen an explosion in the number of systems developed for cloud data serving. These newer systems address "cloud OLTP" applications, though they typically do not support ACID transactions. Examples of systems proposed for(More)
Finding useful patterns in large datasets has attracted considerable interest recently, and one of the most widely studied problems in this area is the identification of <i>clusters,</i> or densely populated regions, in a multi-dimensional dataset. Prior work does not adequately address the problem of large datasets and minimization of I/O costs.This paper(More)
A number of organizations publish microdata for purposes such as public health and demographic research. Although attributes that clearly identify individuals, such as Name and Social Security Number, are generally removed, these databases can sometimes be joined with other public databases on attributes such as Zipcode, Sex, and Birthdate to re-identify(More)
K-Anonymity has been proposed as a mechanism for protecting privacy in microdata publishing, and numerous recoding "models" have been considered for achieving &#55349;&#56408;anonymity. This paper proposes a new multidimensional model, which provides an additional degree of flexibility not seen in previous (single-dimensional) approaches. Often this(More)
We describe PNUTS, a massively parallel and geographically distributed database system for Yahoo!’s web applications. PNUTS provides data storage organized as hashed or ordered tables, low latency for large numbers of concurrent requests including updates and queries, and novel per-record consistency guarantees. It is a hosted, centrally managed, and(More)
We explore the eeect of dimensionality on the \nearest neighbor" problem. We show that under a broad set of conditions (much broader than independent and identically distributed dimensions), as dimensionality increases, the distance to the nearest data point approaches the distance to the farthest data point. To provide a practical perspective, we present(More)
We introduce the Iceberg-CUBE problem as a reformulation of the datacube (CUBE) problem. The Iceberg-CUBE problem is to compute only those group-by partitions with an aggregate value (e.g., count) above some minimum support threshold. The result of Iceberg-CUBE can be used (1) to answer group-by queries with a clause such as HAVING COUNT(*) &gt;= X, where X(More)
This paper considers the efficient evaluation of recursive queries expressed using Horn Clauses. We define <italic>sideways information passing</italic> formally and show how a query evaluation algorithm may be defined in terms of sideways information passing and control. We then consider a class of information passing strategies which suffices to describe(More)