We study clustering under the data stream model of computation where, given a sequence of points, the objective is to maintain a consistently good clustering of the sequence observed so far, using a small amount of memory and time. The data stream model is relevant to new classes of applications involving massive data sets, such as web click stream analysis…
The data stream model has recently attracted attention for its applicability to numerous types of data, including telephone records, web documents and clickstreams. For analysis of such data, the ability to process the data in a single pass, or a small number of passes, while using little memory, is crucial. We describe such a streaming algorithm that…
Streaming data analysis has recently attracted attention in numerous applications including telephone records, web documents and clickstreams. For such analysis, single-pass algorithms that consume a small amount of memory are critical. We describe such a streaming algorithm that effectively clusters large data streams. We also provide empirical evidence of…
The question of how to publish an anonymized search log was brought to the forefront by a well-intentioned, but privacy-unaware AOL search log release. Since then a series of ad-hoc techniques have been proposed in the literature, though none are known to be provably private. In this paper, we take a major step towards a solution: we show how queries,…
Given a data set consisting of private information about individuals, we consider the online query auditing problem: given a sequence of queries that have already been posed about the data, their corresponding answers -- where each answer is either the true answer or "denied" (in the event that revealing the answer compromises privacy) -- and given a…
Many organizations such as the U.S. Census publicly release samples of data that they collect about private citizens. These datasets are first anonymized using various techniques and then a small sample is released so as to enable “do-it-yourself” calculations. This paper investigates the privacy of the second step of this process: sampling. We observe that…
In recent years, there has been an abundance of rich and fine-grained data about individuals in domains such as healthcare, finance, retail, web search, and social networks. It is desirable for data collectors to enable third parties to perform complex data mining applications over such data. However, privacy is a natural obstacle that arises when sharing…
Social networks are ubiquitous. The discovery of close-knit clusters in these networks is of fundamental and practical interest. Existing clustering criteria are limited in that clusters typically do not overlap, all vertices are clustered and/or external sparsity is ignored. We introduce a new criterion that overcomes these limitations by combining…