Learn More
We study clustering under the data stream model of computation where: given a sequence of points, the objective is to maintain a consistently good clustering of the sequence observed so far, using a small amount of memory and time. The data stream model is relevant to new classes of applications involving massive data sets, such as web click stream analysis(More)
The data stream model has recently attracted attention for its applicability to numerous types of data, including telephone records, web documents and clickstreams. For analysis of such data, the ability to process the data in a single pass, or a small number of passes, while using little memory, is crucial. We describe such a streaming algorithm that(More)
As data gathering grows easier, and as researchers discover new ways to interpret data, streaming-data algorithms have become essential in many elds. Data stream computation precludes algorithms that require random access or large memory. In this paper, we consider the problem of clustering data streams, which is important in the analysis a variety of(More)
Given a data set consisting of private information about individuals, we consider the <i>online query auditing problem:</i> given a sequence of queries that have already been posed about the data, their corresponding answers -- where each answer is either the true answer or "denied" (in the event that revealing the answer compromises privacy) -- and given a(More)
The question of how to publish an anonymized search log was brought to the forefront by a well-intentioned, but privacy-unaware AOL search log release. Since then a series of ad-hoc techniques have been proposed in the literature, though none are known to be provably private. In this paper, we take a major step towards a solution: we show how queries,(More)
We consider the online query auditing problem for statistical databases. Given a stream of aggregate queries posed over sensitive data, when should queries be denied in order to protect the privacy of individuals? We construct efficient auditors for max queries and bags of max and min queries in both the partial and full disclosure settings. Our algorithm(More)
Social networks are ubiquitous. The discovery of close-knit clusters in these networks is of fundamental and practical interest. Existing clustering criteria are limited in that clusters typically do not overlap, all vertices are clustered and/or external sparsity is ignored. We introduce a new criterion that overcomes these limitations by combining(More)
Given two or more parties possessing large, confidential datasets, we consider the problem of securely computing the k th-ranked element of the union of the datasets, e.g. the median of the values in the datasets. We investigate protocols with sublinear computation and communication costs. In the two-party case, we show that the k th-ranked element can be(More)
We consider exact learning monotone CNF formulas in which each variable appears at most some constant k times (" read-k " monotone CNF). Let f : {0, 1} n → {0, 1} be expressible as a read-k monotone CNF formula for some natural number k. We give an incremental output polynomial time algorithm for exact learning both the read-k CNF and (not necessarily read(More)