We study clustering under the data stream model of computation in which, given a sequence of points, the objective is to maintain a consistently good clustering of the sequence observed so far, using a small amount of memory and time. The data stream model is relevant to new classes of applications involving massive data sets, such as web click stream analysis ...
The data stream model has recently attracted attention for its applicability to numerous types of data, including telephone records, web documents and clickstreams. For analysis of such data, the ability to process the data in a single pass, or a small number of passes, while using little memory, is crucial. We describe such a streaming algorithm that ...
The question of how to publish an anonymized search log was brought to the forefront by a well-intentioned but privacy-unaware AOL search log release. Since then a series of ad hoc techniques have been proposed in the literature, though none are known to be provably private. In this paper, we take a major step towards a solution: we show how queries, ...
As data gathering grows easier, and as researchers discover new ways to interpret data, streaming-data algorithms have become essential in many fields. Data stream computation precludes algorithms that require random access or large memory. In this paper, we consider the problem of clustering data streams, which is important in the analysis of a variety of ...
Given a data set consisting of private information about individuals, we consider the online query auditing problem: given a sequence of queries that have already been posed about the data, their corresponding answers -- where each answer is either the true answer or "denied" (in the event that revealing the answer compromises privacy) -- and given a ...
Given two or more parties possessing large, confidential datasets, we consider the problem of securely computing the kth-ranked element of the union of the datasets, e.g., the median of the values in the datasets. We investigate protocols with sublinear computation and communication costs. In the two-party case, we show that the kth-ranked element can be ...
We consider the online query auditing problem for statistical databases. Given a stream of aggregate queries posed over sensitive data, when should queries be denied in order to protect the privacy of individuals? We construct efficient auditors for max queries and bags of max and min queries in both the partial and full disclosure settings. Our algorithm ...
We consider exact learning of monotone CNF formulas in which each variable appears at most some constant k times ("read-k" monotone CNF). Let f : {0,1}^n → {0,1} be expressible as a read-k monotone CNF formula for some natural number k. We give an incremental output polynomial time algorithm for exact learning both the read-k CNF and (not necessarily read ...
P3P [23, 24] is a set of standards that allow corporations to declare their privacy policies. Hippocratic Databases [6] have been proposed to implement such policies within a corporation's datastore. From an individual end user's point of view, both of these rest on an uncomfortable philosophy of trusting corporations to protect his/her privacy. Recent ...