Publishing data about individuals without revealing sensitive information about them is an important problem. In recent years, a new definition of privacy called <i>k</i>-anonymity has gained popularity. In a <i>k</i>-anonymized dataset, each record is indistinguishable from at least <i>k</i> &minus; 1 other records with respect to certain identifying ...
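As a rough illustration of the definition above, the following Python sketch checks whether a small table is <i>k</i>-anonymous. The abstract is cut off before it names the identifying attributes, so the `quasi_identifiers` parameter, the function name `is_k_anonymous`, and the toy records are all assumptions made for this example, not taken from the paper.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    that occurs in `records` occurs in at least k records."""
    # Group records by their values on the chosen identifying attributes.
    groups = Counter(
        tuple(record[attr] for attr in quasi_identifiers) for record in records
    )
    # Each record must share its group with at least k - 1 other records.
    return all(count >= k for count in groups.values())


# Hypothetical toy table with already-generalized zip codes and ages.
records = [
    {"zip": "148**", "age": "20-29", "disease": "flu"},
    {"zip": "148**", "age": "20-29", "disease": "cancer"},
    {"zip": "130**", "age": "30-39", "disease": "flu"},
    {"zip": "130**", "age": "30-39", "disease": "flu"},
]

print(is_k_anonymous(records, ["zip", "age"], k=2))
```

Note that the second group in the toy table satisfies the check even though both of its records carry the same sensitive value, which is the kind of gap that stronger definitions are meant to address.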
Detecting changes in a data stream is an important area of research with many applications. In this paper, we present a novel method for the detection and estimation of change. In addition to providing statistical guarantees on the reliability of detected changes, our method also provides meaningful descriptions and quantification of these changes. Our ...
Recent work has shown the necessity of considering an attacker's background knowledge when reasoning about privacy in data publishing. However, in practice, the data publisher does not know what background knowledge the attacker possesses. Thus, it is important to consider the worst case. In this paper, we initiate a formal study of worst-case background ...
Differential privacy is a powerful tool for providing privacy-preserving noisy query answers over statistical databases. It guarantees that the distribution of noisy query answers changes very little with the addition or deletion of any tuple. It is frequently accompanied by popularized claims that it provides privacy without any assumptions about the data ...
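To make the idea of a "noisy query answer" concrete, here is a minimal Python sketch of the standard Laplace mechanism applied to a counting query. The abstract does not name a specific mechanism, so this choice, along with the helper names `laplace_noise` and `noisy_count`, is an assumption for illustration. A count changes by at most 1 when one tuple is added or deleted, so Laplace noise with scale 1/ε is used.

```python
import random


def laplace_noise(scale, rng=random):
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential samples with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(rows, predicate, epsilon, rng=random):
    """Answer a counting query with Laplace noise of scale 1 / epsilon."""
    true_count = sum(1 for row in rows if predicate(row))
    # Sensitivity of a count is 1: one added or deleted tuple moves it by at most 1.
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical data: ages of ten individuals.
ages = [23, 35, 41, 29, 52, 61, 38, 47, 33, 58]
answer = noisy_count(ages, lambda age: age >= 40, epsilon=0.5)
print(answer)
```

Smaller values of `epsilon` give a larger noise scale, trading accuracy for a tighter bound on how much any single tuple can shift the answer distribution.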
Limiting disclosure in data publishing requires a careful balance between privacy and utility. Information about individuals must not be revealed, but a dataset should still be useful for studying the characteristics of a population. Privacy requirements such as <i>k</i>-anonymity and <i>l</i>-diversity are designed to thwart attacks that attempt to ...
In this paper, we propose the first formal privacy analysis of a data anonymization process known as synthetic data generation, a technique becoming popular in the statistics community. The target application for this work is a mapping program that shows the commuting patterns of the population of the United States. The source data for this application ...
Constraint-based mining of itemsets for questions such as "find all frequent itemsets where the total price is at least $50" has received much attention recently. Two classes of constraints, monotone and antimonotone, have been identified as very useful. There are algorithms that efficiently take advantage of either one of these two classes, but no previous ...
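The example query above combines both classes of constraints: with non-negative prices, "total price at least $50" is monotone (once an itemset satisfies it, every superset does), while "frequent" is antimonotone (once an itemset fails it, every superset fails). The Python sketch below simply enumerates all itemsets by brute force to show the two constraints side by side; it is not the algorithm from the paper, and the prices, transactions, and thresholds are made up for illustration.

```python
from itertools import combinations

# Hypothetical item prices and transaction database.
prices = {"a": 10, "b": 20, "c": 30, "d": 45}
transactions = [
    {"a", "b", "c"},
    {"a", "c", "d"},
    {"b", "c", "d"},
    {"a", "b", "c", "d"},
]


def support(itemset):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def is_frequent(itemset, min_support=2):
    """Antimonotone: if this fails for an itemset, it fails for all supersets."""
    return support(itemset) >= min_support


def total_price_at_least(itemset, threshold=50):
    """Monotone (for non-negative prices): if this holds, it holds for all supersets."""
    return sum(prices[item] for item in itemset) >= threshold


def matching_itemsets():
    """Brute-force enumeration of itemsets satisfying both constraints."""
    items = sorted(prices)
    result = []
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            if is_frequent(candidate) and total_price_at_least(candidate):
                result.append(candidate)
    return result


for itemset in matching_itemsets():
    print(sorted(itemset), support(itemset), sum(prices[i] for i in itemset))
```

An efficient miner would use the antimonotone constraint to prune supersets of infrequent itemsets and the monotone constraint to prune subsets of itemsets that are too cheap, rather than testing every candidate as this sketch does.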
When writing a paper, how often do you want to add a citation at a particular place but are unsure which papers to cite? Do you wish you had a recommendation system that could suggest a small number of good candidates for every place where you want to add a citation? In this paper, we present our initiative of building a context-aware citation ...
In this paper we present a method for reasoning about privacy using the concepts of exchangeability and de Finetti's theorem. We illustrate the usefulness of this technique by using it to attack a popular data sanitization scheme known as Anatomy. We stress that Anatomy is not the only sanitization scheme that is vulnerable to this attack. In fact, any ...