Publishing data about individuals without revealing sensitive information about them is an important problem. In recent years, a new definition of privacy called k-anonymity has gained popularity. In a k-anonymized dataset, each record is indistinguishable from at least k − 1 other records with respect to certain "identifying" attributes. In this paper we …
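To make the definition concrete (an illustration, not code from the paper): a table is k-anonymous with respect to a chosen set of quasi-identifying attributes if every combination of values on those attributes appears in at least k records. A minimal sketch in Python, with hypothetical column names:

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """Check whether every quasi-identifier combination occurs in >= k records."""
    groups = Counter(tuple(r[c] for c in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

# Hypothetical toy table; "zip" and "age" stand in for identifying attributes.
table = [
    {"zip": "477**", "age": "2*", "disease": "flu"},
    {"zip": "477**", "age": "2*", "disease": "cancer"},
    {"zip": "479**", "age": "3*", "disease": "flu"},
    {"zip": "479**", "age": "3*", "disease": "flu"},
]
print(is_k_anonymous(table, ["zip", "age"], k=2))  # True: each group has 2 records
```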
In this paper, we propose the first formal privacy analysis of a data anonymization process known as synthetic data generation, a technique becoming popular in the statistics community. The target application for this work is a mapping program that shows the commuting patterns of the population of the United States. The source data for this application …
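The abstract does not specify the synthesizer, so the following is only a hedged illustration of one common synthetic-data approach: draw synthetic categorical records from a Dirichlet-smoothed histogram of the source counts. The data, names, and parameters are assumptions for the sketch, not the paper's method:

```python
import numpy as np

def synthesize(records, n_synthetic, alpha=1.0, rng=None):
    """Draw synthetic categorical records from a Dirichlet-smoothed histogram.

    alpha is a pseudo-count prior; larger values pull the synthetic
    distribution toward uniform and away from the raw counts.
    """
    rng = np.random.default_rng() if rng is None else rng
    cells, counts = np.unique(records, return_counts=True)
    probs = rng.dirichlet(counts + alpha)       # posterior draw over cells
    return rng.choice(cells, size=n_synthetic, p=probs)

# Hypothetical data: each record is a (home block -> work block) commute cell.
data = np.array(["A->X", "A->X", "A->Y", "B->X", "B->Y", "B->Y", "B->Y"])
print(synthesize(data, n_synthetic=10, alpha=2.0))
```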
Differential privacy is a powerful tool for providing privacy-preserving noisy query answers over statistical databases. It guarantees that the distribution of noisy query answers changes very little with the addition or deletion of any tuple. It is frequently accompanied by popularized claims that it provides privacy without any assumptions about the data …
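A standard way to realize this guarantee for counting queries is the Laplace mechanism: add noise scaled to sensitivity/epsilon to the true answer. A minimal sketch (the query, data, and epsilon are illustrative):

```python
import numpy as np

def laplace_count(data, predicate, epsilon, rng=None):
    """Answer a counting query with epsilon-differential privacy.

    A count changes by at most 1 when one tuple is added or removed
    (sensitivity 1), so Laplace noise with scale 1/epsilon suffices.
    """
    rng = np.random.default_rng() if rng is None else rng
    true_count = sum(1 for row in data if predicate(row))
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

# Hypothetical usage: how many records have age > 40?
ages = [23, 45, 31, 52, 67, 29]
print(laplace_count(ages, lambda a: a > 40, epsilon=0.5))
```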
Constraint-based mining of itemsets for questions such as "find all frequent itemsets where the total price is at least $50" has received much attention recently. Two classes of constraints, monotone and antimonotone, have been identified as very useful. There are algorithms that efficiently take advantage of either one of these two classes, but no previous …
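To illustrate the two constraint classes with the example above: "total price is at least $50" is monotone (once an itemset satisfies it, every superset does), while minimum support is antimonotone (once violated, every superset violates it). Below is a naive level-wise sketch that prunes with the antimonotone constraint and filters with the monotone one; the transactions and prices are made up, and real algorithms exploit both constraints far more aggressively:

```python
from itertools import combinations

# Hypothetical transaction data and item prices.
transactions = [{"a", "b", "c"}, {"a", "c"}, {"a", "d"}, {"b", "c", "d"}]
price = {"a": 30, "b": 10, "c": 25, "d": 40}

def support(itemset):
    return sum(1 for t in transactions if itemset <= t)

def frequent_itemsets(min_support, min_total_price):
    """Level-wise search: prune with the antimonotone support constraint,
    then filter with the monotone total-price constraint."""
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
    results = []
    while level:
        # Monotone constraint never prunes candidates from below,
        # so here it is only used to filter the output.
        results += [s for s in level if sum(price[i] for i in s) >= min_total_price]
        # Antimonotone pruning: extend only itemsets that are still frequent.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == len(a) + 1}
        level = [c for c in candidates if support(c) >= min_support]
    return results

print(frequent_itemsets(min_support=2, min_total_price=50))  # [frozenset({'a', 'c'})]
```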
We consider differentially private algorithms for convex empirical risk minimization (ERM). Differential privacy (Dwork et al., 2006b) is a recently introduced notion of privacy which guarantees that an algorithm's output does not depend on the data of any individual in the dataset. This is crucial in fields that handle sensitive data, such as genomics, …
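One simple way to obtain a differentially private ERM solution is output perturbation: solve the regularized problem, then add noise calibrated to the sensitivity of the minimizer. The sketch below assumes L2-regularized logistic regression with unit-norm features and uses the Gaussian mechanism; it is an illustration under those assumptions, not the exact algorithm of the cited work:

```python
import numpy as np

def private_logreg(X, y, epsilon, delta, lam=0.1, rng=None):
    """Output-perturbation sketch for L2-regularized logistic regression.

    Assumes rows of X have L2 norm <= 1 and y is in {-1, +1}, so each
    per-example loss is 1-Lipschitz and the minimizer of the lam-strongly-
    convex objective has L2 sensitivity 2 / (n * lam). Gaussian noise
    calibrated to that sensitivity gives (epsilon, delta)-DP.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(2000):                      # plain gradient descent
        margins = y * (X @ w)
        grad = -(X * (y / (1 + np.exp(margins)))[:, None]).mean(axis=0) + lam * w
        w -= 0.1 * grad
    sensitivity = 2.0 / (n * lam)
    sigma = sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / epsilon
    return w + rng.normal(0.0, sigma, size=d)

# Hypothetical toy data with unit-norm rows.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)
y = np.sign(X @ np.array([1.0, -1.0, 0.5]))
print(private_logreg(X, y, epsilon=1.0, delta=1e-5))
```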
While a lot of research has focused on proving formal privacy guarantees for anonymizing data, it is poorly understood whether useful real-life data can be published with existing provably private …
Recent work has shown the necessity of considering an attacker's background knowledge when reasoning about privacy in data publishing. However, in practice, the data publisher does not know what background knowledge the attacker possesses. Thus, it is important to consider the worst case. In this paper, we initiate a formal study of worst-case background …