Corpus ID: 207912185

Statistically Valid Inferences from Privacy Protected Data

@inproceedings{Evans2019StatisticallyVI,
  title={Statistically Valid Inferences from Privacy Protected Data},
  author={Georgina Evans and Gary King and Margaret Schwenzfeier and Abhradeep Thakurta},
  year={2019}
}
Unprecedented quantities of data that could help social scientists understand and ameliorate the challenges of human society are presently locked away inside companies, governments, and other organizations, in part because of privacy concerns. We address this problem with a general-purpose data access and analysis system with mathematical guarantees of privacy for research subjects, and statistical validity guarantees for researchers seeking social science insights. We build on the standard of… 


Differentially Private Survey Research
Survey researchers have long protected the privacy of respondents via de-identification (removing names and other directly identifying information) before sharing data. Although these procedures…
Statistically Valid Inferences from Differentially Private Data Releases, with Application to the Facebook URLs Dataset
Methods developed to correct for naturally occurring measurement error are adapted, with special attention to computational efficiency for large datasets, and the result is statistically valid linear regression estimates and descriptive statistics that can be interpreted as ordinary analyses of nonconfidential data but with appropriately larger standard errors.
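The correction described above treats differential-privacy noise as a known form of measurement error: because the noise distribution is known exactly, a standard error can be inflated to account for it. The sketch below illustrates the idea for a simple mean, not the paper's regression estimators; the function name and budget choices are hypothetical, and the sample variance used for the standard error is not itself privatized here.

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_mean_with_se(x, epsilon, lo, hi):
    """Release a differentially private mean of data bounded in [lo, hi],
    plus a standard error accounting for BOTH sampling variability and
    the Laplace noise added for privacy (illustrative sketch only)."""
    n = len(x)
    x = np.clip(x, lo, hi)
    sensitivity = (hi - lo) / n          # L1 sensitivity of the mean
    scale = sensitivity / epsilon        # Laplace mechanism scale
    noisy_mean = x.mean() + rng.laplace(0.0, scale)
    sampling_var = x.var(ddof=1) / n     # ordinary sampling variance
    noise_var = 2 * scale**2             # variance of Laplace(0, scale)
    # NOTE: in a real system the variance estimate would also need a
    # private release; here it is computed from the raw data for brevity.
    se = np.sqrt(sampling_var + noise_var)
    return noisy_mean, se

x = rng.normal(0.5, 0.1, size=10_000)
mean, se = dp_mean_with_se(x, epsilon=1.0, lo=0.0, hi=1.0)
```

The resulting estimate can be interpreted like an ordinary analysis of nonconfidential data, but with an appropriately larger standard error, which is the paper's central point.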
Visualizing Privacy-Utility Trade-Offs in Differentially Private Data Releases
Visualizing Privacy (ViP) is presented, an interactive interface that visualizes relationships between ε, accuracy, and disclosure risk to support setting and splitting ε among queries; it also has an inference setting that lets a user reason about the impact of DP noise on statistical inferences.
Non-parametric Differentially Private Confidence Intervals for the Median
This paper proposes and evaluates several strategies for computing valid differentially private confidence intervals for the median, and illustrates that addressing both sources of uncertainty (the error from sampling and the error from protecting the output) simultaneously is preferable to simpler approaches that incorporate the uncertainty sequentially.
Really Useful Synthetic Data - A Framework to Evaluate the Quality of Differentially Private Synthetic Data
A framework to evaluate the quality of differentially private synthetic data from an applied researcher's perspective and invites the academic community to jointly advance the privacy-quality frontier.
Unbiased Statistical Estimation and Valid Confidence Intervals Under Differential Privacy
If the user can bound the BLB-induced parameters and provide heavier-tailed families, the algorithm produces unbiased parameter estimates and valid confidence intervals that hold with arbitrarily high probability.
Parametric Bootstrap for Differentially Private Confidence Intervals
It is proved that the parametric bootstrap gives consistent confidence intervals in two broadly relevant settings, including a novel adaptation to linear regression that avoids accessing the covariate data multiple times.
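The parametric bootstrap idea summarized above can be sketched in a few lines: treat the private release as if it were the truth, repeatedly simulate both new data and the privacy mechanism, and read a confidence interval off the simulated distribution. This is a minimal illustration for a mean with a known standard deviation, not the paper's regression adaptation; all names and budget choices here are hypothetical, and a real analysis would privatize the standard deviation as well.

```python
import numpy as np

rng = np.random.default_rng(3)

def private_mean(x, epsilon, lo, hi):
    """Laplace-mechanism release of a bounded mean."""
    x = np.clip(x, lo, hi)
    return x.mean() + rng.laplace(0.0, (hi - lo) / (len(x) * epsilon))

def parametric_bootstrap_ci(noisy_mean, sd, n, epsilon, lo, hi,
                            B=2000, alpha=0.05):
    """Simulate data AND the mechanism B times from the fitted model,
    then form a percentile-style interval around the private estimate."""
    sims = np.empty(B)
    for b in range(B):
        xb = rng.normal(noisy_mean, sd, size=n)   # parametric resample
        sims[b] = private_mean(xb, epsilon, lo, hi)
    shift = sims - noisy_mean                     # mechanism + sampling error
    return (noisy_mean - np.quantile(shift, 1 - alpha / 2),
            noisy_mean - np.quantile(shift, alpha / 2))

x = rng.normal(0.5, 0.1, size=5_000)
m = private_mean(x, epsilon=1.0, lo=0.0, hi=1.0)
lo_ci, hi_ci = parametric_bootstrap_ci(m, sd=0.1, n=5_000,
                                       epsilon=1.0, lo=0.0, hi=1.0)
```

Because the mechanism is rerun inside each bootstrap replicate, the interval automatically reflects both sampling error and privacy noise.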
Design of a Privacy-Preserving Data Platform for Collaboration Against Human Trafficking
This work presents new methods to anonymize, publish, and explore case records on victims of human trafficking as a pipeline generating three artifacts, aimed at transforming how the world's largest database of identified victims is made available for global collaboration against human trafficking.
Privacy-Preserving Randomized Controlled Trials: A Protocol for Industry Scale Deployment
This paper outlines an end-to-end privacy-preserving protocol for learning causal effects from randomized controlled trials, focused on the difficult and important case where one party determines which treatment an individual receives, another party measures outcomes on individuals, and neither party wants to leak any of its information to the other.
Privacy Preserving Inference on the Ratio of Two Gaussians Using (Weighted) Sums
The delta method is used to derive the asymptotic distribution of the ratio estimator, and the Gaussian mechanism provides (ε, δ) privacy guarantees; the resulting confidence intervals are shown to achieve correct coverage under an appropriate privacy budget.
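The combination described for this paper (Gaussian mechanism on the inputs, delta method for the ratio's variance) can be sketched as follows. This is an illustration under stated assumptions, not the paper's exact estimator: the even split of the privacy budget across the two sums, the use of raw-data variances in the standard error, and all function names are choices made here for brevity.

```python
import numpy as np

rng = np.random.default_rng(1)

def gaussian_mechanism_sigma(sensitivity, epsilon, delta):
    """Classic (epsilon, delta) noise calibration for the Gaussian mechanism."""
    return sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / epsilon

def private_ratio(x, y, epsilon, delta, bound):
    """Privately estimate mean(x)/mean(y) for data bounded in [0, bound],
    with a delta-method standard error that includes the privacy noise.
    Assumption: each noisy sum gets half of the (epsilon, delta) budget."""
    n = len(x)
    sigma = gaussian_mechanism_sigma(bound, epsilon / 2, delta / 2)
    sx = np.clip(x, 0, bound).sum() + rng.normal(0, sigma)
    sy = np.clip(y, 0, bound).sum() + rng.normal(0, sigma)
    mx, my = sx / n, sy / n
    ratio = mx / my
    # Delta method: Var(mx/my) ~ Var(mx)/my^2 + mx^2 * Var(my)/my^4,
    # where each Var term adds sampling variance and mechanism variance.
    var_mx = x.var(ddof=1) / n + (sigma / n) ** 2
    var_my = y.var(ddof=1) / n + (sigma / n) ** 2
    se = np.sqrt(var_mx / my**2 + (mx**2) * var_my / my**4)
    return ratio, se

x = rng.normal(2.0, 0.2, size=50_000)
y = rng.normal(4.0, 0.2, size=50_000)
r, se = private_ratio(x, y, epsilon=1.0, delta=1e-5, bound=5.0)
```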
...
...

References

Showing 1-10 of 62 references
Differentially Private Significance Tests for Regression Coefficients
Algorithms for assessing whether regression coefficients of interest are statistically significant or not are presented and conditions under which the algorithms should give accurate answers about statistical significance are described.
The Fienberg Problem: How to Allow Human Interactive Data Analysis in the Age of Differential Privacy
The (overly) simple problem of allowing a trusted analyst to choose an "interesting" statistic for popular release (the actual computation of the chosen statistic will be carried out in a differentially private way) is discussed.
Differential Privacy: A Primer for a Non-Technical Audience
This primer aims to provide a foundation that can guide future decisions when analyzing and sharing statistical data about individuals, informing individuals about the privacy protection they will be afforded, and designing policies and regulations for robust privacy protection.
GUPT: privacy preserving data analysis made easy
The design and evaluation of a new system, GUPT, that guarantees differential privacy to programs not developed with privacy in mind, makes no trust assumptions about the analysis program, and is secure to all known classes of side-channel attacks.
The reusable holdout: Preserving validity in adaptive data analysis
A new approach for addressing the challenges of adaptivity based on insights from privacy-preserving data analysis is demonstrated, and how to safely reuse a holdout data set many times to validate the results of adaptively chosen analyses is shown.
Issues Encountered Deploying Differential Privacy
The U.S. Census Bureau has encountered many challenges in attempting to transition differential privacy from the academy to practice, including obtaining qualified personnel and a suitable computing environment, the difficulty accounting for all uses of the confidential data, and the lack of release mechanisms that align with the needs of data users.
Smooth sensitivity and sampling in private data analysis
This is the first formal analysis of the effect of instance-based noise in the context of data privacy, and shows how to do this efficiently for several different functions, including the median and the cost of the minimum spanning tree.
Privacy-preserving statistical estimation with optimal convergence rates
It is shown that for a large class of statistical estimators T and input distributions P, there is a differentially private estimator AT with the same asymptotic distribution as T, which implies that AT (X) is essentially as good as the original statistic T(X) for statistical inference, for sufficiently large samples.
Calibrating Noise to Sensitivity in Private Data Analysis
The study is extended to general functions f, proving that privacy can be preserved by calibrating the standard deviation of the noise according to the sensitivity of the function f, which is the amount that any single argument to f can change its output.
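The calibration principle from this foundational reference is simple enough to state directly: add Laplace noise with scale equal to the function's global sensitivity divided by ε. A minimal sketch for a counting query (whose sensitivity is 1, since adding or removing one record changes the count by at most 1):

```python
import numpy as np

rng = np.random.default_rng(2)

def laplace_mechanism(value, sensitivity, epsilon):
    """Calibrate Laplace noise to the global sensitivity of f:
    scale = sensitivity / epsilon yields epsilon-differential privacy."""
    return value + rng.laplace(0.0, sensitivity / epsilon)

# Counting query over a toy binary dataset; global sensitivity is 1.
data = rng.integers(0, 2, size=1000)
true_count = int(data.sum())
noisy_count = laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5)
```

Smaller ε means a larger noise scale and stronger privacy, which is the trade-off the later papers in this list (on valid inference, budget splitting, and visualization) are all navigating.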
...
...