Differential Privacy and Census Data: Implications for Social and Economic Research

@article{Ruggles2019DifferentialPA,
  title={Differential Privacy and Census Data: Implications for Social and Economic Research},
  author={Steven Ruggles and Catherine A. Fitch and Diana L. Magnuson and Jonathan P. Schroeder},
  journal={AEA Papers and Proceedings},
  year={2019}
}
The Census Bureau has announced new methods for disclosure control in public use data products. The new approach, known as differential privacy, represents a radical departure from current practice. In its pure form, differential privacy techniques may make the release of useful microdata impossible and limit the utility of tabular small-area data. Adoption of differential privacy will have far-reaching consequences for research. It is likely that scientists, planners, and the public will lose… 
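As context for the abstract above, the core technique it refers to can be sketched as adding calibrated noise to a count query. This is a minimal illustration of the classic Laplace mechanism for ϵ-differential privacy, not the Census Bureau's actual TopDown algorithm; the function name and parameters are illustrative only.

```python
import math
import random

def laplace_count(true_count, epsilon, sensitivity=1.0, rng=random):
    """Release a count under epsilon-differential privacy by adding
    Laplace noise with scale = sensitivity / epsilon.

    A count query changes by at most 1 when one person is added to or
    removed from the data, so sensitivity = 1 for simple counts.
    """
    scale = sensitivity / epsilon
    # Sample Laplace(0, scale) by inverse-CDF from a uniform draw.
    u = rng.random() - 0.5
    noise = -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))
    return true_count + noise

# Smaller epsilon (stronger privacy) means larger expected noise --
# the accuracy/privacy trade-off the abstract describes:
rng = random.Random(0)
noisy_strict = laplace_count(1000, epsilon=0.1, rng=rng)   # noisier
noisy_loose = laplace_count(1000, epsilon=10.0, rng=rng)   # near 1000
```

The concern raised in the abstract is that applying such noise at fine geographic resolution, or to microdata, can degrade small-area counts beyond usefulness.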
A firm foundation for statistical disclosure control
TLDR
The theory of data privacy and confidentiality in statistics and computer science is reviewed, to modernize the theory of anonymization, which results in the mathematical definitions of identity disclosure and attribute disclosure applicable to even synthetic data.
Differential privacy in the 2020 US census: what will it do? Quantifying the accuracy/privacy tradeoff
TLDR
An empirical measure of privacy loss is developed to compare the error and privacy of the new approach to that of a (non-differentially private) simple-random-sampling approach to protecting privacy and it is found that the empirical privacy loss of TopDown is substantially smaller than the theoretical guarantee for all privacy loss budgets.
From Algorithmic to Institutional Logics: The Politics of Differential Privacy
TLDR
This paper investigates the political dimensions of differential privacy, describing the entanglements between algorithmic privacy and institutional logics and highlighting disempowering practices that may emerge despite, or in response to, the adoption of differentially private methods.
Balancing data privacy and usability in the federal statistical system.
TLDR
This essay argues that the discussion of federal statistical system change has not given proper consideration to the reduced social benefits of data availability and their usability relative to the value of increased levels of privacy protection, and recommends that a more balanced benefit-cost framework should be used to assess these trade-offs.
The Impact of the U.S. Census Disclosure Avoidance System on Redistricting and Voting Rights Analysis
TLDR
This analysis finds that the DAS-protected data are biased against certain areas, depending on voter turnout and partisan and racial composition, and that these biases lead to large and unpredictable errors in the analysis of partisan and racial gerrymanders.
Assessing Statistical Disclosure Risk for Differentially Private, Hierarchical Count Data, with Application to the 2020 U.S. Decennial Census
We propose Bayesian methods to assess the statistical disclosure risk of data released under zero-concentrated differential privacy, focusing on settings with a strong hierarchical structure and…
How differential privacy will affect our understanding of health disparities in the United States
TLDR
It is found that the implementation of differential privacy will produce dramatic changes in population counts for racial/ethnic minorities in small areas and less urban settings, significantly altering knowledge about health disparities in mortality.
Differential Privacy for Government Agencies - Are We There Yet?
TLDR
It is argued that the requirements for implementing differential privacy approaches at government agencies are often fundamentally different from the requirements in industry, which raises many challenges and questions that still need to be addressed before the concept can be used as an overarching principle when sharing data with the public.
Suboptimal Provision of Privacy and Statistical Accuracy When They are Public Goods
TLDR
A firm that publishes statistics under a guarantee of differential privacy is modeled, and it is proved that provision by the private firm results in inefficiently low data quality in this framework.
...
...

References

SHOWING 1-10 OF 57 REFERENCES
An Economic Analysis of Privacy Protection and Statistical Accuracy as Social Choices
TLDR
An economic solution is proposed: operate where the marginal cost of increasing privacy equals the marginal benefit, and the model of production, from computer science, assumes data are published using an efficient differentially private algorithm.
Differential Privacy and Federal Data Releases
  • J. Reiter
  • Computer Science, Law
  • Annual Review of Statistics and Its Application
  • 2019
TLDR
The article describes potential benefits and limitations of using differential privacy for federal data, reviews current federal data products that satisfy differential privacy, and outlines research needed for adoption of differential privacy to become widespread among federal agencies.
Differential Privacy and Statistical Disclosure Risk Measures: An Investigation with Binary Synthetic Data
TLDR
An alternative disclosure risk assessment approach is presented that integrates some of the strong confidentiality protection features in ϵ-differential privacy with the interpretability and data-specific nature of probabilistic disclosure risk measures.
Fool's Gold: an Illustrated Critique of Differential Privacy
TLDR
Policymakers and data stewards will have to rely on a mix of approaches: perhaps differential privacy where it is well-suited to the task, and other disclosure prevention techniques in the great majority of situations where it isn’t.
Issues Encountered Deploying Differential Privacy
TLDR
The U.S. Census Bureau has encountered many challenges in attempting to transition differential privacy from the academy to practice, including obtaining qualified personnel and a suitable computing environment, the difficulty accounting for all uses of the confidential data, and the lack of release mechanisms that align with the needs of data users.
The U.S. Census Bureau Adopts Differential Privacy
TLDR
This work designed a differentially private publication system that directly addressed vulnerabilities that were exposed by the Dinur and Nissim (2003) database reconstruction theorem while preserving the fitness for use of the core statistical products.
When Excessive Perturbation Goes Wrong and Why IPUMS-International Relies Instead on Sampling, Suppression, Swapping, and Other Minimally Harmful Methods to Protect Privacy of Census Microdata
TLDR
A recent case of perturbation gone wrong (the household samples of the 2000 census of the USA (PUMS), the 2003-2006 American Community Survey, and the 2004-2009 Current Population Survey) and a mathematical demonstration in a timely compendium of statistical confidentiality practices confirm the wisdom of IPUMS microdata management protocols and statistical disclosure controls.
Challenges to the confidentiality of U.S. Federal statistics, 1910-1965
The article uses a new approach to the analysis of statistical confidentiality in official statistics by reframing the discussion of challenges to statistical confidentiality from the hypothetical…
...
...