Corpus ID: 239768194

A Feasibility Study of Differentially Private Summary Statistics and Regression Analyses for Administrative Tax Data

@inproceedings{Barrientos2021AFS,
  title={A Feasibility Study of Differentially Private Summary Statistics and Regression Analyses for Administrative Tax Data},
  author={Andr{\'e}s F. Barrientos and Aaron R. Williams and Joshua Snoke and Claire McKay Bowen},
  year={2021}
}
Federal administrative tax data are invaluable for research, but because of privacy concerns, access to these data is typically limited to select agencies and a few individuals. An alternative to sharing microlevel data is a validation server, which allows individuals to query statistics without accessing the confidential data. This paper studies the feasibility of using differentially private (DP) methods to implement such a server. We provide an extensive study on existing DP methods for…
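To make the idea of a DP query against confidential data concrete, the following is a minimal sketch of the standard Laplace mechanism applied to a bounded mean. It is an illustration only and is not taken from the paper: the function name `dp_mean`, its parameters, and the clipping bounds are hypothetical, and the sketch assumes the record count is public and that neighboring datasets differ by replacing one record.

```python
import random


def dp_mean(values, lower, upper, epsilon, rng=None):
    """Return an epsilon-DP estimate of the mean of `values`.

    Hypothetical helper for illustration. Assumes the number of records
    is public and that neighboring datasets differ in one record's value.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if lower >= upper:
        raise ValueError("lower must be strictly less than upper")
    values = list(values)
    if not values:
        raise ValueError("values must be non-empty")
    rng = rng if rng is not None else random.Random()

    # Clip each record so a single record can shift the sum by at most
    # (upper - lower), which bounds the sensitivity of the mean.
    clipped = [min(max(float(v), lower), upper) for v in values]
    n = len(clipped)
    sensitivity = (upper - lower) / n

    # Laplace(0, b) noise sampled as b times the difference of two
    # independent Exp(1) draws, with b = sensitivity / epsilon.
    scale = sensitivity / epsilon
    noise = scale * (rng.expovariate(1.0) - rng.expovariate(1.0))
    return sum(clipped) / n + noise


if __name__ == "__main__":
    incomes = [12_000, 48_500, 51_000, 73_250, 1_200_000]
    print(dp_mean(incomes, lower=0, upper=200_000, epsilon=1.0))
```

Smaller values of `epsilon` add more noise, and wider clipping bounds raise the sensitivity, so both choices trade accuracy for privacy. Floating-point noise sampling like this is known to be unsafe for production releases; the sketch only shows the shape of the mechanism.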
1 Citation

References

Differentially Private Significance Tests for Regression Coefficients

TLDR
Algorithms for assessing whether regression coefficients of interest are statistically significant or not are presented and conditions under which the algorithms should give accurate answers about statistical significance are described.

Providing access to confidential research data through synthesis and verification: An application to data on employees of the U.S. federal government

TLDR
This work presents an application of the synthetic data plus verification server approach to longitudinal data on employees of the U.S. federal government, and presents a novel model for generating synthetic career trajectories, as well as strategies for generating high dimensional, longitudinal synthetic datasets.

Differentially Private Regression Diagnostics

TLDR
ε-differentially private diagnostics for regression are developed, beginning to fill a gap in privacy-preserving data analysis; they are adequate for diagnosing the fit and predictive power of regression models on representative datasets when the size of the dataset times the privacy parameter (ε) is at least 1000.

Comparative Study of Differentially Private Data Synthesis Methods

TLDR
This work examines current DIfferentially Private Data Synthesis (DIPS) techniques for releasing individual-level surrogate data for the original data, compares the techniques conceptually, and evaluates the statistical utility and inferential properties of the synthetic data produced by each DIPS technique through extensive simulation studies.

Differentially private data release via statistical election to partition sequentially

TLDR
A new DIPS approach, STatistical Election to Partition Sequentially (STEPS), is proposed that partitions data by attributes according to their importance ranks per either a practical importance or statistical importance measure and develops a general-utility metric to assess the similarity of the synthetic data to the actual data.

Confidentiality and Differential Privacy in the Dissemination of Frequency Tables

TLDR
This paper studies confidentiality protection for perturbed frequency tables, including the trade-off with analytical utility, focusing on a version of the ABS TableBuilder as a concrete example of a data release mechanism, and examining its properties.

Differentially private model selection with penalized and constrained likelihood

TLDR
This work shows that model selection procedures based on penalized least squares or likelihood can be made differentially private by a combination of regularization and randomization, and proposes two algorithms to do so.

Differentially Private Simple Linear Regression

TLDR
A thorough experimental evaluation of differentially private algorithms for simple linear regression on small datasets with tens to hundreds of records is performed, finding that algorithms based on robust estimators (in particular, the median-based estimator of Theil and Sen) perform best on small datasets, while algorithms based on Ordinary Least Squares or Gradient Descent perform better for large datasets.

Privacy-preserving statistical estimation with optimal convergence rates

TLDR
It is shown that for a large class of statistical estimators T and input distributions P, there is a differentially private estimator A_T with the same asymptotic distribution as T, which implies that A_T(X) is essentially as good as the original statistic T(X) for statistical inference, for sufficiently large samples.

General-Purpose Differentially-Private Confidence Intervals

TLDR
This work develops two broadly applicable methods for private confidence-interval construction based on asymptotics and the parametric bootstrap, which applies "out of the box" to a wide class of private estimators and has good coverage at small sample sizes, but with increased computational cost.
...