• Corpus ID: 219179388

Identification Risk Evaluation of Continuous Synthesized Variables

@article{Hornby2020IdentificationRE,
  title={Identification Risk Evaluation of Continuous Synthesized Variables},
  author={Ryan Hornby and Jingchen Hu},
  journal={arXiv: Methodology},
  year={2020}
}
We propose a general approach to evaluating the identification risk of continuous synthesized variables in partially synthetic data. We introduce a radius $r$ in the construction of the identification risk probability of each target record, and illustrate the approach with working examples for one or more continuous synthesized variables. We demonstrate our methods with applications to a data sample from the Consumer Expenditure Surveys (CE), and discuss the impacts on risk and data utility of 1) the…
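
The radius idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula, so everything here is an assumption rather than the authors' implementation: the function name `expected_match_risk`, the use of an absolute radius `r` around each record's true value, the matching on a single unsynthesized key, and the averaging of 1/(number of candidates) whenever the true record is among the candidates.

```python
from typing import Hashable, Sequence


def expected_match_risk(
    original: Sequence[float],
    synthetic: Sequence[float],
    keys: Sequence[Hashable],
    r: float,
) -> float:
    """Hypothetical expected-match-risk sketch for one continuous variable.

    original[i]  : true (confidential) value of record i
    synthetic[i] : synthesized value released for record i
    keys[i]      : unsynthesized attribute(s) of record i, assumed known
                   to the intruder and released unchanged
    r            : absolute radius (an assumption; a relative radius
                   would work the same way with a different test)
    """
    n = len(original)
    total = 0.0
    for i in range(n):
        # Candidate matches: released records that share record i's key and
        # whose synthetic value lies within r of record i's true value.
        candidates = [
            j
            for j in range(n)
            if keys[j] == keys[i] and abs(synthetic[j] - original[i]) <= r
        ]
        # Record i contributes 1/len(candidates) only if it is itself
        # among the candidates; otherwise it contributes nothing.
        if i in candidates:
            total += 1.0 / len(candidates)
    return total / n


# Toy data, made up purely for illustration.
original = [10.0, 12.0, 50.0]
synthetic = [11.0, 11.5, 80.0]
keys = ["a", "a", "b"]
risk = expected_match_risk(original, synthetic, keys, r=2.0)
```

Under these assumptions, a larger `r` admits more candidate matches per record, which is one way a radius can change the estimated risk.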

References

SHOWING 1-10 OF 27 REFERENCES
Bayesian Estimation of Attribute and Identification Disclosure Risks in Synthetic Data
  • Jingchen Hu
  • Mathematics, Computer Science
    Trans. Data Priv.
  • 2019
TLDR
This paper focuses on the detailed re-construction of some Bayesian methods proposed for estimating disclosure risks in synthetic data, to give the readers a comprehensive view of the Bayesian estimation procedures, and enable synthetic data researchers and producers to use these procedures to evaluate disclosure risks.
General and specific utility measures for synthetic data
TLDR
A previous general measure of data utility, the propensity score mean-squared-error (pMSE), is adapted to the specific case of synthetic data, and its distribution is derived for the case when the correct synthesis model is used to create the synthetic data.
Estimating Risks of Identification Disclosure in Partially Synthetic Data
TLDR
This paper describes how to evaluate identification disclosure risks in partially synthetic data, accounting for released information from the multiple datasets, the model used to generate synthetic values, and the approach used to select values to synthesize.
Practical Data Synthesis for Large Samples
TLDR
New variance estimates are introduced for use with large samples of completely synthesised data. They do not require the synthetic data to be generated from the posterior predictive distribution derived from the observed data, and they can be used with a single synthetic data set.
Bayesian Data Synthesis and Disclosure Risk Quantification: An Application to the Consumer Expenditure Surveys
The release of synthetic data generated from a model estimated on the data helps statistical agencies disseminate respondent-level data with high utility and privacy protection. Motivated by the
Likelihood Based Finite Sample Inference for Singly Imputed Synthetic Data Under the Multivariate Normal and Multiple Linear Regression Models
In this paper we develop likelihood-based finite sample inference based on singly imputed partially synthetic data, when the original data follow either a multivariate normal or a multiple linear
Using CART to generate partially synthetic public use microdata
TLDR
This article presents and evaluates the use of classification and regression trees (CART) to generate partially synthetic data, and studies potential applications of CART via simulation to generate synthetic data for sensitive variables.
Global Measures of Data Utility for Microdata Masked for Disclosure Limitation
When releasing microdata to the public, data disseminators typically alter the original data to protect the confidentiality of database subjects' identities and sensitive attributes. However, such
Bayesian Pseudo Posterior Mechanism under Differential Privacy
We propose a Bayesian pseudo posterior mechanism to generate record-level synthetic datasets with a differential privacy (DP) guarantee from any proposed synthesizer model. The pseudo posterior
Statistical Disclosure Limitation in the Presence of Edit Rules
TLDR
A simulation study based on data from the Colombian Annual Manufacturing Survey suggests that variants of microaggregation and partially synthetic data offer the most attractive risk-utility profiles among the SDL strategies.