Applying data synthesis for longitudinal business data across three countries

@article{Alam2020ApplyingDS,
  title={Applying data synthesis for longitudinal business data across three countries},
  author={M. Jahangir Alam and Benoit Dostie and J{\"o}rg Drechsler and Lars Vilhuber},
  journal={arXiv: Econometrics},
  year={2020}
}
Data on businesses collected by statistical agencies are challenging to protect. Many businesses have unique characteristics, and distributions of employment, sales, and profits are highly skewed. Attackers wishing to conduct identification attacks often have access to much more information about a business than they would have about any individual. As a consequence, most disclosure avoidance mechanisms fail to strike an acceptable balance between usefulness and confidentiality protection. Detailed aggregate statistics by…

References

Towards Unrestricted Public Use Business Microdata: The Synthetic Longitudinal Business Database
In most countries, national statistical agencies do not release establishment-level business microdata, because doing so represents too large a risk to establishments' confidentiality. One approach
Improving the Synthetic Longitudinal Business Database
In most countries, national statistical agencies do not release establishment-level business microdata, because doing so represents too large a risk to establishments’ confidentiality. Agencies
Expanding the Role of Synthetic Data at the U.S. Census Bureau
National Statistical Offices (NSOs) create official statistics from data collected from survey respondents, government administrative records and other sources. The raw source data is usually
A new approach for disclosure control in the IAB establishment panel—multiple imputation for a better data access
An application of Rubin’s idea to generate synthetic datasets from existing confidential survey data for public release, showing that valid inferences can be obtained using the synthetic datasets in this context, while confidentiality is guaranteed for the survey participants.
New data dissemination approaches in old Europe – synthetic datasets for a German establishment survey
Disseminating microdata to the public that provide a high level of data utility, while at the same time guaranteeing the confidentiality of the survey respondent is a difficult task. Generating
Global Measures of Data Utility for Microdata Masked for Disclosure Limitation
When releasing microdata to the public, data disseminators typically alter the original data to protect the confidentiality of database subjects' identities and sensitive attributes. However, such
Using worker flows in the analysis of establishment turnover: evidence from German administrative data
Economists have long been interested in the determinants and components of job creation and destruction. In many countries administrative datasets provide an excellent source for detailed analysis
General and specific utility measures for synthetic data
A previous general measure of data utility, the propensity score mean-squared-error (pMSE), is adapted to the specific case of synthetic data, and its distribution is derived for the case when the correct synthesis model is used to create the synthetic data.
Distribution-preserving statistical disclosure limitation
This work presents two practical methods of generating synthetic values when the imputer has only limited information about the true data generating process, and one is applicable when the true likelihood is known up to a monotone transformation.
Producer Dynamics: New Evidence from Micro Data: The LEHD Infrastructure Files and the Creation of the Quarterly Workforce Indicators
This article describes how the input files are compiled and combined to create the infrastructure files, and describes the multiple imputation methods used to impute in missing data and the statistical matching techniques used to combine and edit data when a direct identifier match requires improvement.