Releasing survey microdata with exact cluster locations and additional privacy safeguards

Till Koebe and Alejandra Arias-Salazar
Household survey programs around the world publish fine-granular georeferenced microdata to support research on the interdependence of human livelihoods and their surrounding environment. To safeguard the respondents' privacy, micro-level survey data is usually (pseudo-)anonymized through deletion or perturbation procedures such as obfuscating the true location of data collection. This, however, poses a challenge to emerging approaches that augment survey data with auxiliary information on a…

