SafePub: A Truthful Data Anonymization Algorithm With Strong Privacy Guarantees

@article{Bild2018SafePubAT,
  title={SafePub: A Truthful Data Anonymization Algorithm With Strong Privacy Guarantees},
  author={Raffael Bild and Klaus A. Kuhn and Fabian Prasser},
  journal={Proceedings on Privacy Enhancing Technologies},
  year={2018},
  volume={2018},
  pages={67--87}
}
Abstract: Methods for privacy-preserving data publishing and analysis trade off privacy risks for individuals against the quality of output data. In this article, we present a data publishing algorithm that satisfies the differential privacy model. The transformations performed are truthful, which means that the algorithm does not perturb input data or generate synthetic output data. Instead, records are randomly drawn from the input dataset and the uniqueness of their features is reduced. This… 
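The mechanism described in the abstract can be pictured as random record sampling followed by coarsening of quasi-identifiers, with rare value combinations suppressed. The sketch below is only an illustration of that idea, not the authors' implementation: the sampling rate beta, the generalize_age hierarchy, and the group-size threshold are assumptions, and the paper's calibration to a concrete (ε, δ) budget is omitted.

import random
from collections import Counter

def generalize_age(age):
    # Hypothetical generalization hierarchy: exact age -> decade interval.
    decade = (age // 10) * 10
    return f"{decade}-{decade + 9}"

def safepub_like_release(records, beta=0.5, min_group_size=5, seed=0):
    """records: list of dicts with an 'age' quasi-identifier (toy schema)."""
    rng = random.Random(seed)
    # Step 1: truthful random sampling -- each record is kept independently.
    sample = [r for r in records if rng.random() < beta]
    # Step 2: generalize the quasi-identifier to a coarser value.
    generalized = [{**r, "age": generalize_age(r["age"])} for r in sample]
    # Step 3: suppress records whose generalized group is too small, so that
    # no released record has a unique quasi-identifier value.
    counts = Counter(r["age"] for r in generalized)
    return [r for r in generalized if counts[r["age"]] >= min_group_size]

data = [{"age": a, "diagnosis": "A"} for a in range(18, 90)]
print(len(safepub_like_release(data)))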
Differentially private release of medical microdata: an efficient and practical approach for preserving informative attribute values
TLDR
It is proved that the results of data analyses using the original dataset and those obtained using a dataset anonymized via the proposed method are considerably similar.
Privacy-Preserving Sharing of Health Data using Hybrid Anonymisation Techniques-A Comparison
TLDR
This project compared two recently proposed hybrid anonymisation algorithms, MDP and SafePub, to study their applicability to medical datasets, and determined which algorithm is most suitable based on the dataset characteristics, the required privacy level, and the acceptable information loss.
PrivGuard: Sensitivity Guided Anonymization based PPDM with Automatic Selection of Sensitive Attributes
  • V. Rani, M. S. Rao
  • Computer Science
    2021 7th International Conference on Advanced Computing and Communication Systems (ICACCS)
  • 2021
TLDR
PrivGuard finds sensitive attributes in the database by computing a Sensitivity Rank for each attribute from attribute evaluation measures.
Development and Analysis of Deterministic Privacy-Preserving Policies Using Non-Stochastic Information Theory
  • Farhad Farokhi
  • Computer Science
    IEEE Transactions on Information Forensics and Security
  • 2019
TLDR
The measure of privacy is used to analyze k-anonymity (a popular deterministic mechanism for privacy-preserving release of datasets using suppression and generalization techniques), proving that it is in fact not privacy preserving.
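For reference, k-anonymity requires every combination of quasi-identifier values to occur at least k times in the released table. The check below is a minimal illustrative sketch under an assumed dict-per-row representation; the column names and example rows are made up.

from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k):
    # A table is k-anonymous if every combination of quasi-identifier values
    # appears in at least k rows.
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())

rows = [
    {"zip": "811**", "age": "20-29", "disease": "flu"},
    {"zip": "811**", "age": "20-29", "disease": "asthma"},
    {"zip": "812**", "age": "30-39", "disease": "flu"},
]
print(is_k_anonymous(rows, ["zip", "age"], 2))  # False: the last group has size 1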
A Multi-view Approach to Preserve Privacy and Utility in Network Trace Anonymization
TLDR
The key idea is for the analysts to generate and analyze multiple anonymized views of the original network traces that are designed to be sufficiently indistinguishable even to adversaries armed with prior knowledge, which preserves privacy.
Privacy as a Service: Anonymisation of NetFlow Traces
TLDR
The main purpose of this paper is to provide a definition of an original data anonymisation paradigm in order to render the re-identification of related users impossible and to empirically evaluate the performance and data partition of the proposed solution.
Noiseless Privacy
TLDR
It is proved that quantization operators can ensure noiseless privacy if the number of quantization levels is appropriately selected based on the sensitivity of the query and the privacy budget, and that the maximal information leakage is bounded by the privacy budget.
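As an illustration of what a quantization operator does, the sketch below maps a query output onto a fixed number of representative levels. The cited paper's rule for choosing the number of levels from the query sensitivity and the privacy budget is not reproduced here; the range and level count are arbitrary assumptions.

def quantize(value, lo, hi, levels):
    # Map a query output in [lo, hi] onto the midpoint of one of `levels` bins.
    width = (hi - lo) / levels
    index = min(int((value - lo) / width), levels - 1)
    return lo + (index + 0.5) * width

print(quantize(37.2, lo=0.0, hi=100.0, levels=10))  # -> 35.0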
Flexible data anonymization using ARX—Current status and challenges ahead
TLDR
This work describes how an open source data anonymization tool is extended to support almost arbitrary combinations of a wide range of techniques in a scalable manner; results of an extensive experimental comparison show that this approach outperforms related solutions in terms of scalability and output data quality.
An Evaluation of Anonymized Models and Ensemble Classifiers
TLDR
An evaluation of privacy models and ensemble classification algorithms for data anonymization shows that there is no significant difference between the accuracy of classification using the original data and the accuracy using anonymized data.
Anonymization of directory-structured sensitive data (Anonymisering av katalogstrukturerad känslig data)
TLDR
It was concluded that the differential privacy model, when used with the RecursiveDirectoryWise approach, was the most suitable combination for complying with the GDPR when anonymizing directory-structured data.
...
...

References

SHOWING 1-10 OF 55 REFERENCES
A Supermodularity-Based Differential Privacy Preserving Algorithm for Data Anonymization
TLDR
This paper proposes a scalable algorithm that meets differential privacy when a specific random sampling step is applied, proves that it can be implemented in polynomial time, and shows that combining the proposed aggregate formulation with this sampling yields an anonymization algorithm satisfying differential privacy.
Differentially private data release for data mining
TLDR
This paper proposes the first anonymization algorithm for the non-interactive setting based on the generalization technique, which first probabilistically generalizes the raw data and then adds noise to guarantee ε-differential privacy.
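To make the "generalize, then add noise" idea concrete, the sketch below generalizes ages to decades and perturbs the resulting histogram counts with Laplace noise. This is a generic illustration under assumed parameters, not the cited algorithm's actual probabilistic generalization step.

import random
from collections import Counter

def laplace(scale, rng):
    # The difference of two independent exponentials is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_generalized_counts(ages, epsilon, seed=0):
    # Generalize exact ages to decades, then add Laplace(1/epsilon) noise to
    # each count (a histogram over disjoint bins has sensitivity 1).
    rng = random.Random(seed)
    counts = Counter((a // 10) * 10 for a in ages)
    return {decade: count + laplace(1.0 / epsilon, rng)
            for decade, count in counts.items()}

print(noisy_generalized_counts([23, 27, 31, 35, 44], epsilon=1.0))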
Provably Private Data Anonymization: Or, k-Anonymity Meets Differential Privacy
TLDR
This paper is the first to link k-anonymity with differential privacy; it illustrates that "hiding in a crowd of k" indeed offers privacy guarantees and shows that adding a random-sampling step can greatly amplify the privacy guarantee provided by a differentially-private algorithm.
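The amplification effect of random sampling can be quantified with a standard bound: running an ε-differentially private mechanism on a Poisson subsample with sampling rate β satisfies ln(1 + β(e^ε − 1))-differential privacy. The helper below simply evaluates that bound; it is an illustration, not code from the cited paper.

import math

def amplified_epsilon(epsilon, beta):
    # Privacy amplification by subsampling: eps' = ln(1 + beta * (e^eps - 1)).
    return math.log(1.0 + beta * (math.exp(epsilon) - 1.0))

print(amplified_epsilon(1.0, 0.1))  # ~0.16, much smaller than the original 1.0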
Empirical privacy and empirical utility of anonymized data
TLDR
This paper reverses the idea of a "privacy attack" by incorporating it into a measure of privacy, and advocates the notion of empirical privacy, based on the posterior beliefs of an adversary and their ability to draw inferences about sensitive values in the data.
Data mining with differential privacy
TLDR
This paper addresses the problem of data mining with formal privacy guarantees, given a data access interface based on the differential privacy framework, by considering the privacy and algorithmic requirements simultaneously and focusing on decision tree induction as a sample application.
Data privacy through optimal k-anonymization
  • R. Bayardo, R. Agrawal
  • Computer Science
    21st International Conference on Data Engineering (ICDE'05)
  • 2005
TLDR
This paper proposes and evaluates an optimization algorithm for the powerful de-identification procedure known as k-anonymization; it presents a new approach to exploring the space of possible anonymizations that tames the combinatorics of the problem, and develops data-management strategies to reduce reliance on expensive operations such as sorting.
A Practical Framework for Privacy-Preserving Data Analytics
TLDR
This paper proposes a practical framework for data analytics, while providing differential privacy guarantees to individual data contributors, and presents two methods with different sampling techniques to draw a subset of individual data for analysis.
Introduction to Privacy-Preserving Data Publishing: Concepts and Techniques
TLDR
This book explores not only privacy and information-utility issues but also efficiency and scalability challenges; it highlights efficient and scalable methods and provides an analytical discussion comparing the strengths and weaknesses of different solutions.
On the tradeoff between privacy and utility in data publishing
TLDR
The fundamental characteristics of privacy and utility are analyzed, and it is shown that it is inappropriate to directly compare privacy with utility; an integrated framework for considering the privacy-utility tradeoff is proposed, borrowing concepts from Modern Portfolio Theory for financial investment.
The cost of privacy: destruction of data-mining utility in anonymized data publishing
TLDR
The results demonstrate that even modest privacy gains require almost complete destruction of the data-mining utility, and suggest that in most cases, trivial sanitization provides equivalent utility and better privacy than k-anonymity, l-diversity, and similar methods based on generalization and suppression.
...
...