The hardness and approximation algorithms for l-diversity

@inproceedings{Xiao2010TheHA,
  title={The hardness and approximation algorithms for l-diversity},
  author={Xiaokui Xiao and Ke Yi and Yufei Tao},
  booktitle={EDBT '10},
  year={2010}
}
The existing solutions to privacy preserving publication can be classified into the theoretical and heuristic categories. The former guarantees provably low information loss, whereas the latter incurs gigantic loss in the worst case, but is shown empirically to perform well on many real inputs. While numerous heuristic algorithms have been developed to satisfy advanced privacy principles such as l-diversity and t-closeness, the theoretical category is currently limited to k-anonymity which…

On the Complexity of the l-diversity Problem
TLDR
This paper investigates the approximation and parameterized complexity of l-diversity, where the attributes are divided into sensitive attributes and quasi-identifier attributes.
On the Complexity of t-Closeness Anonymization and Related Problems
TLDR
It is proved that for every constant $t$ such that $0\leq t<1$, it is NP-hard to find an optimal $t$-closeness generalization of a given table.
Randomized addition of sensitive attributes for l-diversity
  • Y. Sei, A. Ohsuga
  • Computer Science
    2014 11th International Conference on Security and Cryptography (SECRYPT)
  • 2014
TLDR
This paper proposes a new technique for l-diversity that keeps QIDs unchanged and randomizes the sensitive attributes of each individual, so that data users can analyze the data based on the QIDs they focus on; the technique does not impose the eligibility requirement.
An Algorithm for l-diversity based on Randomized Addition of Sensitive Values
TLDR
This study proposes a new technique for l-diversity that keeps QIDs unchanged, so that data users can analyze the data based on the QIDs they focus on, and proves that the proposed method can result in a better tradeoff between privacy and utility of the anonymized database.
A generalization model for multi-record privacy preservation
TLDR
A bidirectional personalized generalization model is proposed as a new solution to satisfy higher privacy requirements and make it suitable for multi-record publishing datasets, and a new hierarchical generalization strategy is proposed for personal privacy preservation of sensitive attributes.
The effect of homogeneity on the computational complexity of combinatorial data anonymization
TLDR
The fixed-parameter tractability result implies that k-Anonymity can be solved in linear time when $t_{in}$ is a constant, and the computational hardness results extend to p-Sensitivity and the usage of domain generalization hierarchies.
An enhanced l-diversity privacy preservation
TLDR
A (k, l, θ)-diversity model based on clustering is proposed to minimize the information loss while assuring data quality, and extensive experimental evaluation shows that the techniques clearly outperform the existing approaches in terms of execution time and data utility.
Data Anonymization Based on Natural Equivalent Class
TLDR
This paper proposes a novel clustering-based anonymization algorithm, which tries to cluster records without separating any natural equivalent class, and proves that the natural equivalent class can effectively reduce the computational complexity of clustering algorithms as well as information loss.
(l1, ..., lq)-diversity for Anonymizing Sensitive Quasi-Identifiers
TLDR
This paper proposes a novel privacy definition (l1, ..., lq)-diversity and a method that can treat sensitive QIDs, which is composed of an anonymization algorithm and a reconstruction algorithm.

References

On optimal anonymization for l+-diversity
  • Junqiang Liu, Ke Wang
  • Computer Science
    2010 IEEE 26th International Conference on Data Engineering (ICDE 2010)
  • 2010
TLDR
A pruning-based algorithm is presented for finding an optimal solution to an extended form of the l-diversity problem; it can be instantiated with any reasonable cost metric and improves data utility.
On the complexity of optimal K-anonymity
TLDR
It is proved that two general versions of optimal k-anonymization of relations are NP-hard, including the suppression version which amounts to choosing a minimum number of entries to delete from the relation.
Approximate algorithms for K-anonymity
TLDR
This paper proposes several approximation algorithms that guarantee an O(log k) approximation ratio and perform significantly better than the traditional algorithms, and also provides O(β log k)-approximate algorithms which gracefully adjust their running time according to the tolerance β (≥ 1) of the approximation ratios.
Fast Data Anonymization with Low Information Loss
TLDR
This paper focuses on one-dimensional (i.e., single attribute) quasi-identifiers, studies the properties of optimal solutions for k-anonymity and l-diversity, and develops efficient heuristics to solve the one-dimensional problems in linear time based on meaningful information loss metrics.
On k-Anonymity and the Curse of Dimensionality
TLDR
It is shown that the curse of high dimensionality also applies to the problem of privacy preserving data mining, and when a data set contains a large number of attributes which are open to inference attacks, it becomes difficult to anonymize the data without an unacceptably high amount of information loss.
Aggregate Query Answering on Anonymized Tables
TLDR
A general framework of permutation-based anonymization to support accurate answering of aggregate queries is presented, and it is shown that, for the same grouping, permutation-based techniques can always answer aggregate queries more accurately than generalization-based approaches.
t-Closeness: Privacy Beyond k-Anonymity and l-Diversity
The k-anonymity privacy requirement for publishing microdata requires that each equivalence class (i.e., a set of records that are indistinguishable from each other with respect to certain
L-diversity: privacy beyond k-anonymity
TLDR
This paper shows with two simple attacks that a k-anonymized dataset has some subtle, but severe privacy problems, and proposes a novel and powerful privacy definition called l-diversity, which is practical and can be implemented efficiently.
Data privacy through optimal k-anonymization
  • R. Bayardo, R. Agrawal
  • Computer Science
    21st International Conference on Data Engineering (ICDE'05)
  • 2005
TLDR
This paper proposes and evaluates an optimization algorithm for the powerful de-identification procedure known as k-anonymization, and presents a new approach to exploring the space of possible anonymizations that tames the combinatorics of the problem, and develops data-management strategies to reduce reliance on expensive operations such as sorting.
Utility-based anonymization using local recoding
TLDR
This paper proposes a simple framework to specify the utility of attributes and develops two simple yet efficient heuristic local recoding methods for utility-based anonymization, which outperform the state-of-the-art multidimensional global recoding methods in both discernability and query answering accuracy.