Protecting Data through Perturbation Techniques: The Impact on Knowledge Discovery in Databases

Rick L. Wilson and Peter A. Rosen, J. Database Manag.
Data perturbation is a data security technique that adds ‘noise’ to databases to preserve the confidentiality of individual records. This technique allows users to ascertain key summary information about the data that is not distorted and does not lead to a security breach. Four bias types have been proposed to assess the effectiveness of such techniques. However, these biases only deal with simple aggregate concepts (averages, etc.) found in the database. To compete in today’s business…
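The noise-addition idea in the abstract can be illustrated with a minimal sketch. It assumes the simplest possible scheme, independent zero-mean Gaussian noise added to one numeric attribute, and is not the method evaluated in the paper. The `perturb` helper, the synthetic salary column, and all parameter values are hypothetical and chosen only for illustration.

```python
import random
import statistics


def perturb(values, noise_sd, seed=0):
    """Return a copy of values with independent zero-mean Gaussian noise added.

    Hypothetical helper: plain additive noise, not the paper's technique.
    """
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, noise_sd) for v in values]


# Synthetic confidential attribute (assumed data, not taken from the paper).
data_rng = random.Random(42)
salaries = [data_rng.gauss(50_000, 8_000) for _ in range(10_000)]

# Mask every record with noise of standard deviation 4,000.
masked = perturb(salaries, noise_sd=4_000, seed=1)

# The mean survives almost unchanged because the noise has zero mean.
mean_shift = abs(statistics.fmean(masked) - statistics.fmean(salaries))

# The variance grows by roughly the variance of the noise.
var_ratio = statistics.pvariance(masked) / statistics.pvariance(salaries)

print(f"mean shift: {mean_shift:.1f}")
print(f"variance ratio: {var_ratio:.3f}")
```

In this sketch individual values change, the average stays close to the original, and the variance is inflated by about the variance of the added noise. That inflation is one example of how perturbation can distort summary information even when averages are preserved.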
A systematic review on privacy-preserving distributed data mining
Combining and analysing sensitive data from multiple sources offers considerable potential for knowledge discovery. However, there are a number of issues that pose problems for such analyses,…
A Cryptography Based Privacy Preserving Association Rule Mining in Academic Analytics
This work highlights a simple model that applies the AES encryption algorithm to target attributes before applying an association rule mining algorithm, and shows that the proposed system is an easy and reliable way to preserve the privacy of educational data.
A Partial Optimization Approach for Privacy Preserving Frequent Itemset Mining
The authors present an approach to identify the optimal set of transactions that, if sanitized, would result in hiding sensitive patterns while reducing the accidental hiding of legitimate patterns and the damage done to the database as much as possible.
A survey on statistical disclosure control and micro‐aggregation techniques for secure statistical databases
This paper surveys the fields of Statistical Disclosure Control and Micro‐Aggregation Techniques (MATs), which are both areas fundamental to the science of secure Statistical DataBases (SDBs), and represents a complete overview of the state‐of‐the‐art techniques.
A survey on statistical disclosure control and micro-aggregation techniques for secure statistical databases
The paper summarizes the perturbative and non-perturbative SDC methods for micro-data protection, and it focuses on the families of MATs by formally stating the Micro-Aggregation Problem and surveying it in a comprehensive manner.
Efficient privacy preservation of big data for accurate data mining
Experiments show that PABIDOT excels in execution speed, scalability, attack resistance, and accuracy in large-scale privacy-preserving data classification when compared with two other, related privacy-preserving algorithms.
Privacy-Preserving Estimation
A background on privacy-preserving data mining (PPDM) and the related field of statistical disclosure limitation (SDL) is presented and the need for a data-centric approach (DCA) to PPDM is considered.
An Evaluation Framework for Privacy-Preserving Record Linkage
A general framework with normalized measures to practically evaluate and compare PPRL solutions in the face of linkage attack methods that are based on an external global dataset is proposed, and the results show that the framework provides an extensive and comparative evaluation of PPRL solutions in terms of the three properties.
Usability heuristics for fast crime data anonymization in resource-constrained contexts
There is considerable evidence that the concept of a three-pronged solution to addressing the issue of anonymity during crime reporting in a resource-constrained environment is promising and can further help law enforcement agencies partner with third parties in deriving useful crime pattern knowledge without infringing on users’ privacy.
Scalable and approximate privacy-preserving record linkage
This thesis presents extensive research in PPRL, and proposes two efficient two-party techniques for private matching and classification to address the linkage quality challenge in terms of approximate matching and effective classification.


A General Additive Data Perturbation Method for Database Security
This study describes a new method (General Additive Data Perturbation) that does not change relationships between attributes; when the database has a multivariate normal distribution, the new method provides maximum security and minimum bias.
Accessibility, security, and accuracy in statistical databases: the case for the multiplicative fixed data perturbation approach
A comparison of different security mechanisms reveals that fixed data perturbation is preferred because it maximizes both security and accessibility, and an investigation of the different approaches to fixed data perturbation indicates that the multiplicative method best meets these criteria.
A modified random perturbation method for database security
The random data perturbation (RDP) method of preserving the privacy of individual records in a statistical database is discussed. In particular, it is shown that if confidential attributes are…
Optimal noise addition for preserving confidentiality in multivariate data
Organizations releasing data for use in statistical studies have an ethical obligation to protect the confidentiality of individual respondents. A seemingly attractive way of ensuring…
A method for limiting disclosure in microdata based on random noise and transformation
Survey data is often released as microdata. Survey respondents are thus subjected to the risk of reidentification and disclosure of confidential data, even when identifying information such as…
Classification trees based on exhaustive search algorithms tend to be biased towards selecting variables that afford more splits. As a result, such trees should be interpreted with caution. This…
A Comparison of Prediction Accuracy, Complexity, and Training Time of Thirty-Three Old and New Classification Algorithms
Among decision tree algorithms with univariate splits, C4.5, IND-CART, and QUEST have the best combinations of error rate and speed, but C4.5 tends to produce trees with twice as many leaves as those from IND-CART and QUEST.
Mersenne twister: a 623-dimensionally equidistributed uniform pseudo-random number generator
A new algorithm called Mersenne Twister (MT) is proposed for generating uniform pseudorandom numbers, which provides a super astronomical period of 2^19937 - 1 and 623-dimensional equidistribution up to 32-bit accuracy, while using a working area of only 624 words.
Theory and Application of the Linear Model
This book integrates the linear statistical model within the context of analysis of variance, correlation and regression, and design of experiments and is a time-tested, authoritative resource for experimenters, statistical consultants, and students.
UCI Repository of machine learning databases