Protecting Data through Perturbation Techniques: The Impact on Knowledge Discovery in Databases
@article{Wilson2003ProtectingDT, title={Protecting Data through Perturbation Techniques: The Impact on Knowledge Discovery in Databases}, author={Rick L. Wilson and Peter A. Rosen}, journal={J. Database Manag.}, year={2003}, volume={14}, pages={14-26} }
Data perturbation is a data security technique that adds ‘noise’ to databases, preserving the confidentiality of individual records. This technique allows users to ascertain key summary information about the data that is not distorted and does not lead to a security breach. Four bias types have been proposed which assess the effectiveness of such techniques. However, these biases only deal with simple aggregate concepts (averages, etc.) found in the database. To compete in today’s business…
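As a rough illustration of the idea in the abstract, the sketch below adds independent zero-mean Gaussian noise to a synthetic numeric attribute and compares summary statistics before and after. It is a minimal example of simple additive noise, not the specific methods evaluated in the paper; the `perturb` function, the salary data, and the noise level are all assumptions made for illustration.

```python
import random
import statistics


def perturb(values, noise_sd, seed=None):
    """Return a copy of `values` with independent zero-mean Gaussian noise added.

    Hypothetical helper for illustration only; not the authors' method.
    """
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, noise_sd) for v in values]


# Synthetic "confidential" attribute (assumed data, e.g. salaries).
data_rng = random.Random(0)
salaries = [data_rng.gauss(50_000, 10_000) for _ in range(10_000)]

# Mask each record with noise of standard deviation 5,000 (assumed level).
masked = perturb(salaries, noise_sd=5_000, seed=1)

# The mean is an aggregate that should survive perturbation almost unchanged.
mean_shift = abs(statistics.mean(masked) - statistics.mean(salaries))

# The spread is not preserved: independent noise adds its own variance.
sd_original = statistics.stdev(salaries)
sd_masked = statistics.stdev(masked)

print(f"shift in mean: {mean_shift:.2f}")
print(f"std dev before: {sd_original:.2f}, after: {sd_masked:.2f}")
```

Individual values are altered while the mean stays close to the original, but the standard deviation grows because the noise variance is added to the data variance. This is one example of the kind of distortion that a bias measure for a perturbation technique would need to capture.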
32 Citations
Survey on Privacy-Preserving Techniques for Data Publishing
- Computer Science · ArXiv
- 2022
The main challenges raised by privacy constraints are discussed, the main approaches to handle these obstacles are described, taxonomies of privacy-preserving techniques are reviewed, a theoretical analysis of existing comparative studies is provided, and multiple open issues are raised.
A systematic review on privacy-preserving distributed data mining
- Computer Science · Data Sci.
- 2021
This review identifies the consequence of the lack of standard criteria to evaluate new PPDDM methods and proposes comprehensive evaluation criteria with 10 key factors and discusses the ambiguous definitions of privacy and confusion between privacy and security in the field.
A Partial Optimization Approach for Privacy Preserving Frequent Itemset Mining
- Computer Science · Int. J. Comput. Model. Algorithms Medicine
- 2010
The authors present an approach to identify the optimal set of transactions that, if sanitized, would result in hiding sensitive patterns while reducing the accidental hiding of legitimate patterns and the damage done to the database as much as possible.
A survey on statistical disclosure control and micro‐aggregation techniques for secure statistical databases
- Computer Science · Softw. Pract. Exp.
- 2010
This paper surveys the fields of Statistical Disclosure Control and Micro‐Aggregation Techniques (MATs), which are both areas fundamental to the science of secure Statistical DataBases (SDBs), and represents a complete overview of the state‐of‐the‐art techniques.
A survey on statistical disclosure control and micro-aggregation techniques for secure statistical databases
- Computer Science
- 2010
The paper summarizes the perturbative and non-perturbative SDC methods for micro-data protection, and it focuses on the families of MATs by formally stating the Micro-Aggregation Problem and surveying it in a comprehensive manner.
Privacy-Preserving Estimation
- Computer Science · Encyclopedia of Artificial Intelligence
- 2009
A background on privacy-preserving data mining (PPDM) and the related field of statistical disclosure limitation (SDL) is presented and the need for a data-centric approach (DCA) to PPDM is considered.
An Evaluation Framework for Privacy-Preserving Record Linkage
- Computer Science · J. Priv. Confidentiality
- 2014
A general framework with normalized measures to practically evaluate and compare PPRL solutions in the face of linkage attack methods that are based on an external global dataset is proposed and the results show that the framework provides an extensive and comparative evaluation of PPRL solutions in terms of the three properties.
Usability heuristics for fast crime data anonymization in resource-constrained contexts
- Computer Science
- 2018
There is considerable evidence that a three-pronged solution to the issue of anonymity during crime reporting in a resource-constrained environment is promising and can further help law enforcement agencies partner with third parties in deriving useful crime pattern knowledge without infringing on users’ privacy.
Preserving Privacy in Mining Quantitative Associations Rules
- Computer Science · Int. J. Inf. Secur. Priv.
- 2009
A method based on the discrete wavelet transform (DWT) is proposed to protect input data privacy while preserving data mining patterns for association rules, and a comparison with an existing kd-tree based transform shows that the DWT-based method fares better in terms of efficiency, preserving patterns, and privacy.
An access and inference control model for time series databases
- Computer Science · Future Gener. Comput. Syst.
- 2019
References
A General Additive Data Perturbation Method for Database Security
- Computer Science
- 1999
This study describes a new method (General Additive Data Perturbation) that does not change relationships between attributes; when the database has a multivariate normal distribution, the new method provides maximum security and minimum bias.
Accessibility, security, and accuracy in statistical databases: the case for the multiplicative fixed data perturbation approach
- Computer Science
- 1995
A comparison of different security mechanisms reveals that fixed data perturbation is preferred because it maximizes both security and accessibility, and an investigation of the different approaches to fixed data perturbation indicates that the multiplicative method best meets these criteria.
A modified random perturbation method for database security
- Computer Science · TODS
- 1994
The random data perturbation (RDP) method of preserving the privacy of individual records in a statistical database is discussed. In particular, it is shown that if confidential attributes are…
A METHOD FOR LIMITING DISCLOSURE IN MICRODATA BASED ON RANDOM NOISE AND
- Computer Science
- 2002
A new scheme for masking earnings data is developed which is a combination of random noise inoculation and transformation and the theoretical effects of masking on the regression are discussed.
SPLIT SELECTION METHODS FOR CLASSIFICATION TREES
- Computer Science
- 1997
This article presents an algorithm called QUEST that has negligible bias, which shares similarities with the FACT method, but it yields binary splits and the final tree can be selected by a direct stopping rule or by pruning.
A Comparison of Prediction Accuracy, Complexity, and Training Time of Thirty-Three Old and New Classification Algorithms
- Computer Science · Machine Learning
- 2004
Among decision tree algorithms with univariate splits, C4.5, IND-CART, and QUEST have the best combinations of error rate and speed, but C4.5 tends to produce trees with twice as many leaves as those from IND-CART and QUEST.
Mersenne twister: a 623-dimensionally equidistributed uniform pseudo-random number generator
- Computer Science, Mathematics · TOMC
- 1998
A new algorithm called Mersenne Twister (MT) is proposed for generating uniform pseudorandom numbers, which provides a super-astronomical period of 2^19937 − 1 and 623-dimensional equidistribution up to 32-bit accuracy, while using a working area of only 624 words.
Theory and Application of the Linear Model
- Mathematics
- 1976
This book integrates the linear statistical model within the context of analysis of variance, correlation and regression, and design of experiments and is a time tested, authoritative resource for experimenters, statistical consultants, and students.