Corpus ID: 207847600

Certified Data Removal from Machine Learning Models

@inproceedings{Guo2020CertifiedDR,
  title={Certified Data Removal from Machine Learning Models},
  author={Chuan Guo and Tom Goldstein and Awni Hannun and Laurens van der Maaten},
  booktitle={ICML},
  year={2020}
}
Good data stewardship requires removal of data at the request of the data's owner. This raises the question of if and how a trained machine-learning model, which implicitly stores information about its training data, should be affected by such a removal request. Is it possible to "remove" data from a machine-learning model? We study this problem by defining certified removal: a very strong theoretical guarantee that a model from which data is removed cannot be distinguished from a model that never…
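The paper's central definition can be stated informally as follows (paraphrased from memory; notation ours): a removal mechanism $M$, applied to the output of a learning algorithm $A$ on dataset $\mathcal{D}$, achieves $\varepsilon$-certified removal of a point $x$ if its output distribution is within a multiplicative $e^{\varepsilon}$ factor of retraining from scratch on $\mathcal{D} \setminus x$:

```latex
% epsilon-certified removal (paraphrased): for every measurable set of models T,
\[
  e^{-\varepsilon}
  \;\le\;
  \frac{\Pr\bigl[\,M\!\bigl(A(\mathcal{D}),\,\mathcal{D},\,x\bigr) \in \mathcal{T}\,\bigr]}
       {\Pr\bigl[\,A(\mathcal{D} \setminus x) \in \mathcal{T}\,\bigr]}
  \;\le\;
  e^{\varepsilon}.
\]
```

The form mirrors differential privacy, but the guarantee concerns indistinguishability from retraining without the removed point rather than from training on a neighboring dataset.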

Citations of this paper

Machine Unlearning for Random Forests
Data removal-enabled (DaRE) forests are introduced, a variant of random forests that enables the removal of training data with minimal retraining; they are found to delete data orders of magnitude faster than retraining from scratch while sacrificing little to no predictive power.
Machine Unlearning of Features and Labels
Removing information from a machine learning model is a non-trivial task that requires partially reverting the training process. This task is unavoidable when sensitive data, such as credit card…
DART: Data Addition and Removal Trees
This paper introduces DART, a variant of random forests that supports adding and removing training data with minimal retraining, and finds that DART is orders of magnitude faster than retraining from scratch while sacrificing very little predictive performance.
Forgetting Outside the Box: Scrubbing Deep Networks of Information Accessible from Input-Output Observations
Describes a procedure for removing the dependency of a trained deep network on a cohort of training data; the procedure improves upon and generalizes previous methods to different readout functions and can be…
Certifiable Machine Unlearning for Linear Models
Presents an experimental study of three state-of-the-art approximate unlearning methods for linear models and the trade-offs between efficiency, effectiveness, and certifiability offered by each method.
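Several of the linear-model unlearning papers above build on the observation that deleting one point from a quadratic training objective admits a cheap closed-form update. As a minimal illustration only (a standard Sherman-Morrison downdate for ridge regression, not the algorithm of any paper listed here; all names are ours), in NumPy:

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Closed-form ridge regression: theta = (X^T X + lam*I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def remove_point(X, y, lam, i):
    """Exactly update the ridge solution after deleting row i, without refitting.

    Uses the Sherman-Morrison identity to downdate the inverse of
    A = X^T X + lam*I after removing the rank-one term x_i x_i^T:
    an O(d^2) update once A^{-1} is available, versus O(n d^2) to refit.
    (A^{-1} is recomputed here for self-containment; in practice it
    would be cached from training.)
    """
    d = X.shape[1]
    A_inv = np.linalg.inv(X.T @ X + lam * np.eye(d))
    x, t = X[i], y[i]
    # (A - x x^T)^{-1} = A^{-1} + (A^{-1} x)(A^{-1} x)^T / (1 - x^T A^{-1} x)
    Ax = A_inv @ x
    A_inv_new = A_inv + np.outer(Ax, Ax) / (1.0 - x @ Ax)
    # New right-hand side after dropping row i: X^T y - t * x
    b_new = X.T @ y - t * x
    return A_inv_new @ b_new
```

For ridge regression this update is exact, so the "unlearned" model is literally the retrained one; for general convex losses a one-step Newton update of the same flavor is only approximate, which is where certified-removal methods add calibrated noise to mask the residual error.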
Certifiable Machine Unlearning for Linear Models [Experiment, Analysis & Benchmark Papers]
Machine unlearning is the task of updating machine learning (ML) models after a subset of the training data they were trained on is deleted. Methods for the task are desired to combine…
Approximate Data Deletion from Machine Learning Models: Algorithms and Evaluations
Evaluates several approaches for approximate data deletion from trained models and proposes a new method with linear dependence on the feature dimension $d$, a significant gain over existing methods, which all have superlinear dependence on the dimension.
Mixed-Privacy Forgetting in Deep Networks
Introduces a novel notion of forgetting in a mixed-privacy setting, where a “core” subset of the training samples does not need to be forgotten, and shows that the method allows forgetting without having to trade off model accuracy.
Adaptive Machine Unlearning
Shows in theory how prior work for non-convex models fails against adaptive deletion sequences, and uses this intuition to design a practical attack against the SISA algorithm of Bourtoule et al.
Amnesiac Machine Learning
Presents two efficient methods that allow a model owner or data holder to delete personal data from models in such a way that the models are not vulnerable to model inversion and membership inference attacks while maintaining model efficacy.
