Corpus ID: 35426171

Certified Defenses for Data Poisoning Attacks

@article{Steinhardt2017CertifiedDF,
  title={Certified Defenses for Data Poisoning Attacks},
  author={Jacob Steinhardt and Pang Wei Koh and Percy Liang},
  journal={ArXiv},
  year={2017},
  volume={abs/1706.03691}
}
Machine learning systems trained on user-provided data are susceptible to data poisoning attacks, whereby malicious users inject false training data with the aim of corrupting the learned model. […] Our approximation relies on two assumptions: (1) that the dataset is large enough for statistical concentration between train and test error to hold, and (2) that outliers within the clean (non-poisoned) data do not have a strong effect on the model. Our bound comes paired with a candidate attack that…
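The certificate framework above corresponds to defenses of the form "discard outliers, then minimize the training loss on the remaining points." As a minimal sketch of one such sanitization scheme, a centroid-based sphere defense, assuming scikit-learn, a quantile-based radius, and toy synthetic data (all illustrative choices, not the paper's exact construction):

import numpy as np
from sklearn.linear_model import LogisticRegression

def sphere_defense(X, y, radius_quantile=0.95):
    """Keep only points close to their class centroid.

    Illustrative sanitization in the spirit of the sphere defense the paper
    analyzes; the quantile-based radius is an assumption, not the paper's
    exact parameterization.
    """
    keep = np.zeros(len(y), dtype=bool)
    for label in np.unique(y):
        idx = np.where(y == label)[0]
        centroid = X[idx].mean(axis=0)
        dists = np.linalg.norm(X[idx] - centroid, axis=1)
        # Keep points within the chosen quantile of centroid distances.
        keep[idx] = dists <= np.quantile(dists, radius_quantile)
    return keep

# Usage: sanitize a (possibly poisoned) training set, then fit a model.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
X[:250] += [2.0, 0.0]
X[250:] -= [2.0, 0.0]
y = np.array([1] * 250 + [-1] * 250)
X_poison = rng.normal(loc=[8.0, 8.0], scale=0.5, size=(25, 2))  # far-off poison
X_all = np.vstack([X, X_poison])
y_all = np.concatenate([y, np.full(25, -1)])

mask = sphere_defense(X_all, y_all)
model = LogisticRegression().fit(X_all[mask], y_all[mask])
print(f"kept {mask.sum()} of {len(y_all)} points")

The paper's contribution is the upper bound, under the two assumptions above, on the worst-case test loss any attacker can induce against such a defense, rather than the sanitization heuristic itself.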

Citations

Stronger Data Poisoning Attacks Break Data Sanitization Defenses
TLDR: Three new attacks are developed that can all bypass a broad range of data sanitization defenses, including commonly used anomaly detectors based on nearest neighbors, training loss, and singular-value decomposition.
Data Poisoning Attacks on Regression Learning and Corresponding Defenses
TLDR: This research presents realistic scenarios in which data poisoning attacks threaten production systems, introduces a novel black-box attack that is then applied to a real-world medical use case, and concludes that the proposed defense strategy effectively mitigates the considered attacks.
An Investigation of Data Poisoning Defenses for Online Learning
TLDR: This work undertakes a rigorous study of defenses against data poisoning for online learning: it studies four standard defenses in a powerful threat model and provides conditions under which they can resist or allow rapid poisoning.
Influence Based Defense Against Data Poisoning Attacks in Online Learning
TLDR: This work proposes a defense mechanism that minimizes the degradation caused by poisoned training data on a learner's model in an online setup, utilizing the influence function, a classic technique from robust statistics (see the sketch after this list).
Robustly-reliable learners under poisoning attacks
TLDR: This work shows how to achieve strong robustness guarantees in the face of data poisoning attacks across multiple axes, providing robustly-reliable predictions in which the predicted label is guaranteed to be correct so long as the adversary has not exceeded a given corruption budget.
Data Poisoning Attacks against Online Learning
TLDR: A systematic investigation of data poisoning attacks on online learning is initiated, and a general attack strategy, formulated as an optimization problem, is proposed that applies to both settings with some modifications.
Witches' Brew: Industrial Scale Data Poisoning via Gradient Matching
TLDR: This work focuses on targeted poisoning attacks that cause reclassification of an unmodified test image, thereby breaching model integrity, and finds that this is the first poisoning method to cause targeted misclassification in modern deep networks trained from scratch on a full-sized, poisoned ImageNet dataset.
A Framework of Randomized Selection Based Certified Defenses Against Data Poisoning Attacks
TLDR: It is proved that random selection schemes satisfying certain conditions are robust against data poisoning attacks, and that the analytical form of the certified radius of bagging derived by the framework is tighter than that of previous work (see the bagging sketch after this list).
Detection of Adversarial Training Examples in Poisoning Attacks through Anomaly Detection
TLDR: This paper proposes a defense mechanism based on outlier detection to mitigate the effect of these optimal poisoning attacks, and shows empirically that the adversarial examples generated by these attack strategies are quite different from genuine points, as no detectability constraints are considered in crafting the attack.
Property Inference From Poisoning
TLDR: The findings suggest that poisoning attacks can significantly boost information leakage and should be considered a stronger threat model in sensitive applications where some of the data sources may be malicious.
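
The influence-based defense cited above scores training points with influence functions, which estimate how up-weighting (or removing) a single example would change a quantity of interest such as validation loss; high-scoring points can then be filtered. A minimal, self-contained sketch for a batch logistic-regression learner follows; the batch (rather than online) formulation and the hyperparameters are illustrative assumptions, not the cited paper's exact method.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logreg(X, y, lam=1e-2, iters=500, lr=0.5):
    """Plain gradient-descent logistic regression; labels y in {0, 1}."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = sigmoid(X @ theta)
        theta -= lr * (X.T @ (p - y) / len(y) + lam * theta)
    return theta

def influence_scores(X, y, X_val, y_val, theta, lam=1e-2):
    """Estimate each training point's effect on validation loss.

    score_i = -g_val^T H^{-1} g_i, the classic influence-function formula;
    a large positive score means up-weighting point i is predicted to raise
    validation loss, flagging it as potentially poisoned. The explicit
    linear solve is fine at toy scale; larger models would use
    Hessian-vector products instead.
    """
    p = sigmoid(X @ theta)
    per_point_grads = (p - y)[:, None] * X
    H = X.T @ (X * (p * (1 - p))[:, None]) / len(y) + lam * np.eye(X.shape[1])
    g_val = X_val.T @ (sigmoid(X_val @ theta) - y_val) / len(y_val)
    return -per_point_grads @ np.linalg.solve(H, g_val)

In an online setup, the same scores would be recomputed (or incrementally updated) as examples arrive, dropping points whose scores exceed a threshold.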
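The randomized-selection framework cited above covers bagging-style certified defenses. The core mechanism is simple: train many base models on small random subsamples and predict by majority vote, so that r poisoned points can only sway the base models whose subsamples happen to contain them. A minimal sketch, assuming scikit-learn decision trees as base learners and illustrative ensemble sizes:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_bagging(X, y, n_models=50, subsample=100, seed=0):
    """Train an ensemble of base learners on random subsamples.

    With small subsamples, most base models never see any given poisoned
    point, which is the intuition behind bagging-based certified radii.
    """
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        idx = rng.choice(len(y), size=subsample, replace=False)
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def majority_vote(models, X):
    votes = np.stack([m.predict(X) for m in models])  # (n_models, n_points)
    labels = np.unique(votes)
    # Count votes per candidate label and return the winner at each point.
    counts = np.stack([(votes == c).sum(axis=0) for c in labels])
    return labels[np.argmax(counts, axis=0)]

The certified analyses in these papers make precise how large the gap between the top two vote counts must be, relative to how many subsamples a single training point can touch, for the prediction to be provably stable under poisoning.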

References

Showing 1-10 of 67 references
ANTIDOTE: understanding and defending against poisoning of anomaly detectors
TLDR: This work proposes an antidote based on techniques from robust statistics and presents a new robust PCA-based detector that substantially reduces the effectiveness of poisoning in a variety of scenarios and maintains a significantly better balance between false positives and false negatives than the original method when under attack.
Is Feature Selection Secure against Training Data Poisoning?
TLDR: The results on malware detection show that feature selection methods can be significantly compromised under attack, highlighting the need for specific countermeasures.
Support vector machines under adversarial label contamination
Poisoning Attacks against Support Vector Machines
TLDR: It is demonstrated that an intelligent adversary can, to some extent, predict the change of the SVM's decision function due to malicious input and use this ability to construct malicious data.
Systematic Poisoning Attacks on and Defenses for Machine Learning in Healthcare
TLDR: This paper presents a systematic, algorithm-independent approach for mounting poisoning attacks across a wide range of machine-learning algorithms and healthcare datasets, and establishes the effectiveness of the proposed attacks using a suite of six machine-learning algorithms and five healthcare datasets.
Learning from untrusted data
TLDR: An algorithm for robust learning in a very general stochastic optimization setting is provided that has immediate implications for robustly estimating the mean of distributions with bounded second moments, robustly learning mixtures of such distributions, and robustly finding planted partitions in random graphs.
Is data clustering in adversarial settings secure?
TLDR: It is shown that an attacker may significantly poison the whole clustering process by adding a relatively small percentage of attack samples to the input data, and that some attack samples may be obfuscated to hide within existing clusters.
Practical Evasion of a Learning-Based Classifier: A Case Study
TLDR: A taxonomy of practical evasion strategies is developed, known evasion algorithms are adapted to implement specific scenarios in the taxonomy, and a substantial drop in PDFrate's classification scores and detection accuracy is revealed after exposure to even simple attacks.
Stealing Machine Learning Models via Prediction APIs
TLDR: Simple, efficient attacks are shown that extract target ML models with near-perfect fidelity for popular model classes including logistic regression, neural networks, and decision trees, against the online services of BigML and Amazon Machine Learning.
Paragraph: Thwarting Signature Learning by Training Maliciously
TLDR: It is shown that even a delusive adversary, whose samples are all correctly labeled, can obstruct learning, and practical attacks against learning are described in which an adversary constructs labeled samples that prevent or severely delay generation of an accurate classifier.