RAB: Provable Robustness Against Backdoor Attacks
@article{Weber2020RABPR,
  title   = {RAB: Provable Robustness Against Backdoor Attacks},
  author  = {Maurice Weber and Xiaojun Xu and Bojan Karlas and Ce Zhang and Bo Li},
  journal = {ArXiv},
  year    = {2020},
  volume  = {abs/2003.08904}
}
Recent studies have shown that deep neural networks are highly vulnerable to adversarial attacks, including evasion and backdoor attacks. On the defense side, there has been intensive interest in provable robustness against evasion attacks, while robustness guarantees against backdoor attacks are still largely lacking. In this paper, we focus on certifying model robustness against general threat models. We first provide a unified framework via randomized smoothing and show it can be instantiated to certify…
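The abstract only names the mechanism, so as a rough illustration: a randomized-smoothing defense against backdoor (training-time) attacks can be read as training many classifiers on independently noised copies of the possibly poisoned training set and aggregating their votes. The snippet below is a minimal sketch under that reading; `train_model` is a hypothetical trainer and the paper's actual certification bound is not reproduced here.

```python
# Minimal sketch (not the authors' code) of smoothing over the training set:
# train many classifiers on independently noised copies of the (possibly
# poisoned) data and aggregate their votes on a test input.
import numpy as np

def smooth_train_and_vote(X_train, y_train, x_test, train_model,
                          n_models=100, sigma=0.5, n_classes=10, rng=None):
    """Train n_models classifiers on Gaussian-noised copies of the training
    set and return the vote histogram for a single test input."""
    rng = rng or np.random.default_rng(0)
    votes = np.zeros(n_classes, dtype=int)
    for _ in range(n_models):
        X_noisy = X_train + rng.normal(0.0, sigma, size=X_train.shape)
        model = train_model(X_noisy, y_train)   # hypothetical trainer
        votes[model.predict(x_test[None])[0]] += 1
    return votes

# The smoothed prediction is argmax(votes); a certification bound (such as the
# one developed in this paper) turns the gap between the top two vote counts
# into a bound on the backdoor perturbation that cannot flip the prediction.
```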
84 Citations
Flareon: Stealthy any2any Backdoor Injection via Poisoned Augmentation
- Computer Science · ArXiv
- 2022
Flareon is proposed: a small, stealthy, seemingly harmless code modification that targets the data augmentation pipeline with motion-based triggers and requires no prior knowledge of the victim model architecture or training data.
Identifying a Training-Set Attack's Target Using Renormalized Influence Estimation
- Computer Science · CCS
- 2022
This work proposes the task of target identification, which determines whether a specific test instance is the target of a training-set attack, and builds on influence estimation, which quantifies each training instance's contribution to a model's prediction.
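A heavily simplified view of the gradient-based influence scores this work builds on (the paper's exact renormalization may differ): the influence of a training point on a test prediction is approximated by the inner product of their loss gradients, renormalized by the training gradient's magnitude so that points with uniformly large gradients do not dominate. `grad_loss` below is a hypothetical helper.

```python
# Simplified gradient-similarity influence sketch; not the paper's estimator.
import numpy as np

def renormalized_influence(train_points, test_point, grad_loss, eps=1e-12):
    """Score each training point by its (renormalized) gradient alignment
    with the test point's loss gradient. `grad_loss` returns a flat vector."""
    g_test = grad_loss(test_point)
    scores = []
    for z in train_points:
        g = grad_loss(z)
        # divide by the training gradient's own norm to damp always-large gradients
        scores.append(float(g_test @ g) / (np.linalg.norm(g) + eps))
    return np.array(scores)  # large scores -> candidate attack instances
```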
Deep Partition Aggregation: Provable Defense against General Poisoning Attacks
- Computer Science · ICLR
- 2021
Deep Partition Aggregation (DPA) is proposed as a certified defense against a general poisoning threat model; its variant SS-DPA uses a semi-supervised learning algorithm as its base classifier and outperforms the existing certified defense for label-flipping attacks, establishing new state-of-the-art provable defenses against poisoning attacks.
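Because the entry describes a concrete partition-and-vote construction, a minimal sketch may help; `train_base`, the hashing choice, and the simplified certificate arithmetic (ignoring tie-breaking) are assumptions here, not the authors' implementation.

```python
# Partition-and-vote sketch: each training point is hashed into one of k
# partitions, one base classifier is trained per partition, and predictions
# are made by majority vote. A poisoned sample can affect at most one
# partition, so a vote gap of g certifies roughly floor(g / 2) poisons.
import hashlib
import numpy as np

def dpa_train(X, y, k, train_base):
    parts = [([], []) for _ in range(k)]
    for x, label in zip(X, y):
        h = int(hashlib.sha256(x.tobytes()).hexdigest(), 16) % k
        parts[h][0].append(x)
        parts[h][1].append(label)
    return [train_base(np.array(px), np.array(py)) for px, py in parts if px]

def dpa_predict(models, x, n_classes):
    votes = np.bincount([m.predict(x[None])[0] for m in models],
                        minlength=n_classes)
    top, runner_up = np.sort(votes)[-2:][::-1]
    certified_size = (top - runner_up) // 2  # poisoned samples tolerated (approx.)
    return int(np.argmax(votes)), certified_size
```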
Backdoor Attacks and Countermeasures on Deep Learning: A Comprehensive Review
- Computer Science · ArXiv
- 2020
This work provides the community with a timely, comprehensive review of backdoor attacks and countermeasures on deep learning, presents key areas for future research on backdoors, such as empirical security evaluations of physical trigger attacks, and solicits more efficient and practical countermeasures.
Wild Patterns Reloaded: A Survey of Machine Learning Security against Training Data Poisoning
- Computer Science · ACM Computing Surveys
- 2023
The success of machine learning is fueled by the increasing availability of computing power and large training datasets. The training data is used to learn new models or update existing ones,…
Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses
- Computer Science · IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2023
The goal of this work is to systematically categorize and discuss a wide range of dataset vulnerabilities and exploits, approaches for defending against these threats, and an array of open problems in this space.
Chaos Theory and Adversarial Robustness
- Computer Science · ArXiv
- 2022
Neural Networks, being susceptible to adversarial attacks, should face a strict level of scrutiny before being deployed in critical or adversarial applications. This paper uses ideas from Chaos…
Uncovering the Connection Between Differential Privacy and Certified Robustness of Federated Learning against Poisoning Attacks
- Computer Science · ArXiv
- 2022
This paper investigates both user-level and instance-level privacy of federated learning (FL), proposes novel mechanisms to achieve improved instance-level privacy, and proves the certified robustness of DPFL under a bounded number of adversarial users or instances.
Turning a Curse Into a Blessing: Enabling Clean-Data-Free Defenses by Model Inversion
- Computer Science · ArXiv
- 2022
This work introduces an algorithmic framework that can mitigate potential security vulnerabilities in a pre-trained model when clean data from its training distribution is unavailable to the defender.
COPA: Certifying Robust Policies for Offline Reinforcement Learning against Poisoning Attacks
- Computer Science · ICLR
- 2022
This work proposes COPA, the first certification framework for offline reinforcement learning that certifies the number of poisoning trajectories that can be tolerated, under two proposed certification criteria: per-state action stability and cumulative reward bound.
References
SHOWING 1-10 OF 80 REFERENCES
Certified Adversarial Robustness via Randomized Smoothing
- Computer Science · ICML
- 2019
Strong empirical results suggest that randomized smoothing is a promising direction for future research into adversarially robust classification; on smaller-scale datasets where competing approaches to certified $\ell_2$ robustness are viable, smoothing delivers higher certified accuracies.
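For context, the $\ell_2$ certificate in this line of work takes the form $R = \frac{\sigma}{2}\left(\Phi^{-1}(p_A) - \Phi^{-1}(p_B)\right)$, where $p_A$ and $p_B$ bound the top-two class probabilities of the base classifier under Gaussian input noise. A minimal numeric sketch, omitting the Monte Carlo confidence-interval machinery the paper uses:

```python
# Certified l2 radius from lower/upper bounds on the top-two class
# probabilities under Gaussian noise of scale sigma.
from scipy.stats import norm

def certified_l2_radius(p_a_lower: float, p_b_upper: float, sigma: float) -> float:
    """Return the certified l2 radius (0.0 if no certificate holds)."""
    if p_a_lower <= p_b_upper:
        return 0.0
    return (sigma / 2.0) * (norm.ppf(p_a_lower) - norm.ppf(p_b_upper))

# Example: sigma = 0.5, pA >= 0.9, pB <= 0.1 gives
# 0.25 * (1.2816 - (-1.2816)) ≈ 0.64 in l2 norm.
```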
Randomized Smoothing of All Shapes and Sizes
- Computer Science, Mathematics · ICML
- 2020
It is shown that, with only label statistics under random input perturbations, randomized smoothing cannot achieve nontrivial certified accuracy against perturbations of $\ell_p$-norm $\Omega(\min(1, d^{\frac{1}{p} - \frac{1}{2}}))$ when the input dimension $d$ is large.
Spectral Signatures in Backdoor Attacks
- Computer Science, Mathematics · NeurIPS
- 2018
Spectral signatures are identified as a new property of all known backdoor attacks, which allows tools from robust statistics to thwart the attacks; the efficacy of these signatures is demonstrated in detecting and removing poisoned examples on real image sets and state-of-the-art neural network architectures.
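A minimal sketch of the spectral-signature computation as usually described: per class, center the learned representations, take the top singular direction, and flag the examples with the largest squared projections. The removal fraction below is illustrative.

```python
# Spectral-signature outlier scoring for one class's feature representations.
import numpy as np

def spectral_signature_scores(feats: np.ndarray) -> np.ndarray:
    """feats: (n_examples, dim) representations for a single class.
    Returns an outlier score per example; high scores suggest poisoning."""
    centered = feats - feats.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    top_dir = vt[0]                      # top right-singular vector
    return (centered @ top_dir) ** 2     # squared projection onto that direction

def suspected_indices(feats, remove_frac=0.05):
    scores = spectral_signature_scores(feats)
    n_remove = int(len(feats) * remove_frac)
    return np.argsort(scores)[-n_remove:]   # indices of suspected poisons
```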
Certified Robustness to Adversarial Examples with Differential Privacy
- Computer Science · 2019 IEEE Symposium on Security and Privacy (SP)
- 2019
This paper presents the first certified defense that both scales to large networks and datasets and applies broadly to arbitrary model types, based on a novel connection between robustness against adversarial examples and differential privacy, a cryptographically-inspired privacy formalism.
Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning
- Computer Science · ArXiv
- 2017
This work considers a new type of attack, called backdoor attacks, where the attacker's goal is to create a backdoor in a learning-based authentication system so that the attacker can easily circumvent the system by leveraging the backdoor.
Deep Learning with Differential Privacy
- Computer Science · CCS
- 2016
This work develops new algorithmic techniques for learning and a refined analysis of privacy costs within the framework of differential privacy, and demonstrates that deep neural networks can be trained with non-convex objectives, under a modest privacy budget, and at a manageable cost in software complexity, training efficiency, and model quality.
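The recipe summarized above (per-example gradient clipping plus Gaussian noise before the parameter update) can be sketched as follows; `per_example_grads` is a hypothetical helper, and the privacy accounting (the paper's moments accountant) is omitted.

```python
# One DP-SGD-style update step: clip each per-example gradient to l2 norm C,
# sum, add Gaussian noise of scale noise_multiplier * C, then step.
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.1, clip_norm=1.0,
                noise_multiplier=1.1, rng=None):
    rng = rng or np.random.default_rng(0)
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    noisy_sum = np.sum(clipped, axis=0) + rng.normal(
        0.0, noise_multiplier * clip_norm, size=params.shape)
    return params - lr * noisy_sum / len(per_example_grads)
```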
Certified Robustness to Label-Flipping Attacks via Randomized Smoothing
- Computer Science, Mathematics · ICML
- 2020
This work presents a unifying view of randomized smoothing over arbitrary functions, and uses this novel characterization to propose a new strategy for building classifiers that are pointwise-certifiably robust to general data poisoning attacks.
Detecting Backdoor Attacks on Deep Neural Networks by Activation Clustering
- Computer Science · SafeAI@AAAI
- 2019
This work proposes a novel approach to backdoor detection and removal for neural networks; it is the first methodology capable of detecting poisonous data crafted to insert backdoors and repairing the model without requiring a verified and trusted dataset.
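A minimal sketch of the activation-clustering idea, assuming scikit-learn is available: cluster the last hidden-layer activations of each class into two clusters and flag the anomalous (here, simply the smaller) cluster; the paper's actual anomaly tests are more involved.

```python
# Activation clustering sketch for one class: reduce dimensionality, cluster
# with k=2, and flag the smaller cluster as suspected backdoor data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

def suspected_poison_indices(activations: np.ndarray, n_components=10):
    """activations: (n_examples, dim) last-hidden-layer activations of one class."""
    n_comp = min(n_components, activations.shape[1])
    reduced = PCA(n_components=n_comp).fit_transform(activations)
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(reduced)
    smaller = 0 if (labels == 0).sum() < (labels == 1).sum() else 1
    return np.where(labels == smaller)[0]
```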
Backdoor Attacks on Black-Box Ciphers Exploiting Low-Entropy Plaintexts
- Computer Science, Mathematics · ACISP
- 2003
A new design is proposed that eliminates the need for known plaintext entirely and employs "data compression" as a basic tool for generating a hidden information channel, highlighting the need to only encrypt compressed strings when a block cipher with a secret design must be used.
BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain
- Computer Science · ArXiv
- 2017
It is shown that outsourced training introduces new security risks: an adversary can create a maliciously trained network (a backdoored neural network, or a BadNet) that has state-of-the-art performance on the user's training and validation samples, but behaves badly on specific attacker-chosen inputs.
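A minimal sketch of a BadNets-style poisoning step, with illustrative (not the paper's) patch size, location, and poisoning rate:

```python
# Stamp a small trigger patch onto a fraction of training images and relabel
# them with the attacker's target class.
import numpy as np

def poison_dataset(X, y, target_label, poison_frac=0.05, patch_value=1.0,
                   patch_size=3, rng=None):
    """X: (n, H, W[, C]) images in [0, 1]; returns poisoned copies of X and y."""
    rng = rng or np.random.default_rng(0)
    Xp, yp = X.copy(), y.copy()
    idx = rng.choice(len(X), size=int(poison_frac * len(X)), replace=False)
    Xp[idx, -patch_size:, -patch_size:, ...] = patch_value  # bottom-right trigger
    yp[idx] = target_label
    return Xp, yp
```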