• Corpus ID: 239998081

Adversarial Neuron Pruning Purifies Backdoored Deep Models

@article{Wu2021AdversarialNP,
  title={Adversarial Neuron Pruning Purifies Backdoored Deep Models},
  author={Dongxian Wu and Yisen Wang},
  journal={ArXiv},
  year={2021},
  volume={abs/2110.14430}
}
  • Dongxian Wu, Yisen Wang
  • Published 27 October 2021
  • Computer Science
  • ArXiv
As deep neural networks (DNNs) grow larger, their computational requirements become enormous, which makes outsourcing training increasingly popular. Training on a third-party platform, however, may introduce the risk that a malicious trainer returns a backdoored DNN, which behaves normally on clean samples but outputs targeted misclassifications whenever a trigger appears at test time. Without any knowledge of the trigger, it is difficult to distinguish or recover benign… 
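To make the threat model concrete, below is a minimal sketch (PyTorch) of the test-time behavior described above: a small patch trigger is stamped onto otherwise clean inputs, and a backdoored model maps the stamped inputs to the attacker's target class. The model, batch, trigger pattern, and patch location are hypothetical placeholders, not artifacts from the paper.

```python
import torch

def stamp_trigger(x: torch.Tensor, trigger: torch.Tensor,
                  top: int = 0, left: int = 0) -> torch.Tensor:
    """Paste a small trigger patch onto a batch of images of shape (N, C, H, W)."""
    h, w = trigger.shape[-2], trigger.shape[-1]
    x = x.clone()
    x[:, :, top:top + h, left:left + w] = trigger     # broadcast (C, h, w) over the batch
    return x

# Hypothetical usage -- `backdoored_model`, `clean_batch`, and the 3x3 white
# patch are placeholders, not artifacts from the paper:
# trigger = torch.ones(3, 3, 3)                            # small white square
# clean_pred   = backdoored_model(clean_batch).argmax(1)   # normal predictions
# trigger_pred = backdoored_model(stamp_trigger(clean_batch, trigger)).argmax(1)
# On a backdoored model, trigger_pred collapses to the attacker's target class.
```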
Better Safe Than Sorry: Preventing Delusive Adversaries with Adversarial Training
TLDR
It is shown that minimizing adversarial risk on the perturbed data is equivalent to optimizing an upper bound of natural risk on the original data, which implies that adversarial training can serve as a principled defense against delusive attacks.
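For intuition, here is a minimal sketch of why such a bound holds, assuming the delusive perturbation eta(x) and the adversarial-training budget share the same l-infinity radius epsilon (the notation is illustrative, not necessarily the paper's):

```latex
% Sketch: natural risk on the clean distribution D is upper-bounded by
% adversarial risk on the delusively perturbed distribution \widehat{D},
% provided both use the same \ell_\infty radius \epsilon.
\begin{align*}
\mathcal{R}_{\mathrm{nat}}(f;\mathcal{D})
  &= \mathbb{E}_{(x,y)\sim\mathcal{D}}\big[\ell(f(x),y)\big] \\
  &\le \mathbb{E}_{(x,y)\sim\mathcal{D}}\Big[\max_{\|\delta\|_\infty\le\epsilon}
        \ell\big(f(x+\eta(x)+\delta),y\big)\Big]
      \qquad \text{(the max admits } \delta=-\eta(x)\text{)} \\
  &= \mathcal{R}_{\mathrm{adv}}(f;\widehat{\mathcal{D}}),
  \qquad \widehat{\mathcal{D}} := \{(x+\eta(x),\,y)\}.
\end{align*}
```

Minimizing the right-hand side, i.e. adversarial training on the perturbed data, therefore also minimizes an upper bound of the natural risk on the original data, which is the claim summarized above.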

References

SHOWING 1-10 OF 49 REFERENCES
Label-Consistent Backdoor Attacks
TLDR
This work leverages adversarial perturbations and generative models to execute efficient, yet label-consistent, backdoor attacks, based on injecting inputs that appear plausible, yet are hard to classify, hence causing the model to rely on the (easier-to-learn) backdoor trigger.
Fine-Pruning: Defending Against Backdooring Attacks on Deep Neural Networks
TLDR
This work evaluates fine-pruning, a combination of pruning and fine-tuning, and shows that it successfully weakens or even eliminates backdoors, in some cases reducing the attack success rate to 0% with only a 0.4% drop in accuracy for clean (non-triggering) inputs.
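A minimal sketch of that prune-then-fine-tune recipe, assuming a PyTorch model and a small clean data loader; the choice of layer, the pruning fraction, and the mask-style zeroing are simplifications, not the paper's exact procedure.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def rank_channels_by_activation(model: nn.Module, layer: nn.Conv2d, clean_loader) -> torch.Tensor:
    """Sort the output channels of `layer` from least to most active on clean data."""
    totals, batches = None, 0

    def hook(_module, _inputs, output):
        nonlocal totals, batches
        act = output.abs().mean(dim=(0, 2, 3))        # mean |activation| per channel
        totals = act if totals is None else totals + act
        batches += 1

    handle = layer.register_forward_hook(hook)
    for x, _ in clean_loader:
        model(x)
    handle.remove()
    return torch.argsort(totals / batches)            # dormant channels first

def prune_channels(layer: nn.Conv2d, channels: torch.Tensor) -> None:
    """Zero out the selected output channels (a simple mask-style prune)."""
    layer.weight.data[channels] = 0.0
    if layer.bias is not None:
        layer.bias.data[channels] = 0.0

# Hypothetical usage (model, loader, and layer name are placeholders):
# order = rank_channels_by_activation(model, model.layer4[-1].conv2, clean_loader)
# prune_channels(model.layer4[-1].conv2, order[: int(0.2 * len(order))])
# ...then fine-tune `model` on the small clean set for a few epochs.
```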
Neural Attention Distillation: Erasing Backdoor Triggers from Deep Neural Networks
TLDR
This paper proposes a novel defense framework Neural Attention Distillation (NAD), which utilizes a teacher network to guide the finetuning of the backdoored student network on a small clean subset of data such that the intermediate-layer attention of the student network aligns with that of the teacher network.
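A rough sketch of the attention-alignment term, assuming intermediate feature maps have already been collected from matching layers of the teacher and student networks; the power and normalization choices follow the common attention-transfer formulation and may differ in detail from the paper's.

```python
import torch
import torch.nn.functional as F

def attention_map(feat: torch.Tensor, p: int = 2) -> torch.Tensor:
    """Collapse a (N, C, H, W) feature map into a unit-norm spatial attention map."""
    a = feat.abs().pow(p).mean(dim=1)                 # (N, H, W): aggregate over channels
    return F.normalize(a.flatten(1), dim=1)           # normalize per sample

def attention_alignment_loss(student_feats, teacher_feats) -> torch.Tensor:
    """Distance between student and teacher attention maps, summed over layers."""
    return sum(
        (attention_map(s) - attention_map(t)).pow(2).sum(dim=1).mean()
        for s, t in zip(student_feats, teacher_feats)
    )

# The full objective on the small clean subset would combine this term with the
# usual cross-entropy, e.g. loss = ce_loss + beta * attention_alignment_loss(...),
# where beta and the layers to match are hyper-parameters (assumptions here).
```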
Backdoor Embedding in Convolutional Neural Network Models via Invisible Perturbation
TLDR
This paper proposes two approaches for generating a backdoor that is hardly perceptible yet effective in poisoning the model, and demonstrates that such attacks achieve a high attack success rate at a small cost in model accuracy and with a small injection rate.
NIC: Detecting Adversarial Samples with Neural Network Invariant Checking
TLDR
This paper analyzes the internals of DNN models under various attacks and identifies two common exploitation channels: the provenance channel and the activation value distribution channel, and proposes a novel technique to extract DNN invariants and use them to perform runtime adversarial sample detection.
BadNets: Evaluating Backdooring Attacks on Deep Neural Networks
TLDR
It is shown that outsourced training introduces new security risks: an adversary can create a maliciously trained network (a backdoored neural network, or BadNet) that has state-of-the-art performance on the user's training and validation samples but behaves badly on specific attacker-chosen inputs.
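For completeness, a toy sketch of the training-set poisoning this TLDR describes: a fixed patch is stamped onto a small fraction of the training images and their labels are rewritten to the attacker's target class. The patch pattern, location, and poisoning rate are illustrative assumptions.

```python
import torch

def poison_training_set(images: torch.Tensor, labels: torch.Tensor,
                        trigger: torch.Tensor, target_class: int,
                        rate: float = 0.1, seed: int = 0):
    """Return a copy of (images, labels) with a `rate` fraction backdoored."""
    g = torch.Generator().manual_seed(seed)
    idx = torch.randperm(images.shape[0], generator=g)[: int(rate * images.shape[0])]
    h, w = trigger.shape[-2], trigger.shape[-1]
    images, labels = images.clone(), labels.clone()
    images[idx, :, -h:, -w:] = trigger                # stamp the patch in the bottom-right corner
    labels[idx] = target_class                        # attacker-chosen label
    return images, labels

# Training on the poisoned set yields a model that stays accurate on clean inputs
# but predicts `target_class` whenever the patch appears, as summarized above.
```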
Anti-Backdoor Learning: Training Clean Models on Poisoned Data
TLDR
This paper introduces the concept of anti-backdoor learning, aiming to train clean models given backdoor-poisoned data, and proposes a general learning scheme, Anti-Backdoor Learning (ABL), to automatically prevent backdoor attacks during training.
Input-Aware Dynamic Backdoor Attack
TLDR
This work proposes a novel backdoor attack technique in which the trigger varies from input to input, implemented with an input-aware trigger generator driven by a diversity loss, making backdoor verification impossible.
Defending Neural Backdoors via Generative Distribution Modeling
TLDR
This work proposes max-entropy staircase approximator (MESA) for high-dimensional sampling-free generative modeling and uses it to recover the trigger distribution and develops a defense technique to remove the triggers from the backdoored model.
Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning
TLDR
This work considers a new type of attack, called a backdoor attack, where the attacker's goal is to create a backdoor in a learning-based authentication system so that the system can easily be circumvented by leveraging the backdoor.