Corpus ID: 238531600

Adversarial Unlearning of Backdoors via Implicit Hypergradient

@article{Zeng2021AdversarialUO,
  title={Adversarial Unlearning of Backdoors via Implicit Hypergradient},
  author={Yi Zeng and Si Chen and Won Park and Zhuoqing Morley Mao and Ming Jin and Ruoxi Jia},
  journal={ArXiv},
  year={2021},
  volume={abs/2110.03735}
}
We propose a minimax formulation for removing backdoors from a given poisoned model based on a small set of clean data. This formulation encompasses much of prior work on backdoor removal. We propose the Implicit Backdoor Adversarial Unlearning (I-BAU) algorithm to solve the minimax. Unlike previous work, which breaks down the minimax into separate inner and outer problems, our algorithm utilizes the implicit hypergradient to account for the interdependence between inner and outer optimization… 
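For readers skimming the abstract, the minimax objective it refers to can be sketched as follows; the notation below is assumed for illustration rather than quoted from the paper. The defender tunes the model parameters θ to minimize the worst-case loss that any bounded universal trigger perturbation δ can induce on the small clean set:

% Sketch of the unlearning objective (notation assumed, not verbatim from the paper):
% theta: parameters of the poisoned model, delta: universal trigger perturbation,
% {(x_i, y_i)}: small clean dataset, L: classification loss, epsilon: trigger norm bound.
\min_{\theta} \; \max_{\|\delta\| \le \epsilon} \; \frac{1}{n} \sum_{i=1}^{n} L\big(f_{\theta}(x_i + \delta),\, y_i\big)

As the abstract states, I-BAU uses the implicit hypergradient of this objective to account for the dependence between the inner maximization and the outer minimization, rather than alternating the two problems independently.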

One-shot Neural Backdoor Erasing via Adversarial Weight Masking

Adversarial Weight Masking (AWM) is proposed, a novel method capable of erasing neural backdoors even in the one-shot setting; it largely improves purification performance over other state-of-the-art methods across a range of available training dataset sizes.

Backdoor Learning: A Survey

This article summarizes and categorizes existing backdoor attacks and defenses based on their characteristics, provides a unified framework for analyzing poisoning-based backdoor attacks, and summarizes widely adopted benchmark datasets.

Backdoor Cleansing with Unlabeled Data

A novel defense method that does not require training labels is proposed; it can effectively cleanse the backdoor behaviors of a suspicious network with negligible compromise to its normal behavior and is on par with state-of-the-art defense methods trained using labels.

Towards Understanding How Self-training Tolerates Data Backdoor Poisoning

This paper explores the potential of self-training with additional unlabeled data for mitigating backdoor attacks by leveraging strong but proper data augmentations in the self-training pseudo-labeling stage, and finds that the new self-training regime helps defend against backdoor attacks to a great extent.

Model-Contrastive Learning for Backdoor Defense

A novel two-stage backdoor defense method named MCLDef, based on Model-Contrastive Learning (MCL), is proposed; it outperforms state-of-the-art defense methods with up to a 95.79% reduction in attack success rate (ASR), while in most cases the degradation in benign accuracy (BA) is kept below 2%.

Trap and Replace: Defending Backdoor Attacks by Trapping Them into an Easy-to-Replace Subnetwork

A brand-new backdoor defense strategy is proposed that makes it much easier to remove the harmful influence of backdoor samples from the model; it outperforms previous state-of-the-art methods by up to 20%.

Towards Understanding and Defending Input Space Trojans

A theory is proposed to explain the relationship between a model's decision regions and Trojans: a complete and accurate Trojan corresponds to a hyperplane decision region in the input domain.

Backdoor Defense via Decoupling the Training Process

This work proposes a novel backdoor defense via decoupling the original end-to-end training process into three stages, and reveals that poisoned samples tend to cluster together in the feature space of the attacked DNN model, which is mostly due to the end-to-end supervised training paradigm.

Universal Post-Training Backdoor Detection

This paper proposes a universal post-training defense that detects backdoor attacks (BAs) with arbitrary types of backdoor patterns (BPs), without making any assumptions about the BP type, along with a novel, general approach for BA mitigation once a detection is made.

Training with More Confidence: Mitigating Injected and Natural Backdoors During Training

A novel training method is designed that forces training to avoid generating such backdoor-related hyperplane decision regions and thus removes the injected backdoors; it can outperform existing state-of-the-art defenses.

References

Showing 1-10 of 62 references

Backdoor Learning: A Survey

This article summarizes and categorizes existing backdoor attacks and defenses based on their characteristics, provides a unified framework for analyzing poisoning-based backdoor attacks, and summarizes widely adopted benchmark datasets.

Can Adversarial Weight Perturbations Inject Neural Backdoors

This work extends the idea of "adversarial perturbations" to the space of model weights, specifically to inject backdoors into trained DNNs, exposing a security risk of publicly available trained models.

Label-Consistent Backdoor Attacks

This work leverages adversarial perturbations and generative models to execute efficient, yet label-consistent, backdoor attacks, based on injecting inputs that appear plausible, yet are hard to classify, hence causing the model to rely on the (easier-to-learn) backdoor trigger.

DeepSweep: An Evaluation Framework for Mitigating DNN Backdoor Attacks using Data Augmentation

A systematic approach is proposed to discover the optimal policies for defending against different backdoor attacks by comprehensively evaluating 71 state-of-the-art data augmentation functions; the authors envision this framework can serve as a good benchmark tool to advance future DNN backdoor studies.

Hidden Trigger Backdoor Attacks

This work proposes a novel form of backdoor attack where poisoned data look natural and carry correct labels and, more importantly, the attacker hides the trigger in the poisoned data and keeps it secret until test time.

WaNet - Imperceptible Warping-based Backdoor Attack

With the thriving of deep learning and the widespread practice of using pretrained networks, backdoor attacks have become an increasing security threat, drawing much research interest in recent years.

Towards Deep Learning Models Resistant to Adversarial Attacks

This work studies the adversarial robustness of neural networks through the lens of robust optimization, and suggests the notion of security against a first-order adversary as a natural and broad security guarantee.

RAB: Provable Robustness Against Backdoor Attacks

This paper provides the first benchmark for certified robustness against backdoor attacks, theoretically proves the robustness bound for machine learning models trained with the proposed smoothing-based process, proves that the bound is tight, and derives robustness conditions for Gaussian and uniform smoothing distributions.

Backdoor Defense via Decoupling the Training Process

This work proposes a novel backdoor defense via decoupling the original end-to-end training process into three stages, and reveals that poisoned samples tend to cluster together in the feature space of the attacked DNN model, which is mostly due to the end-to-end supervised training paradigm.

Invisible Backdoor Attacks Against Deep Neural Networks

An optimization framework is designed to create covert and scattered triggers for backdoor attacks, where triggers can amplify specific neuron activations while being invisible to both backdoor detection methods and human inspection.
...