Corpus ID: 220665637

Backdoor Attacks and Countermeasures on Deep Learning: A Comprehensive Review

Authors: Yansong Gao, Bao Gia Doan, Zhi Zhang, Siqi Ma, Jiliang Zhang, Anmin Fu, Surya Nepal, Hyoungshick Kim
This work provides the community with a timely, comprehensive review of backdoor attacks and countermeasures on deep learning. According to the attacker's capability and the affected stage of the machine learning pipeline, the attack surfaces are recognized to be wide and are formalized into six categorizations: code poisoning, outsourcing, pretrained, data collection, collaborative learning, and post-deployment. Attacks under each categorization are then surveyed. The countermeasures are…
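Several of the attack categories above (notably outsourcing and data collection) rely on the same core mechanism: stamping a small trigger pattern onto a fraction of the training images and relabeling them to an attacker-chosen target class. The sketch below is a minimal, hypothetical illustration of such data poisoning (function and parameter names are illustrative, not from the survey):

```python
import numpy as np

def apply_trigger(images, labels, target_label,
                  poison_frac=0.1, patch_value=1.0, patch_size=3, seed=0):
    """Stamp a small square trigger in the bottom-right corner of a random
    subset of images and relabel them to the attacker-chosen target class.
    Returns poisoned copies plus the poisoned indices."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    n = len(images)
    idx = rng.choice(n, size=int(poison_frac * n), replace=False)
    # The "trigger" is a solid patch; real attacks use subtler patterns.
    images[idx, -patch_size:, -patch_size:] = patch_value
    labels[idx] = target_label
    return images, labels, idx

# Example: poison 10% of a toy 28x28 grayscale dataset.
X = np.zeros((100, 28, 28), dtype=np.float32)
y = np.arange(100) % 10
Xp, yp, idx = apply_trigger(X, y, target_label=7)
```

A model trained on `(Xp, yp)` would learn to associate the corner patch with class 7 while behaving normally on clean inputs, which is exactly the stealth property that the defenses surveyed below try to detect.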


ProtoVAE: A Trustworthy Self-Explainable Prototypical Variational Model

ProtoVAE is proposed, a variational autoencoder-based framework that learns class-specific prototypes in an end-to-end manner and enforces trustworthiness and diversity by regularizing the representation space and introducing an orthonormality constraint.

Use Procedural Noise to Achieve Backdoor Attack

This paper proposes a novel global backdoor trigger generated by procedural noise; the trigger is strongly robust to most corruption methods, meaning it can be applied in real-world settings.

Decamouflage: A Framework to Detect Image-Scaling Attacks on CNN

An image-scaling attack detection framework, Decamouflage, comprising three independent detection methods (scaling, filtering, and steganalysis) that detect the attack by examining distinct image characteristics, using a generic, pre-determined detection threshold.

Decamouflage: A Framework to Detect Image-Scaling Attacks on Convolutional Neural Networks

This work presents an image-scaling attack detection framework, termed as Decamouflage, which can accurately detect image scaling attacks in both white-box and black-box settings with acceptable run-time overhead.

Don't Watch Me: A Spatio-Temporal Trojan Attack on Deep-Reinforcement-Learning-Augmented Autonomous Driving

It is shown that while capturing spatio-temporal traffic features can improve the performance of DRL on different AD tasks, these same features also render the models vulnerable to Trojan attacks.

A Temporal-Pattern Backdoor Attack to Deep Reinforcement Learning

This paper explores the sequential nature of DRL and proposes a novel temporal-pattern backdoor attack on DRL, whose trigger is a set of temporal constraints on a sequence of observations rather than a single observation, and whose effect persists for a controllable duration rather than occurring only instantaneously.

Towards Trustworthy Outsourced Deep Neural Networks

A new attack based on steganography is proposed that enables the server to generate wrong prediction results in a command-and-control fashion and a homomorphic encryption-based authentication scheme is designed to detect wrong predictions made by any attack.

Backdoor Pre-trained Models Can Transfer to All

A new approach to map inputs containing triggers directly to a predefined output representation of the pre-trained NLP models, e.g., a predefined output representation for the classification token in BERT, instead of a target label, which can introduce a backdoor into a wide range of downstream tasks without any prior knowledge.

Backdoor Attacks on Federated Learning with Lottery Ticket Hypothesis

It is empirically demonstrated that Lottery Ticket models are equally vulnerable to backdoor attacks as the original dense models, and backdoor attacks can influence the structure of extracted tickets.

Backdoor Learning Curves: Explaining Backdoor Poisoning Beyond Influence Functions

This work provides a unifying framework to study the process of backdoor learning under the lens of incremental learning and influence functions and shows that the success of backdoor attacks inherently depends on the complexity of the learning algorithm, controlled by its hyperparameters.

ConFoc: Content-Focus Protection Against Trojan Attacks on Neural Networks

A novel defensive technique is proposed, in which DNNs are taught to disregard the styles of inputs and focus on their content only to mitigate the effect of triggers during the classification, which reduces the attack success rate significantly and improves the initial accuracy of the models when processing both benign and adversarial data.

Graph Backdoor

The effectiveness of GTA is demonstrated: for instance, on pre-trained, off-the-shelf GNNs, GTA attains over a 99.2% attack success rate with less than a 0.3% accuracy drop.

FaceHack: Triggering backdoored facial recognition systems using facial characteristics

This work demonstrates that specific changes to facial characteristics may also be used to trigger malicious behavior in an ML model and substantiates the undetectability of the triggers by exhaustively testing them with state-of-the-art defenses.

Backdoor Attacks to Graph Neural Networks

This work proposes a subgraph based backdoor attack to GNN for graph classification that predicts an attacker-chosen target label for a testing graph once a predefined subgraph is injected to the testing graph.

Scalable Backdoor Detection in Neural Networks

A novel trigger reverse-engineering based approach whose computational complexity does not scale with the number of labels, and is based on a measure that is both interpretable and universal across different network and patch types is proposed.

Blind Backdoors in Deep Learning Models

New classes of backdoors strictly more powerful than those in prior literature are demonstrated: single-pixel and physical backdoors in ImageNet models, backdoors that switch the model to a covert, privacy-violating task, and backdoors that do not require inference-time input modifications.

RAB: Provable Robustness Against Backdoor Attacks

This paper provides the first benchmark for certified robustness against backdoor attacks, theoretically proves the robustness bound for machine learning models based on this training process, proves that the bound is tight, and derives robustness conditions for Gaussian and Uniform smoothing distributions.

NNoculation: Broad Spectrum and Targeted Treatment of Backdoored DNNs

A novel two-stage defense against backdoored neural networks (BadNets) that outperforms the state-of-the-art defenses NeuralCleanse and ABS (Artificial Brain Stimulation), which are shown to be ineffective when their restrictive assumptions are circumvented by the attacker.

Label-Consistent Backdoor Attacks

This work leverages adversarial perturbations and generative models to execute efficient, yet label-consistent, backdoor attacks, based on injecting inputs that appear plausible, yet are hard to classify, hence causing the model to rely on the (easier-to-learn) backdoor trigger.

Revealing Perceptible Backdoors, without the Training Set, via the Maximum Achievable Misclassification Fraction Statistic

This work identifies two important properties of perceptible backdoor patterns, spatial invariance and robustness, and on this basis proposes a novel detector using the maximum achievable misclassification fraction (MAMF) statistic. The detector outperforms other existing detectors and, coupled with an imperceptible backdoor detector, helps achieve post-training detection of all evasive backdoors.