Universal Litmus Patterns: Revealing Backdoor Attacks in CNNs

@article{Kolouri2020UniversalLP,
  title={Universal Litmus Patterns: Revealing Backdoor Attacks in CNNs},
  author={Soheil Kolouri and Aniruddha Saha and Hamed Pirsiavash and Heiko Hoffmann},
  journal={2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2020},
  pages={298-307}
}
The unprecedented success of deep neural networks in many applications has made these networks a prime target for adversarial exploitation. In this paper, we introduce a benchmark technique for detecting backdoor attacks (aka Trojan attacks) on deep convolutional neural networks (CNNs). We introduce the concept of Universal Litmus Patterns (ULPs), which enable one to reveal backdoor attacks by feeding these universal patterns to the network and analyzing the output (i.e., classifying the …)
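As a rough illustration of the ULP idea described in the abstract, the sketch below jointly optimizes a handful of input patterns and a lightweight meta-classifier over a pool of models labeled clean or backdoored, then scores a suspect CNN from its logits on those patterns. Everything concrete here (PyTorch, the pattern count, the image shape, the linear detector) is an assumption for illustration, not the paper's reference implementation.

# Minimal sketch of the Universal Litmus Pattern (ULP) idea, under assumed
# shapes and an assumed linear meta-classifier (not taken from the paper).
import torch
import torch.nn as nn

NUM_PATTERNS, NUM_CLASSES = 10, 10
IMG_SHAPE = (3, 32, 32)

# The ULPs themselves are free parameters, optimized jointly with the detector.
ulps = nn.Parameter(torch.randn(NUM_PATTERNS, *IMG_SHAPE))
detector = nn.Linear(NUM_PATTERNS * NUM_CLASSES, 2)  # clean vs. backdoored
optimizer = torch.optim.Adam([ulps, *detector.parameters()], lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def litmus_features(model: nn.Module) -> torch.Tensor:
    """Feed the ULPs through a suspect CNN and flatten its logits."""
    return model(ulps).reshape(1, -1)

def train_step(model_pool):
    """model_pool: iterable of (cnn, label) pairs, label 0 = clean, 1 = backdoored."""
    for cnn, label in model_pool:
        cnn.eval()
        logits = detector(litmus_features(cnn))
        loss = loss_fn(logits, torch.tensor([label]))
        optimizer.zero_grad()
        loss.backward()   # gradients flow through the frozen CNN into the ULPs
        optimizer.step()

def is_backdoored(suspect: nn.Module) -> bool:
    with torch.no_grad():
        return detector(litmus_features(suspect)).argmax(dim=1).item() == 1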
Citations

Detecting Backdoor in Deep Neural Networks via Intentional Adversarial Perturbations
TLDR: A novel backdoor detection method based on adversarial examples is proposed; it achieves better detection performance than STRIP on all three datasets and is more efficient.
Composite Backdoor Attack for Deep Neural Network by Mixing Existing Benign Features
With the prevalent use of Deep Neural Networks (DNNs) in many applications, the security of these networks is important. Pre-trained DNNs may contain backdoors that are injected through poisoned …
Deep Feature Space Trojan Attack of Neural Networks by Controlled Detoxification
TLDR: A novel deep feature space trojan attack is proposed with five characteristics: effectiveness, stealthiness, controllability, robustness, and reliance on deep features; it can evade state-of-the-art defenses.
Neural Attention Distillation: Erasing Backdoor Triggers from Deep Neural Networks
TLDR: This paper proposes a novel defense framework, Neural Attention Distillation (NAD), which uses a teacher network to guide the fine-tuning of the backdoored student network on a small clean subset of data so that the intermediate-layer attention of the student aligns with that of the teacher.
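To make the attention-alignment step above concrete, here is a minimal sketch of one plausible formulation: the attention map is taken as the channel-wise sum of squared activations (L2-normalized), and the fine-tuning loss adds an MSE term between student and teacher maps at matched layers. The map definition, weighting, and layer choice are illustrative assumptions, not details quoted from the paper.

# Sketch of an attention-alignment loss in the spirit of the NAD entry above.
import torch
import torch.nn.functional as F

def attention_map(feat: torch.Tensor) -> torch.Tensor:
    """feat: (B, C, H, W) intermediate activations -> normalized (B, H*W) map."""
    amap = feat.pow(2).sum(dim=1).flatten(1)   # aggregate over channels
    return F.normalize(amap, dim=1)

def nad_loss(student_feats, teacher_feats, beta=1000.0):
    """Distance between student and teacher attention at matched layers."""
    return beta * sum(
        F.mse_loss(attention_map(s), attention_map(t))
        for s, t in zip(student_feats, teacher_feats)
    )

The full fine-tuning objective on the small clean subset would then be the usual cross-entropy plus nad_loss over the chosen layers.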
Practical Detection of Trojan Neural Networks: Data-Limited and Data-Free Cases
TLDR: A TrojanNet detector (TND) is proposed for both data-limited and data-free settings; it is shown that such a detector can be built by leveraging the internal responses of hidden neurons, which exhibit the Trojan behavior even on random noise inputs.
Poison Ink: Robust and Invisible Backdoor Attack
  • Jie Zhang, Dongdong Chen, +4 authors Nenghai Yu
  • Computer Science
  • arXiv
  • 2021
TLDR: This work proposes a robust and invisible backdoor attack called Poison Ink, which is not only general across different datasets and network architectures but also flexible for different attack scenarios, and shows very strong resistance against many state-of-the-art defense techniques.
Scalable Backdoor Detection in Neural Networks
TLDR: A novel trigger reverse-engineering approach is proposed whose computational complexity does not scale with the number of labels and which is based on a measure that is both interpretable and universal across different network and patch types.
Threat of Adversarial Attacks on Deep Learning in Computer Vision: Survey II
TLDR: Building on an earlier review of adversarial attacks on deep learning in computer vision that covered work up to 2018, this survey focuses on advances in the area since 2018.
Black-box Detection of Backdoor Attacks with Limited Information and Data
TLDR: A black-box backdoor detection (B3D) method is proposed that identifies backdoor attacks with only query access to the model, using a gradient-free optimization algorithm to reverse-engineer the potential trigger for each class and thereby reveal the existence of a backdoor.
Advances in adversarial attacks and defenses in computer vision: A survey
TLDR: Building on an earlier review of adversarial attacks on deep learning in computer vision that covered work up to 2018, this survey focuses on advances in the area since 2018.

References

Showing 1–10 of 38 references
Backdoor Embedding in Convolutional Neural Network Models via Invisible Perturbation
TLDR: This paper proposes two approaches for generating a backdoor that is hardly perceptible yet effective in poisoning the model, and demonstrates that such attacks achieve a high attack success rate with a small injection rate and only a small loss in model accuracy.
Fine-Pruning: Defending Against Backdooring Attacks on Deep Neural Networks
TLDR: Fine-pruning, a combination of pruning and fine-tuning, is evaluated and shown to successfully weaken or even eliminate backdoors, in some cases reducing the attack success rate to 0% with only a 0.4% drop in accuracy on clean (non-triggering) inputs.
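A minimal sketch of a pruning-then-fine-tuning defense in the spirit of the entry above: zero the channels of a late convolutional layer that stay least active on clean data, then fine-tune on the same clean set. The layer choice, prune fraction, and optimizer settings are illustrative assumptions rather than the paper's exact procedure.

# Sketch of pruning dormant channels followed by clean-data fine-tuning.
import torch
import torch.nn as nn

@torch.no_grad()
def prune_dormant_channels(model, last_conv: nn.Conv2d, clean_loader, frac=0.2):
    """Zero the output channels of `last_conv` with the lowest mean activation
    over clean inputs (activations collected via a forward hook)."""
    acts = []
    handle = last_conv.register_forward_hook(
        lambda m, inp, out: acts.append(out.relu().mean(dim=(0, 2, 3))))
    for x, _ in clean_loader:
        model(x)
    handle.remove()
    mean_act = torch.stack(acts).mean(dim=0)     # (C,) per-channel score
    k = int(frac * mean_act.numel())
    dormant = mean_act.argsort()[:k]             # least-active channels
    last_conv.weight[dormant] = 0                # a following BatchNorm, if any,
    if last_conv.bias is not None:               # would also need masking
        last_conv.bias[dormant] = 0

def fine_tune(model, clean_loader, epochs=5, lr=1e-3):
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in clean_loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()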
Detecting Backdoor Attacks on Deep Neural Networks by Activation Clustering
TLDR: This work proposes a novel approach to backdoor detection and removal for neural networks; it is the first methodology capable of detecting poisonous data crafted to insert backdoors and repairing the model without requiring a verified and trusted dataset.
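One plausible instantiation of per-class activation clustering, sketched under stated assumptions: reduce last-hidden-layer activations with PCA, run 2-means within each class, and flag a class whose smaller cluster is suspiciously tiny. The dimensionality reduction, cluster count, and the 0.15 threshold are illustrative choices, not details quoted from the paper.

# Sketch of per-class activation clustering for poisoned-class detection.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

def flag_poisoned_classes(activations, labels, num_classes, small_frac=0.15):
    """activations: (N, D) last-hidden-layer features of training samples (numpy).
    labels: (N,) training labels (numpy). Returns the set of suspicious class ids."""
    suspicious = set()
    for c in range(num_classes):
        feats = activations[labels == c]
        if len(feats) < 10:
            continue
        reduced = PCA(n_components=min(10, feats.shape[1])).fit_transform(feats)
        assign = KMeans(n_clusters=2, n_init=10).fit_predict(reduced)
        frac = np.bincount(assign, minlength=2).min() / len(assign)
        if frac < small_frac:   # one tiny cluster: likely the poisoned samples
            suspicious.add(c)
    return suspicious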
STRIP: a defence against trojan attacks on deep neural networks
TLDR: This work builds STRIP (STRong Intentional Perturbation), a run-time trojan attack detection system for vision, which achieves an overall false acceptance rate (FAR) of less than 1% at a preset false rejection rate (FRR) of 1% for different types of triggers.
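The snippet above reports STRIP's FAR/FRR but not its mechanism; the sketch below follows the commonly described perturbation-entropy idea and should be treated as an assumption here: blend the input under test with clean images and flag it when the prediction entropy over the blends is abnormally low (a trigger tends to dominate the blends). The blend ratio and threshold are illustrative.

# Sketch of a perturbation-entropy check in the spirit of STRIP.
import torch
import torch.nn.functional as F

@torch.no_grad()
def strip_entropy(model, x, clean_batch, alpha=0.5):
    """x: (C, H, W) input under test; clean_batch: (N, C, H, W) held-out clean images."""
    blended = alpha * x.unsqueeze(0) + (1 - alpha) * clean_batch
    probs = F.softmax(model(blended), dim=1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)   # per blend
    return entropy.mean().item()

def looks_trojaned(model, x, clean_batch, threshold=0.2):
    return strip_entropy(model, x, clean_batch) < threshold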
Universal Adversarial Perturbations
TLDR: The surprising existence of universal perturbations reveals important geometric correlations among the high-dimensional decision boundaries of classifiers and outlines potential security breaches: single directions in the input space that adversaries could exploit to break a classifier on most natural images.
BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain
TLDR: It is shown that outsourced training introduces new security risks: an adversary can create a maliciously trained network (a backdoored neural network, or BadNet) that has state-of-the-art performance on the user's training and validation samples but behaves badly on specific attacker-chosen inputs.
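For concreteness, a sketch of the kind of training-set poisoning a BadNets-style adversary could perform during outsourced training: stamp a small patch on a fraction of the training images and relabel them to a target class. Patch shape, location, and poison rate are illustrative assumptions, not details from the paper.

# Sketch of trigger-stamping data poisoning (illustrative parameters).
import torch

def poison_dataset(images, labels, target_class, poison_rate=0.05, patch=3):
    """images: (N, C, H, W) in [0, 1]; labels: (N,). Returns poisoned copies."""
    images, labels = images.clone(), labels.clone()
    n_poison = int(poison_rate * len(images))
    idx = torch.randperm(len(images))[:n_poison]
    images[idx, :, -patch:, -patch:] = 1.0   # white square in the bottom-right corner
    labels[idx] = target_class               # relabel to the attacker-chosen class
    return images, labels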
Towards Evaluating the Robustness of Neural Networks
TLDR: It is demonstrated that defensive distillation does not significantly increase the robustness of neural networks, and three new attack algorithms are introduced that succeed on both distilled and undistilled neural networks with 100% probability.
Trojaning Attack on Neural Networks
TLDR: A trojaning attack on neural networks is presented that can be reliably triggered without affecting the model's test accuracy on normal input data, and it takes only a small amount of time to attack a complex neural network model.
Gotta Catch 'Em All: Using Concealed Trapdoors to Detect Adversarial Attacks on Neural Networks
TLDR: This work introduces trapdoors, implemented with strategies similar to backdoor/Trojan attacks, and shows that by proactively injecting trapdoors into models it can detect adversarial examples generated by state-of-the-art attacks with a high detection success rate and negligible impact on normal inputs.
Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks
TLDR: This work presents the first robust and generalizable detection and mitigation system for DNN backdoor attacks and identifies multiple mitigation techniques via input filters, neuron pruning, and unlearning.
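The snippet above mentions mitigation but not the detection step; Neural Cleanse is commonly described as reverse-engineering, for every candidate target label, the smallest mask-and-pattern trigger that flips clean inputs to that label, then flagging labels whose recovered mask is an anomalously small outlier. Treat that description, and the sketch below, as an assumption; hyperparameters are illustrative.

# Sketch of per-label trigger reverse-engineering via mask/pattern optimization.
import torch
import torch.nn.functional as F

def reverse_engineer_trigger(model, clean_loader, target, img_shape=(3, 32, 32),
                             steps=500, lam=1e-2, lr=0.1):
    """Return (mask, pattern, mask_l1) for a hypothesized trigger toward `target`."""
    mask = torch.zeros(1, *img_shape[1:], requires_grad=True)    # (1, H, W)
    pattern = torch.zeros(*img_shape, requires_grad=True)        # (C, H, W)
    opt = torch.optim.Adam([mask, pattern], lr=lr)
    data = iter(clean_loader)
    for _ in range(steps):
        try:
            x, _ = next(data)
        except StopIteration:
            data = iter(clean_loader)
            x, _ = next(data)
        m = torch.sigmoid(mask)                                   # keep mask in [0, 1]
        stamped = (1 - m) * x + m * torch.sigmoid(pattern)        # apply candidate trigger
        y = torch.full((x.size(0),), target, dtype=torch.long)
        loss = F.cross_entropy(model(stamped), y) + lam * m.sum() # flip label + small mask
        opt.zero_grad()
        loss.backward()
        opt.step()
    return mask.detach(), pattern.detach(), torch.sigmoid(mask).sum().item()

Across all candidate labels, a recovered mask whose L1 norm is a strong low outlier (for instance under a median-absolute-deviation test) would mark that label as a likely backdoor target.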