STRIP: a defence against trojan attacks on deep neural networks

@article{Gao2019STRIPAD,
  title={STRIP: a defence against trojan attacks on deep neural networks},
  author={Yansong Gao and Chang Xu and Derui Wang and Shiping Chen and Damith Chinthana Ranasinghe and Surya Nepal},
  journal={Proceedings of the 35th Annual Computer Security Applications Conference},
  year={2019}
}
  • Yansong Gao, Chang Xu, Derui Wang, Shiping Chen, Damith Chinthana Ranasinghe, Surya Nepal
  • Published 18 February 2019
  • Computer Science
  • Proceedings of the 35th Annual Computer Security Applications Conference
A recent trojan attack on deep neural network (DNN) models is an insidious variant of data poisoning attacks. Trojan attacks exploit an effective backdoor created in a DNN model, leveraging the limited interpretability of the learned model, to misclassify any input signed with the attacker's chosen trojan trigger. Since the trojan trigger is a secret guarded and exploited by the attacker, detecting such trojan inputs is a challenge, especially at run-time when models are in active…
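The paper's core test can be sketched concretely: superimpose the suspect input with held-out clean images and measure the entropy of the model's predictions across the perturbed copies; a trojaned input keeps predicting the attacker's target class under strong perturbation, so its average entropy is abnormally low. A minimal sketch of that test, assuming a `model` callable that returns a softmax probability vector for an image in [0, 1]; the blend ratio, sample count, and demo model are illustrative, not the paper's settings:

import numpy as np

def superimpose(x, overlay, alpha=0.5):
    """Blend a held-out clean image onto the suspect input."""
    return np.clip(alpha * x + (1.0 - alpha) * overlay, 0.0, 1.0)

def entropy(p, eps=1e-12):
    """Shannon entropy of a predicted class distribution."""
    return float(-np.sum(p * np.log2(p + eps)))

def strip_score(model, x, clean_pool, n=20, rng=None):
    """Average prediction entropy over n perturbed copies of x.

    Trojaned inputs keep predicting the attacker's target class under
    perturbation, so their average entropy is abnormally low."""
    rng = rng or np.random.default_rng(0)
    idx = rng.choice(len(clean_pool), size=n, replace=False)
    return np.mean([entropy(model(superimpose(x, clean_pool[i]))) for i in idx])

# Toy demo with a dummy "model": a benign input under heavy blending
# scores near log2(num_classes); a trojaned one would score near 0.
if __name__ == "__main__":
    dummy = lambda img: np.full(10, 0.1)          # maximally uncertain model
    pool = [np.random.rand(32, 32, 3) for _ in range(50)]
    x = np.random.rand(32, 32, 3)
    print("entropy score:", strip_score(dummy, x, pool))

In deployment, a detection threshold on this score is calibrated from the entropy distribution of known-clean inputs.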
DeepCleanse: Input Sanitization Framework Against Trojan Attacks on Deep Neural Network Systems
TLDR
This paper proposes DeepCleanse, a framework that neutralizes backdoor attacks by sanitizing inputs to Deep Neural Networks in the input domain of vision tasks, achieving dramatic reductions in backdoor attack success rates.
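The sanitization idea can be sketched end-to-end: locate the image region the classifier relies on most, excise it, and restore the hole before classifying. Below is a hypothetical, simplified version that uses occlusion saliency to find the dominant patch and mean-fills it in place of the learned inpainting such defenses use; the patch size and stride are illustrative assumptions:

import numpy as np

def dominant_patch(model, x, patch=8, stride=4):
    """Find the patch whose occlusion changes the top-class score the most."""
    p0 = model(x)
    top = int(np.argmax(p0))
    best, best_drop = (0, 0), -1.0
    h, w = x.shape[:2]
    for i in range(0, h - patch + 1, stride):
        for j in range(0, w - patch + 1, stride):
            occluded = x.copy()
            occluded[i:i + patch, j:j + patch] = 0.5   # neutral gray occluder
            drop = p0[top] - model(occluded)[top]
            if drop > best_drop:
                best, best_drop = (i, j), drop
    return best

def sanitize(model, x, patch=8):
    """Excise the dominant patch and mean-fill it (stand-in for inpainting)."""
    i, j = dominant_patch(model, x, patch)
    cleaned = x.copy()
    cleaned[i:i + patch, j:j + patch] = x.mean(axis=(0, 1))
    return cleaned

A small, localized trigger loses its effect once its region is excised; a learned inpainter would restore benign content more faithfully than mean-filling.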
An Overview of Backdoor Attacks Against Deep Neural Networks and Possible Defences
TLDR
The goal of this overview paper is to review the works published until now, classifying the different types of attacks and defences proposed so far, based on the amount of control that the attacker has on the training process, and the capability of the defender to verify the integrity of the data used for training, and to monitor the operations of the DNN at training and test time.
Trojan Signatures in DNN Weights
TLDR
This paper presents the first ultra-lightweight and highly effective trojan detection method that does not require access to the training/test data, does not involve any expensive computations, and makes no assumptions on the nature of the trojan trigger.
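The entry describes a purely static check on model parameters. A hypothetical, simplified proxy for such a weight-signature test: compute a per-class statistic over the final classification layer's weights and flag classes that are outliers under a median-absolute-deviation rule (the statistic and threshold here are assumptions, not the paper's):

import numpy as np

def flag_trojan_classes(final_weights, thresh=3.5):
    """final_weights: (num_classes, feat_dim) rows of the last linear layer.

    Returns classes whose weight statistic deviates anomalously, using the
    modified z-score 0.6745 * |x - median| / MAD."""
    stat = final_weights.max(axis=1)               # per-class peak weight
    med = np.median(stat)
    mad = np.median(np.abs(stat - med)) + 1e-12
    z = 0.6745 * np.abs(stat - med) / mad
    return np.where(z > thresh)[0]

# Toy demo: class 7 is given an implanted, unusually large weight.
W = np.random.default_rng(0).normal(0, 0.1, size=(10, 64))
W[7, 3] = 2.0
print(flag_trojan_classes(W))   # expected to flag class 7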
Composite Backdoor Attack for Deep Neural Network by Mixing Existing Benign Features
With the prevalent use of Deep Neural Networks (DNNs) in many applications, the security of these networks is of importance. Pre-trained DNNs may contain backdoors that are injected through poisoned training data.
Detecting Trojan Attacks on Deep Neural Networks
  • Juhi Singh, V. Sharmila
  • Computer Science
    2020 4th International Conference on Computer, Communication and Signal Processing (ICCCSP)
  • 2020
TLDR
A proof-of-concept method for detecting Trojan attacks on deep neural networks is presented, applying SHAP, known for its unique explanations of model predictions, as a defense against such attacks.
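The underlying signal such attribution-based defenses look for is that a trigger concentrates the attribution mass onto a few features. A minimal sketch of that concentration check, assuming the `shap` package and a fitted scikit-learn classifier; the top-k mass statistic and the toy task are illustrative assumptions, not the paper's setup:

import numpy as np
import shap
from sklearn.ensemble import RandomForestClassifier

def attribution_concentration(shap_row, k=3):
    """Fraction of total |SHAP| mass carried by the k largest features."""
    mass = np.abs(shap_row)
    return np.sort(mass)[-k:].sum() / (mass.sum() + 1e-12)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = (X[:, 0] > 0).astype(int)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

explainer = shap.KernelExplainer(lambda a: clf.predict_proba(a)[:, 1], X[:50])
sv = explainer.shap_values(X[:1])        # attributions for one input
print("concentration:", attribution_concentration(sv[0]))

An input whose concentration is far above the clean-data baseline would be flagged as potentially trigger-stamped.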
Deep Feature Space Trojan Attack of Neural Networks by Controlled Detoxification
TLDR
A novel deep feature space trojan attack is proposed with five characteristics: effectiveness, stealthiness, controllability, robustness, and reliance on deep features; it is shown to evade state-of-the-art defenses.
Neural Network Trojans Analysis and Mitigation from the Input Domain
TLDR
A theory is proposed to explain the relationship between a model's decision regions and Trojans: a complete and accurate Trojan corresponds to a hyperplane decision region in the input domain.
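That claim admits a compact formal statement; the following is one plausible formalization of the quoted theory, with notation ours rather than necessarily the paper's:

% A trojaned classifier f with target label t admits a hyperplane
% decision region: some (w, b) for which the trigger side of the
% hyperplane is classified as t regardless of the benign content.
\[
  \exists\, w \in \mathbb{R}^d,\; b \in \mathbb{R}
  \quad \text{such that} \quad
  \forall x \in \mathcal{X}:\; w^{\top} x + b > 0 \;\Longrightarrow\; f(x) = t .
\]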
Test-Time Detection of Backdoor Triggers for Poisoned Deep Neural Networks
TLDR
An “in-flight” defense against backdoor attacks on image classification that detects use of a backdoor trigger at test-time; and infers the class of origin (source class) for a detected trigger example.
Practical Detection of Trojan Neural Networks: Data-Limited and Data-Free Cases
TLDR
Data-limited and data-free TrojanNet detectors (TNDs) are proposed; the data-free TND detects a TrojanNet without accessing any data samples, and it is shown that such a TND can be built by leveraging the internal response of hidden neurons, which exhibits the Trojan behavior even at random noise inputs.
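The quoted observation suggests a data-free check: a trojaned model responds to trigger-like structure even when the rest of the input is noise. A simplified proxy under that assumption, feeding random noise and flagging a class whose logit wins far more often than chance (the dominance factor is illustrative; the paper's actual detector optimizes over neuron activations rather than sampling):

import numpy as np

def noise_dominance(model, num_classes, shape, trials=200, rng=None):
    """Fraction of random-noise inputs each class wins; a trojaned target
    class often wins far above the 1/num_classes chance rate."""
    rng = rng or np.random.default_rng(0)
    wins = np.zeros(num_classes)
    for _ in range(trials):
        x = rng.random(shape)                 # uniform noise input
        wins[int(np.argmax(model(x)))] += 1
    return wins / trials

def flag_target(model, num_classes, shape, factor=5.0):
    rates = noise_dominance(model, num_classes, shape)
    return np.where(rates > factor / num_classes)[0], rates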
DeepCleanse: A Black-box Input Sanitization Framework Against Backdoor Attacks on Deep Neural Networks
TLDR
To the best of the authors' knowledge, this is the first backdoor defense that works in a black-box setting, capable of sanitizing and restoring trojaned inputs, and requiring neither costly ground-truth labeled data nor anomaly detection.

References

SHOWING 1-10 OF 40 REFERENCES
Trojaning Attack on Neural Networks
TLDR
A trojaning attack on neural networks is presented that can be successfully triggered without affecting the model's test accuracy on normal input data, and that takes only a small amount of time to attack a complex neural network model.
Detecting Backdoor Attacks on Deep Neural Networks by Activation Clustering
TLDR
This work proposes a novel approach to backdoor detection and removal for neural networks; it is the first methodology capable of detecting poisonous data crafted to insert backdoors and repairing the model, and it does not require a verified and trusted dataset.
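The signal behind activation clustering is that, within the poisoned class, penultimate-layer activations of clean and trigger-stamped samples separate into two clusters of very different sizes. A minimal sketch of that per-class test with scikit-learn; the ICA dimensionality, the two-cluster setup, and the size-imbalance threshold are illustrative simplifications of the original method:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import FastICA

def suspicious_class(activations, imbalance=0.15):
    """activations: (n, d) penultimate-layer features of one class's
    training samples. Returns True when 2-means finds a small, separate
    cluster, i.e. a plausible set of poisoned samples."""
    reduced = FastICA(n_components=min(10, activations.shape[1]),
                      random_state=0).fit_transform(activations)
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(reduced)
    small = min(np.mean(labels == 0), np.mean(labels == 1))
    return small < imbalance

A clean class tends to split roughly evenly under 2-means, while a class poisoned at a low rate yields one small, tight cluster of trigger samples.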
Adversary Resistant Deep Neural Networks with an Application to Malware Detection
TLDR
This work proposes a new adversary-resistant technique that obstructs attackers from constructing impactful adversarial samples by randomly nullifying features within data vectors; the robustness of the technique is validated theoretically, and it is empirically shown to significantly boost DNN robustness to adversarial samples while maintaining high classification accuracy.
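Random feature nullification is simple to sketch: zero a random subset of input features so an attacker cannot rely on any particular feature surviving to carry the perturbation. A minimal version, with the nullification rate and the vote-averaging wrapper as illustrative choices:

import numpy as np

def nullify(x, rate=0.3, rng=None):
    """Randomly zero a fraction `rate` of the features in x."""
    rng = rng or np.random.default_rng()
    mask = rng.random(x.shape) >= rate        # keep ~(1 - rate) of features
    return x * mask

# At inference, predictions can be averaged over several random masks
# so that a single unlucky mask does not dominate the output.
def predict_robust(model, x, votes=10):
    return np.mean([model(nullify(x)) for _ in range(votes)], axis=0)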
Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks
TLDR
This work presents the first robust and generalizable detection and mitigation system for DNN backdoor attacks, and identifies multiple mitigation techniques via input filters, neuron pruning and unlearning.
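The reverse-engineering step behind this style of defense can be sketched: for each candidate target class, optimize a small mask and pattern that force all inputs into that class, then flag classes whose minimal trigger is anomalously small. A condensed PyTorch sketch under those assumptions; the loss weight, step count, and MAD threshold are illustrative, not the paper's exact settings:

import numpy as np
import torch

def reverse_trigger(model, images, target, steps=200, lam=0.01):
    """Optimize (mask, pattern) so masked images classify as `target`;
    returns the L1 norm of the converged mask."""
    mask = torch.zeros(1, 1, *images.shape[2:], requires_grad=True)
    pattern = torch.zeros(1, *images.shape[1:], requires_grad=True)
    opt = torch.optim.Adam([mask, pattern], lr=0.1)
    y = torch.full((images.shape[0],), target, dtype=torch.long)
    for _ in range(steps):
        m = torch.sigmoid(mask)
        stamped = (1 - m) * images + m * torch.sigmoid(pattern)
        loss = torch.nn.functional.cross_entropy(model(stamped), y) \
             + lam * m.abs().sum()            # prefer the smallest trigger
        opt.zero_grad()
        loss.backward()
        opt.step()
    return float(torch.sigmoid(mask).abs().sum())

def flag_backdoored_classes(model, images, num_classes, thresh=2.0):
    norms = np.array([reverse_trigger(model, images, t)
                      for t in range(num_classes)])
    med = np.median(norms)
    mad = np.median(np.abs(norms - med)) + 1e-12
    return np.where(0.6745 * (med - norms) / mad > thresh)[0]  # small = suspect

The intuition is that for a backdoored target class a tiny trigger already flips every input, so its reverse-engineered mask norm is an outlier on the low side.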
SentiNet: Detecting Physical Attacks Against Deep Learning Systems
TLDR
This work demonstrates the effectiveness of SentiNet on three different attacks (adversarial examples, data poisoning attacks, and trojaned networks) that have large variations in deployment mechanisms, and shows that the defense is able to achieve very competitive performance metrics for all three threats, even against strong adaptive adversaries with full knowledge of SentiNet.
BadNets: Evaluating Backdooring Attacks on Deep Neural Networks
TLDR
It is shown that outsourced training introduces new security risks: an adversary can create a maliciously trained network (a backdoored neural network, or BadNet) that has state-of-the-art performance on the user's training and validation samples but behaves badly on specific attacker-chosen inputs.
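The attack itself is easy to state concretely: stamp a fixed trigger onto a small fraction of the training images, relabel them to the attacker's target class, and train normally. A minimal poisoning sketch; the trigger shape, position, and poison rate are illustrative:

import numpy as np

def poison(images, labels, target, rate=0.05, rng=None):
    """images: (n, h, w, c) in [0, 1]. Stamps a white square trigger in the
    bottom-right corner of a random `rate` fraction and relabels them."""
    rng = rng or np.random.default_rng(0)
    x, y = images.copy(), labels.copy()
    idx = rng.choice(len(x), size=int(rate * len(x)), replace=False)
    x[idx, -4:, -4:, :] = 1.0                 # 4x4 trigger patch
    y[idx] = target
    return x, y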
Fine-Pruning: Defending Against Backdooring Attacks on Deep Neural Networks
TLDR
Fine-pruning, a combination of pruning and fine-tuning, is evaluated, and it is shown to successfully weaken or even eliminate backdoors, in some cases reducing the attack success rate to 0% with only a 0.4% drop in accuracy for clean (non-triggering) inputs.
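The pruning half of this defense ranks channels in a late convolutional layer by their mean activation on clean data and zeroes out the most dormant ones, where a backdoor tends to hide; fine-tuning on clean data then recovers accuracy. A condensed PyTorch sketch under those assumptions, with the layer choice and pruning fraction as illustrative parameters:

import torch

@torch.no_grad()
def prune_dormant_channels(layer, clean_batches, frac=0.2):
    """layer: a Conv2d whose output channels will be pruned.
    clean_batches: iterable of input tensors to `layer`. Ranks channels by
    mean absolute activation over clean data and zeroes the lowest `frac`."""
    totals, count = None, 0
    for x in clean_batches:
        act = layer(x).abs().mean(dim=(0, 2, 3))   # per-channel activity
        totals = act if totals is None else totals + act
        count += 1
    mean_act = totals / count
    k = int(frac * layer.out_channels)
    dormant = torch.argsort(mean_act)[:k]
    layer.weight[dormant] = 0.0
    if layer.bias is not None:
        layer.bias[dormant] = 0.0

# Afterwards, fine-tune the whole model on clean data for a few epochs
# to recover the small accuracy loss that pruning introduces.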
Neural Trojans
TLDR
This work shows that embedding hidden malicious functionality, i.e., neural Trojans, into a neural IP is an effective attack, and provides three mitigation techniques: input anomaly detection, re-training, and input preprocessing.
SentiNet: Detecting Localized Universal Attacks Against Deep Learning Systems
TLDR
This work demonstrates the effectiveness of SentiNet on three different attacks (data poisoning attacks, trojaned networks, and adversarial patches, including physically realizable attacks) and shows that the defense is able to achieve very competitive performance metrics for all three threats.
Backdoor Embedding in Convolutional Neural Network Models via Invisible Perturbation
TLDR
This paper proposes two approaches for generating a backdoor that is hardly perceptible yet effective in poisoning the model, and demonstrates that such attacks can achieve a high attack success rate at a small cost in model accuracy with a small injection rate.