Adversarial Examples: Attacks and Defenses for Deep Learning

Xiaoyong Yuan, Pan He, Qile Zhu, and Xiaolin Li. IEEE Transactions on Neural Networks and Learning Systems.
With rapid progress and significant successes in a wide spectrum of applications, deep learning is being applied in many safety-critical environments. However, deep neural networks (DNNs) have recently been found vulnerable to well-designed input samples called adversarial examples. Adversarial perturbations are imperceptible to humans but can easily fool DNNs at the testing/deployment stage. The vulnerability to adversarial examples has become one of the major risks for applying DNNs in safety…

Adversarial Examples in Deep Neural Networks: An Overview

This chapter overviews various theories behind the existence of adversarial examples as well as theories that consider the relation between the generalization error and adversarial robustness.

Explainable AI for Inspecting Adversarial Attacks on Deep Neural Networks

This paper reviews recent findings on adversarial attacks and defense strategies and analyzes the effects of the attacks and defenses applied, using local and global analysis methods from the family of explainable artificial intelligence.

Adversarial Examples in Modern Machine Learning: A Review

An extensive coverage of machine learning models in the visual domain is provided, furnishing the reader with an intuitive understanding of the mechanics of adversarial attack and defense mechanisms and enlarging the community of researchers studying this fundamental set of problems.

Defending Adversarial Examples by Negative Correlation Ensemble

A new ensemble defense approach named the Negative Correlation Ensemble (NCEn), which achieves compelling results by making the gradient directions and gradient magnitudes of each member in the ensemble negatively correlated, thereby reducing the transferability of adversarial examples among them.
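
The core idea of decorrelating member gradients can be illustrated with a small sketch. The function name and the use of mean pairwise cosine similarity as the penalty are illustrative assumptions, not NCEn's actual loss:

```python
import numpy as np

def negative_correlation_penalty(grads):
    """Mean pairwise cosine similarity of ensemble members' input
    gradients; minimizing it pushes gradient directions apart
    (illustrative sketch, not NCEn's exact formulation)."""
    sims = []
    for i in range(len(grads)):
        for j in range(i + 1, len(grads)):
            gi, gj = grads[i], grads[j]
            sims.append(gi @ gj / (np.linalg.norm(gi) * np.linalg.norm(gj) + 1e-12))
    return float(np.mean(sims))
```

When two members' gradients point in opposite directions the penalty is -1; an adversarial step that follows one member's gradient then tends to move against the other's, which is what limits transferability inside the ensemble.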

Adversarial Perturbation Defense on Deep Neural Networks

A comprehensive survey on classical and state-of-the-art defense methods against adversarial perturbations, illuminating their main concepts, in-depth algorithms, and fundamental hypotheses regarding the origin of adversarial perturbations.

ROOM: Adversarial Machine Learning Attacks Under Real-Time Constraints

ROOM is proposed, a novel Real-time Online-Offline attack construction Model in which an offline component serves to warm up the online algorithm, making it possible to generate highly successful attacks under real-time constraints.

Latent Adversarial Defence with Boundary-guided Generation

The proposed LAD method improves the robustness of a DNN model through adversarial training on adversarial examples generated by adding perturbations to latent features along the normal of the decision boundary, which is constructed by an SVM with an attention mechanism.

Secure machine learning against adversarial samples at test time

This paper proposes a new iterative adversarial retraining approach to robustify DNN models and reduce the effectiveness of adversarial inputs, and develops a parallel implementation that makes the approach scalable to large datasets and complex models.
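
The iterative craft-then-retrain loop can be sketched on a toy logistic model. The function names, the FGSM crafting step, and the hyperparameters are assumptions for illustration, not the paper's actual implementation:

```python
import numpy as np

def fgsm(x, w, b, y, eps):
    """One-step FGSM on a logistic model: the gradient of the logistic
    loss w.r.t. the input is (sigmoid(w.x + b) - y) * w."""
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
    return x + eps * np.sign((p - y) * w)

def adversarial_retrain(X, y, eps=0.1, rounds=5, lr=0.5, epochs=200):
    """Iterative adversarial retraining sketch: each round, craft FGSM
    examples against the current model, then retrain on clean +
    adversarial data (hypothetical simplification of the approach)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    Xt, yt = X, y
    for _ in range(rounds):
        for _ in range(epochs):  # plain gradient descent on logistic loss
            p = 1.0 / (1.0 + np.exp(-(Xt @ w + b)))
            w -= lr * Xt.T @ (p - yt) / len(yt)
            b -= lr * np.mean(p - yt)
        X_adv = np.array([fgsm(xi, w, b, yi, eps) for xi, yi in zip(X, y)])
        Xt = np.vstack([X, X_adv])     # augment the training set
        yt = np.concatenate([y, y])
    return w, b
```

Each round, the crafting step is run against the freshly retrained model, so the augmented set tracks the model's current weaknesses; the paper's parallel implementation distributes exactly this crafting work.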

Complete Defense Framework to Protect Deep Neural Networks against Adversarial Examples

A comprehensive defense framework that protects DNNs against adversarial examples by ensembling detectors, a deep Residual Generative Network (ResGN), and an adversarially trained targeted network.

The Defense of Adversarial Example with Conditional Generative Adversarial Networks

An image-to-image translation model to defend against adversarial examples, based on a conditional generative adversarial network consisting of a generator and a discriminator, which maps adversarial images to clean images that are then fed to the target deep learning model.

MagNet: A Two-Pronged Defense against Adversarial Examples

MagNet, a framework for defending neural network classifiers against adversarial examples, is proposed, and it is shown empirically that MagNet is effective against the most advanced state-of-the-art attacks in black-box and gray-box scenarios without sacrificing the false positive rate on normal examples.
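
The two prongs (a detector that rejects inputs far from the clean-data manifold, and a reformer that pushes the rest back toward it) can be sketched as below. The function name, threshold handling, and use of mean squared reconstruction error are illustrative assumptions; MagNet's actual detectors and autoencoders are trained models:

```python
import numpy as np

def magnet_defend(x, autoencoder, threshold):
    """MagNet-style two-pronged sketch: `autoencoder` is assumed to be
    trained on clean data, so clean inputs reconstruct well and
    adversarial ones do not."""
    recon = autoencoder(x)
    err = np.mean((x - recon) ** 2)  # reconstruction error
    if err > threshold:
        return None                  # detector prong: reject the input
    return recon                     # reformer prong: feed the reconstruction
                                     # to the classifier instead of x
```

Inputs the detector passes are still replaced by their reconstructions, so small perturbations that slip past the threshold are partially undone before classification.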

Boosting Adversarial Attacks with Momentum

A broad class of momentum-based iterative algorithms to boost adversarial attacks by integrating the momentum term into the iterative process for attacks, which can stabilize update directions and escape from poor local maxima during the iterations, resulting in more transferable adversarial examples.
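
The momentum-integrated iteration described above can be sketched in a few lines. The function name and hyperparameters are illustrative; `grad_fn` stands in for the loss gradient w.r.t. the input, which in practice comes from backpropagation through the model:

```python
import numpy as np

def momentum_attack(x, grad_fn, eps=0.3, steps=10, mu=1.0):
    """MI-FGSM-style iteration: accumulate the L1-normalized gradient
    into a momentum term, step along its sign, and stay within the
    eps-ball around the original input."""
    alpha = eps / steps           # per-step budget
    g = np.zeros_like(x)          # accumulated momentum
    x_adv = x.copy()
    for _ in range(steps):
        grad = grad_fn(x_adv)
        g = mu * g + grad / (np.sum(np.abs(grad)) + 1e-12)
        x_adv = x_adv + alpha * np.sign(g)
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv
```

The momentum term `g` smooths over noisy per-step gradients, which is what stabilizes the update direction and helps the iterate escape poor local maxima, yielding the improved transferability the paper reports.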

Provably Minimally-Distorted Adversarial Examples

It is demonstrated that one of the recent ICLR defense proposals, adversarial retraining, provably succeeds at increasing the distortion required to construct adversarial examples by a factor of 4.2.

Delving into Transferable Adversarial Examples and Black-box Attacks

This work is the first to conduct an extensive study of transferability over large models and a large-scale dataset, and the first to study the transferability of targeted adversarial examples with their target labels.

Towards Robust Detection of Adversarial Examples

This paper presents a novel training procedure and a thresholding test strategy towards robust detection of adversarial examples, and proposes to minimize the reverse cross-entropy (RCE), which encourages a deep network to learn latent representations that better distinguish adversarial examples from normal ones.
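
A sketch of the RCE objective, under the assumption that the "reverse" label vector places zero mass on the true class and uniform mass 1/(K-1) on the others; the function name is illustrative and the paper's full training procedure and thresholding test are omitted:

```python
import numpy as np

def reverse_cross_entropy(probs, y, K):
    """Cross-entropy of predicted probabilities against the reversed
    label vector: 0 on the true class y, 1/(K-1) on the rest."""
    r = np.full(K, 1.0 / (K - 1))
    r[y] = 0.0
    return -np.sum(r * np.log(probs + 1e-12))
```

Minimizing this term pushes the non-true-class probabilities toward uniformity, giving normal and adversarial inputs distinguishable latent signatures that the thresholding test can separate.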

Delving into adversarial attacks on deep policies

This paper presents a novel method, based on the value function, for reducing the number of times adversarial examples need to be injected for a successful attack, and explores how re-training on random noise and FGSM perturbations affects resilience against adversarial examples.

Adversarial Attacks on Neural Network Policies

This work shows existing adversarial example crafting techniques can be used to significantly degrade test-time performance of trained policies, even with small adversarial perturbations that do not interfere with human perception.

Distillation as a Defense to Adversarial Perturbations Against Deep Neural Networks

The study shows that defensive distillation can reduce the effectiveness of adversarial sample creation from 95% to less than 0.5% on a studied DNN, and analytically investigates the generalizability and robustness properties granted by the use of defensive distillation when training DNNs.
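
The mechanism at the heart of defensive distillation is the temperature-scaled softmax: the teacher's soft labels are produced at a high temperature T, and the distilled network is trained against them at the same T. A minimal sketch (function name ours):

```python
import numpy as np

def softmax_T(logits, T):
    """Temperature-scaled softmax: higher T flattens the distribution,
    which is what yields the soft labels used in distillation."""
    z = logits / T
    z = z - z.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()
```

At T=1 this is the ordinary softmax; raising T spreads probability mass across classes, and training on those smoother targets reduces the gradient magnitudes an attacker can exploit at test time.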

Ground-Truth Adversarial Examples

Ground truths are constructed: adversarial examples with a provably minimal distance from a given input point. They can serve to assess the effectiveness of both attack and defense techniques, by computing the distance to the ground truths before and after a defense is applied and measuring the improvement.

Adversarial Machine Learning at Scale

This research applies adversarial training to ImageNet, finds that single-step attacks are the best for mounting black-box attacks, and resolves a "label leaking" effect that causes adversarially trained models to perform better on adversarial examples than on clean examples.