Curriculum Adversarial Training

@article{Cai2018CurriculumAT,
  title={Curriculum Adversarial Training},
  author={Qi-Zhi Cai and Min Du and Chang Liu and Dawn Xiaodong Song},
  journal={ArXiv},
  year={2018},
  volume={abs/1805.04807}
}
Recently, deep learning has been applied to many security-sensitive applications, such as facial authentication. The existence of adversarial examples hinders such applications. The state-of-the-art result on defense shows that adversarial training can be applied to train a robust model on MNIST against adversarial examples; but it fails to achieve a high empirical worst-case accuracy on more complex tasks such as CIFAR-10 and SVHN. In our work, we propose curriculum adversarial training… 
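The abstract is truncated here, so the sketch below (PyTorch) illustrates only one common reading of the curriculum idea: adversarial training in which the attack strength, here the number of PGD steps k, is increased on a schedule as training proceeds. The model, data loader, eps/alpha values, and the k-schedule are illustrative assumptions, not the paper's exact algorithm or hyperparameters.

# Minimal sketch of curriculum adversarial training, assuming the curriculum is
# realized by gradually increasing the PGD attack strength (number of steps k).
# All hyperparameters below are placeholders, not the paper's configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

def pgd_attack(model, x, y, eps, alpha, steps):
    """L-infinity PGD: `steps` signed-gradient steps of size `alpha`,
    projected onto the eps-ball around the clean input x."""
    x_adv = x.clone().detach()
    if steps == 0:                      # curriculum stage 0: clean training
        return x_adv
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project to eps-ball
        x_adv = x_adv.clamp(0.0, 1.0)                          # keep valid pixel range
    return x_adv.detach()

def curriculum_adversarial_train(model, loader, epochs=30, eps=8/255, alpha=2/255,
                                 device="cpu"):
    """Train against attacks of increasing strength: start with weak (few-step)
    PGD and strengthen it as training progresses. The schedule here (one extra
    step every 5 epochs, capped at 10) is a stand-in, not the paper's schedule."""
    model.to(device)
    opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    model.train()
    for epoch in range(epochs):
        k = min(epoch // 5, 10)         # curriculum: attack strength for this epoch
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            x_adv = pgd_attack(model, x, y, eps, alpha, steps=k)
            opt.zero_grad()
            loss = F.cross_entropy(model(x_adv), y)
            loss.backward()
            opt.step()
    return model

The schedule above is purely epoch-based for simplicity; a stage could instead be advanced only once accuracy against the current attack strength plateaus.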

Citations

Efficient Adversarial Training With Transferable Adversarial Examples
TLDR
This paper shows that there is high transferability between models from neighboring epochs in the same training process, i.e., adversarial examples from one epoch continue to be adversarial in subsequent epochs, and proposes a novel method, Adversarial Training with Transferable Adversarial Examples (ATTA), that can enhance the robustness of trained models and greatly improve the training efficiency by accumulating adversarial perturbations through epochs.
Is PGD-Adversarial Training Necessary? Alternative Training via a Soft-Quantization Network with Noisy-Natural Samples Only
TLDR
Extensive empirical evaluations on standard datasets show that the proposed models are comparable to PGD-adversarially-trained models under PGD and BPDA attacks, and that, for the first time, a robust ImageNet model can be fine-tuned within only two days.
Confidence-Calibrated Adversarial Training: Towards Robust Models Generalizing Beyond the Attack Used During Training
TLDR
It is shown that CCAT preserves better the accuracy of normal training while robustness against adversarial examples is achieved via confidence thresholding, and in strong contrast to adversarial training, the robustness of CCAT generalizes to larger perturbations and other threat models, not encountered during training.
Guided Interpolation for Adversarial Training
TLDR
The guided interpolation framework (GIF) is proposed: in each epoch, the GIF employs the previous epoch’s meta information to guide the data's interpolation, which mitigates the model's linear behavior between classes and encourages the model to predict invariantly in the cluster of each class.
Confidence-Calibrated Adversarial Training and Detection: More Robust Models Generalizing Beyond the Attack Used During Training
TLDR
Confidence-calibrated adversarial training (CCAT) is introduced where the key idea is to enforce that the confidence on adversarial examples decays with their distance to the attacked examples, and the robustness of CCAT generalizes to larger perturbations and other threat models, not encountered during training.
Improving Adversarial Robustness Through Progressive Hardening
TLDR
Adversarial Training with Early Stopping (ATES) stabilizes network training even for a large perturbation norm and allows the network to operate at a better clean accuracy versus robustness trade-off curve compared to standard adversarial training (AT).
CE-based white-box adversarial attacks will not work using super-fitting
TLDR
This paper mathematically proves the effectiveness of super-fitting and enables the model to reach this state quickly by minimizing unrelated category scores (MUCS), which can make the trained model obtain the highest adversarial robustness.
Understanding and Increasing Efficiency of Frank-Wolfe Adversarial Training
TLDR
A theoretical framework for adversarial training with FW optimization (FW-AT) is developed that reveals a geometric connection between the loss landscape and the distortion of ℓ∞ FW attacks (the attack's ℓ2 norm), and analytically shows that high distortion of FW attacks is equivalent to small gradient variation along the attack path.
Calibrated Adversarial Training
TLDR
Calibrated Adversarial Training is presented, a method that reduces the adverse effects of semantic perturbations in adversarial training and produces pixel-level adaptations to the perturbations based on a novel calibrated robust error.
...

References

SHOWING 1-10 OF 44 REFERENCES
Adversarial Machine Learning at Scale
TLDR
This research applies adversarial training to ImageNet, finds that single-step attacks are the best for mounting black-box attacks, and resolves a "label leaking" effect that causes adversarially trained models to perform better on adversarial examples than on clean examples.
A General Retraining Framework for Scalable Adversarial Classification
TLDR
It is shown that, under natural conditions, the retraining framework minimizes an upper bound on optimal adversarial risk, and how to extend this result to account for approximations of evasion attacks.
Towards Evaluating the Robustness of Neural Networks
TLDR
It is demonstrated that defensive distillation does not significantly increase the robustness of neural networks, and three new attack algorithms are introduced that are successful on both distilled and undistilled neural networks with 100% probability.
Practical Black-Box Attacks against Deep Learning Systems using Adversarial Examples
TLDR
This work introduces the first practical demonstration that the cross-model transfer phenomenon enables attackers to control a remotely hosted DNN with no access to the model, its parameters, or its training data, and introduces the attack strategy of fitting a substitute model to input-output pairs obtained in this manner, then crafting adversarial examples based on this auxiliary model.
Distillation as a Defense to Adversarial Perturbations Against Deep Neural Networks
TLDR
The study shows that defensive distillation can reduce the effectiveness of adversarial sample creation from 95% to less than 0.5% on a studied DNN, and analytically investigates the generalizability and robustness properties granted by the use of defensive distillation when training DNNs.
Thermometer Encoding: One Hot Way To Resist Adversarial Examples
TLDR
A simple modification to standard neural network architectures, thermometer encoding, is proposed, which significantly increases the robustness of the network to adversarial examples; the properties of these networks are explored, providing evidence that thermometer encodings help neural networks find more non-linear decision boundaries.
Provable defenses against adversarial examples via the convex outer adversarial polytope
TLDR
A method to learn deep ReLU-based classifiers that are provably robust against norm-bounded adversarial perturbations, and it is shown that the dual problem to this linear program can be represented itself as a deep network similar to the backpropagation network, leading to very efficient optimization approaches that produce guaranteed bounds on the robust loss.
Attacking the Madry Defense Model with L1-based Adversarial Examples
TLDR
The experimental results demonstrate that by relaxing the constraint of the competition, the elastic-net attack to deep neural networks (EAD) can generate transferable adversarial examples which, despite their high average ℓ∞ distortion, have minimal visual distortion.
Delving into Transferable Adversarial Examples and Black-box Attacks
TLDR
This work is the first to conduct an extensive study of the transferability over large models and a large-scale dataset, and it is also the first to study the transferability of targeted adversarial examples with their target labels.
Explaining and Harnessing Adversarial Examples
TLDR
It is argued that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature, supported by new quantitative results while giving the first explanation of the most intriguing fact about them: their generalization across architectures and training sets.
...