Corpus ID: 53717538

Strength in Numbers: Trading-off Robustness and Computation via Adversarially-Trained Ensembles

@article{Grefenstette2018StrengthIN,
  title={Strength in Numbers: Trading-off Robustness and Computation via Adversarially-Trained Ensembles},
  author={Edward Grefenstette and Robert Stanforth and Brendan O'Donoghue and Jonathan Uesato and Grzegorz Swirszcz and Pushmeet Kohli},
  journal={ArXiv},
  year={2018},
  volume={abs/1811.09300}
}
While deep learning has led to remarkable results on a number of challenging problems, researchers have discovered a vulnerability of neural networks in adversarial settings, where small but carefully chosen perturbations to the input can make the models produce extremely inaccurate outputs. […] Key Result: Crucially, we show that it is the adversarial training of the ensemble, rather than the ensembling of adversarially trained models, which provides robustness.
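
The key result is easiest to see as a training loop: when ensembling adversarially trained models, each member is attacked and updated on its own predictions and the models are only averaged at test time, whereas in adversarial training of the ensemble the inner maximisation is run against the combined prediction, so every member is updated on the same shared adversarial examples. The sketch below illustrates the latter in PyTorch; it is a minimal illustration only, and the PGD budget, step size, and optimiser are placeholders rather than the configuration used in the paper.

import torch
import torch.nn.functional as F

def ensemble_logits(models, x):
    # Average the member logits: this joint prediction is what the adversary attacks.
    return torch.stack([m(x) for m in models]).mean(dim=0)

def pgd_attack(models, x, y, eps=8 / 255, step=2 / 255, iters=10):
    # Projected gradient descent computed against the ensemble, not a single member.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(iters):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(ensemble_logits(models, x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + step * grad.sign()
        x_adv = x.detach() + (x_adv - x.detach()).clamp(-eps, eps)  # project back into the eps-ball
        x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()

def adversarial_ensemble_step(models, optimizer, x, y):
    # Adversarial training of the ensemble: every member is updated on examples
    # crafted against the joint (averaged) prediction.
    x_adv = pgd_attack(models, x, y)
    optimizer.zero_grad()
    loss = F.cross_entropy(ensemble_logits(models, x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()

In the contrasting setup each member would instead be attacked and trained on its own logits (e.g. pgd_attack([m], x, y) per member), with averaging applied only at evaluation; the abstract's claim is that this alternative does not yield the same robustness.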

Citations

Adversarial Feature Stacking for Accurate and Robust Predictions
TLDR
An Adversarial Feature Stacking (AFS) model is proposed that can jointly take advantage of features with varied levels of robustness and accuracy, thus significantly alleviating the aforementioned trade-off between accuracy and robustness.
Confidence-Calibrated Adversarial Training: Towards Robust Models Generalizing Beyond the Attack Used During Training
TLDR
It is shown that CCAT better preserves the accuracy of normal training while robustness against adversarial examples is achieved via confidence thresholding, and that, in strong contrast to adversarial training, the robustness of CCAT generalizes to larger perturbations and other threat models not encountered during training.
Confidence-Calibrated Adversarial Training and Detection: More Robust Models Generalizing Beyond the Attack Used During Training
TLDR
Confidence-calibrated adversarial training (CCAT) is introduced where the key idea is to enforce that the confidence on adversarial examples decays with their distance to the attacked examples, and the robustness of CCAT generalizes to larger perturbations and other threat models, not encountered during training.
PARL: Enhancing Diversity of Ensemble Networks to Resist Adversarial Attacks via Pairwise Adversarially Robust Loss Function
TLDR
This paper attempts to develop a new ensemble methodology that constructs multiple diverse classifiers using a Pairwise Adversarially Robust Loss (PARL) function during the training procedure and evaluates the robustness in the presence of white-box attacks.
Confidence-Calibrated Adversarial Training
TLDR
Confidence-calibrated adversarial training (CCAT) is introduced where the key idea is to enforce that the confidence on adversarial examples decays with their distance to the attacked examples, and the robustness of CCAT generalizes to larger perturbations and other threat models, not encountered during training.
Fixing Data Augmentation to Improve Adversarial Robustness
TLDR
It is demonstrated that, contrary to previous findings, when combined with model weight averaging, data augmentation can significantly boost robust accuracy and state-of-the-art generative models can be leveraged to artificially increase the size of the training set and improve adversarial robustness.
Relating Adversarially Robust Generalization to Flat Minima
TLDR
This paper proposes average- and worst-case metrics to measure flatness in the robust loss landscape and shows a correlation between good robust generalization and flatness, i.e., whether robust loss changes significantly when perturbing weights.
Confidence-Calibrated Adversarial Training: Generalizing to Unseen Attacks
TLDR
The confidence-calibrated adversarial training (CCAT) tackles this problem by biasing the model towards low-confidence predictions on adversarial examples, making it possible to reject examples with low confidence, which generalizes beyond the threat model employed during training.
Robust Overfitting may be mitigated by properly learned smoothening
TLDR
Two empirical means to inject more learned smoothening during adversarially robust training of deep networks are investigated: one leveraging knowledge distillation and self-training to smooth the logits, the other performing stochastic weight averaging (Izmailov et al., 2018) to smooth the weights (a minimal sketch of such weight averaging follows this list).
Data Augmentation Can Improve Robustness
TLDR
It is demonstrated that, contrary to previous findings, when combined with model weight averaging, data augmentation can significantly boost robust accuracy; furthermore, various data augmentation techniques are compared and it is observed that spatial composition techniques work best for adversarial training.
...
...
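
Several of the citing papers above (the robust-overfitting and data-augmentation entries in particular) smooth the weights of an adversarially trained network via stochastic weight averaging (Izmailov et al., 2018). As a rough, hedged sketch of what that averaging amounts to (the schedule and the point at which averaging starts are placeholders, not the cited papers' settings):

import copy
import torch

@torch.no_grad()
def update_weight_average(avg_model, model, n_averaged):
    # Running mean of parameters: avg <- avg + (w - avg) / (n + 1).
    for p_avg, p in zip(avg_model.parameters(), model.parameters()):
        p_avg.add_(p.detach() - p_avg, alpha=1.0 / (n_averaged + 1))
    return n_averaged + 1

# Usage sketch: start averaging after `swa_start` epochs of (adversarial) training.
# avg_model = copy.deepcopy(model)
# n = 0
# for epoch in range(num_epochs):
#     train_one_epoch(model)               # placeholder training routine
#     if epoch >= swa_start:
#         n = update_weight_average(avg_model, model, n)
# BatchNorm running statistics of avg_model must be recomputed with a pass over
# the training data before the averaged model is evaluated.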

References

Showing 1-10 of 36 references
Ensemble Adversarial Training: Attacks and Defenses
TLDR
This work finds that adversarial training remains vulnerable to black-box attacks, where perturbations computed on undefended models are transferred, and to a powerful novel single-step attack that escapes the non-smooth vicinity of the input data via a small random step.
Towards Deep Learning Models Resistant to Adversarial Attacks
TLDR
This work studies the adversarial robustness of neural networks through the lens of robust optimization, and suggests the notion of security against a first-order adversary as a natural and broad security guarantee.
Ensemble Methods as a Defense to Adversarial Perturbations Against Deep Neural Networks
TLDR
It is empirically shown that ensemble methods not only improve the accuracy of neural networks on test data but also increase their robustness against adversarial perturbations.
Towards Evaluating the Robustness of Neural Networks
TLDR
It is demonstrated that defensive distillation does not significantly increase the robustness of neural networks, and three new attack algorithms are introduced that are successful on both distilled and undistilled neural networks with 100% probability.
Mitigating adversarial effects through randomization
TLDR
This paper proposes to utilize randomization at inference time to mitigate adversarial effects, and uses two randomization operations: random resizing, which resizes the input images to a random size, and random padding, which pads zeros around the input image in a random manner (a minimal sketch of this transformation follows the reference list).
Adversarial Attacks on Neural Network Policies
TLDR
This work shows existing adversarial example crafting techniques can be used to significantly degrade test-time performance of trained policies, even with small adversarial perturbations that do not interfere with human perception.
Explaining and Harnessing Adversarial Examples
TLDR
It is argued that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature, supported by new quantitative results while giving the first explanation of the most intriguing fact about them: their generalization across architectures and training sets.
Adversarial Examples: Attacks and Defenses for Deep Learning
TLDR
The methods for generating adversarial examples for DNNs are summarized, a taxonomy of these methods is proposed, and three major challenges in adversarial examples are discussed along with potential solutions.
PixelDefend: Leveraging Generative Models to Understand and Defend against Adversarial Examples
Adversarial perturbations of normal images are usually imperceptible to humans, but they can seriously confuse state-of-the-art machine learning models. What makes them so special in the eyes of image classifiers?
Adversarial Risk and the Dangers of Evaluating Against Weak Attacks
TLDR
This paper motivates the use of adversarial risk as an objective, although it cannot easily be computed exactly, and frames commonly used attacks and evaluation metrics as defining a tractable surrogate objective to the true adversarial risk.
...
...
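
One of the references above, Mitigating adversarial effects through randomization, defends at inference time by randomly resizing the input and padding it with zeros before classification. A minimal sketch of that input transformation, assuming image batches in [0, 1] and placeholder size ranges (the original paper's ImageNet settings differ):

import random
import torch.nn.functional as F

def random_resize_and_pad(x, out_size=331, min_size=299, max_size=331):
    # x: image batch of shape (N, C, H, W); size ranges here are placeholders.
    new_size = random.randint(min_size, max_size)
    x = F.interpolate(x, size=(new_size, new_size), mode="nearest")
    # Pad with zeros up to a fixed output size, splitting the padding randomly
    # between left/right and top/bottom.
    pad_total = out_size - new_size
    left = random.randint(0, pad_total)
    top = random.randint(0, pad_total)
    return F.pad(x, (left, pad_total - left, top, pad_total - top), value=0.0)

# At test time the classifier is applied to random_resize_and_pad(x) instead of x,
# so each query sees a slightly different, randomly transformed input.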