Corpus ID: 3310672

Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples

@inproceedings{Athalye2018ObfuscatedGG,
  title={Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples},
  author={Anish Athalye and Nicholas Carlini and David A. Wagner},
  booktitle={ICML},
  year={2018}
}
We identify obfuscated gradients, a kind of gradient masking, as a phenomenon that leads to a false sense of security in defenses against adversarial examples. While defenses that cause obfuscated gradients appear to defeat iterative optimization-based attacks, we find defenses relying on this effect can be circumvented. We describe characteristic behaviors of defenses exhibiting the effect, and for each of the three types of obfuscated gradients we discover, we develop attack techniques to overcome it. In a case study, examining non-certified white-box-secure defenses at ICLR 2018, we find obfuscated gradients are a common occurrence, with 7 of 9 defenses relying on obfuscated gradients. Our new attacks successfully circumvent 6 completely, and 1 partially, in the original threat model each paper considers.
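
The three attack techniques developed in the paper are Backward Pass Differentiable Approximation (BPDA) for shattered gradients, Expectation over Transformation (EOT) for stochastic gradients, and reparameterization for vanishing/exploding gradients. As a rough illustration only, the following is a minimal BPDA-style PGD sketch, assuming PyTorch; model, defense_preprocess, and the epsilon/step-size values are hypothetical placeholders for a defended classifier, its non-differentiable input transformation, and an attack budget, not code from the paper.

    # Minimal BPDA-style PGD sketch (assumed PyTorch; all names are placeholders).
    import torch
    import torch.nn.functional as F

    def bpda_pgd(model, defense_preprocess, x, y, eps=8/255, alpha=2/255, steps=40):
        """PGD where the non-differentiable preprocessing g is applied on the
        forward pass but approximated by the identity on the backward pass."""
        x_adv = x.clone().detach()
        for _ in range(steps):
            x_adv.requires_grad_(True)
            g_x = defense_preprocess(x_adv.detach())       # non-differentiable step
            # x_adv + (g(x) - x_adv).detach() equals g(x) in value, but its
            # gradient with respect to x_adv is the identity (straight-through).
            x_in = x_adv + (g_x - x_adv).detach()
            loss = F.cross_entropy(model(x_in), y)
            grad = torch.autograd.grad(loss, x_adv)[0]
            with torch.no_grad():
                x_adv = x_adv + alpha * grad.sign()
                x_adv = x + (x_adv - x).clamp(-eps, eps)   # project onto the L_inf ball
                x_adv = x_adv.clamp(0.0, 1.0)
        return x_adv.detach()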

Citations

Barrage of Random Transforms for Adversarially Robust Defense
TLDR: It is shown that, even after accounting for obfuscated gradients, the Barrage of Random Transforms (BaRT) is a resilient defense against even the most difficult attacks, such as PGD.
Mitigating Advanced Adversarial Attacks with More Advanced Gradient Obfuscation Techniques
TLDR: This paper performs an in-depth analysis of the root causes of advanced gradient-based attack techniques, proposes four properties that can break the fundamental assumptions of those attacks, and identifies a set of operations that can meet those properties.
Measuring the False Sense of Security
TLDR: This work investigates gradient masking under the lens of its mensurability, departing from the idea that it is a binary phenomenon; it proposes and motivates several metrics for it, performing extensive empirical tests on defenses suspected of exhibiting different degrees of gradient masking.
Stateful Detection of Black-Box Adversarial Attacks
TLDR: This paper develops a defense designed to detect the process of generating adversarial examples, and introduces query blinding, a new class of attacks designed to bypass such detection-based defenses.
Gradients Cannot Be Tamed: Behind the Impossible Paradox of Blocking Targeted Adversarial Attacks
Ziv Katzir, Y. Elovici. IEEE Transactions on Neural Networks and Learning Systems, 2021.
TLDR: It is proved that defensive distillation is highly effective against nontargeted attacks but unsuitable for targeted attacks, implying that blocking targeted attacks comes at the cost of losing the network's ability to learn, an impossible trade-off for the research community.
Automated Discovery of Adaptive Attacks on Adversarial Defenses
TLDR: This work presents an extensible framework that defines a search space over a set of reusable building blocks and automatically discovers an effective attack on a given model with an unknown defense by searching over suitable combinations of these blocks.
Mockingbird: Defending Against Deep-Learning-Based Website Fingerprinting Attacks With Adversarial Traces
TLDR: A novel defense, Mockingbird, is explored: a technique for generating traces that resists adversarial training by moving randomly in the space of viable traces rather than following predictable gradients, while incurring lower bandwidth overheads.
Indicators of Attack Failure: Debugging and Improving Optimization of Adversarial Examples
TLDR: This work defines a set of quantitative indicators that unveil common failures in the optimization of gradient-based attacks, and proposes specific mitigation strategies within a systematic evaluation protocol, providing a first concrete step towards automating and systematizing current adversarial robustness evaluations.
On Evaluating Adversarial Robustness
TLDR: The methodological foundations are discussed, commonly accepted best practices are reviewed, and new methods for evaluating defenses to adversarial examples are suggested.
Stochastic Substitute Training: A Gray-box Approach to Craft Adversarial Examples Against Gradient Obfuscation Defenses
TLDR: Stochastic substitute training is introduced: a gray-box approach that can craft adversarial examples for defenses which obfuscate gradients, demonstrated by applying it against two defenses that aim to make models more robust.

References

SHOWING 1-10 OF 44 REFERENCES
Evasion Attacks against Machine Learning at Test Time
TLDR: This work presents a simple but effective gradient-based approach that can be exploited to systematically assess the security of several widely used classification algorithms against evasion attacks.
Efficient Defenses Against Adversarial Attacks
TLDR: This work proposes a new defense method, based on practical observations, which is easy to integrate into models and performs better than state-of-the-art defenses against adversarial attacks on deep neural networks.
Distillation as a Defense to Adversarial Perturbations Against Deep Neural Networks
TLDR: The study shows that defensive distillation can reduce the effectiveness of adversarial sample creation from 95% to less than 0.5% on a studied DNN, and analytically investigates the generalizability and robustness properties granted by the use of defensive distillation when training DNNs.
Towards Deep Learning Models Resistant to Adversarial Attacks
TLDR: This work studies the adversarial robustness of neural networks through the lens of robust optimization, and suggests the notion of security against a first-order adversary as a natural and broad security guarantee.
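
As context for this robust-optimization view, the following is a minimal sketch of the min-max (PGD adversarial training) loop, assuming PyTorch; model, loader, optimizer, and the epsilon/step-size values are illustrative placeholders, not settings from the paper.

    # Minimal min-max (PGD adversarial training) sketch (assumed PyTorch; placeholders).
    import torch
    import torch.nn.functional as F

    def pgd_inner_max(model, x, y, eps=8/255, alpha=2/255, steps=7):
        """Inner maximization: projected gradient ascent on the loss over an
        L_inf ball of radius eps around x."""
        delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
        for _ in range(steps):
            loss = F.cross_entropy(model(x + delta), y)
            grad = torch.autograd.grad(loss, delta)[0]
            delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
        return delta.detach()

    def adversarial_training_epoch(model, loader, optimizer):
        """Outer minimization: update the weights on adversarially perturbed inputs."""
        model.train()
        for x, y in loader:
            delta = pgd_inner_max(model, x, y)
            optimizer.zero_grad()
            F.cross_entropy(model((x + delta).clamp(0.0, 1.0)), y).backward()
            optimizer.step()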
Countering Adversarial Images using Input Transformations
TLDR: This paper investigates strategies that defend against adversarial-example attacks on image-classification systems by transforming the inputs before feeding them to the system, and shows that total variance minimization and image quilting are very effective defenses in practice when the network is trained on transformed images.
PixelDefend: Leveraging Generative Models to Understand and Defend against Adversarial Examples
Adversarial perturbations of normal images are usually imperceptible to humans, but they can seriously confuse state-of-the-art machine learning models. What makes them so special in the eyes of image classifiers?
Thermometer Encoding: One Hot Way To Resist Adversarial Examples
TLDR: A simple modification to standard neural network architectures, thermometer encoding, is proposed, which significantly increases the robustness of the network to adversarial examples; the properties of these networks are explored, providing evidence that thermometer encodings help neural networks find more non-linear decision boundaries.
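
For illustration of the encoding itself, a rough sketch assuming PyTorch; the number of levels and the thresholding convention are assumptions made for this example, not taken from the paper.

    # Rough thermometer-encoding sketch (assumed PyTorch; levels and thresholds are illustrative).
    import torch

    def thermometer_encode(x, levels=16):
        """Map pixels in [0, 1] of shape (N, C, H, W) to a cumulative ("thermometer")
        code of shape (N, C * levels, H, W): channel i is 1 iff the pixel exceeds i / levels."""
        n, c, h, w = x.shape
        thresholds = torch.arange(levels, dtype=x.dtype, device=x.device) / levels
        # The comparison is piecewise constant in x, which is what makes the
        # defense's gradients "shattered" for a naive gradient-based attacker.
        code = (x.unsqueeze(2) > thresholds.view(1, 1, levels, 1, 1)).to(x.dtype)
        return code.reshape(n, c * levels, h, w)

    # Example: a pixel of 0.30 with 4 levels exceeds thresholds 0.0 and 0.25 only.
    print(thermometer_encode(torch.tensor([[[[0.30]]]]), levels=4).flatten())
    # tensor([1., 1., 0., 0.])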
Certified Defenses against Adversarial Examples
TLDR: This work proposes a method based on a semidefinite relaxation that outputs a certificate that, for a given network and test input, no attack can force the error to exceed a certain value, providing an adaptive regularizer that encourages robustness against all attacks.
Adversarial Example Defense: Ensembles of Weak Defenses are not Strong
TLDR: It is shown that an adaptive adversary can successfully create adversarial examples with low distortion, implying that an ensemble of weak defenses is not sufficient to provide a strong defense against adversarial examples.
Practical Black-Box Attacks against Machine Learning
TLDR: This work introduces the first practical demonstration of an attacker controlling a remotely hosted DNN without any knowledge of the model's internals or training data, and finds that this black-box attack strategy is capable of evading defense strategies previously found to make adversarial-example crafting harder.