Publications
Towards Deep Learning Models Resistant to Adversarial Attacks
TLDR
This work studies the adversarial robustness of neural networks through the lens of robust optimization, and suggests the notion of security against a first-order adversary as a natural and broad security guarantee.
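The core algorithmic tool in this work is projected gradient descent (PGD) on the inner maximization of a min-max (robust optimization) objective. Below is a minimal sketch of an L-infinity PGD attack, assuming a PyTorch classifier `model` and inputs scaled to [0, 1]; the function name and hyperparameter values are illustrative, not the paper's reference settings.

```python
# Minimal sketch of an L-infinity PGD attack, the inner maximization used in
# the robust-optimization (min-max) training formulation.
# Assumes a PyTorch classifier `model` and a labeled batch (x, y); the
# hyperparameters below are illustrative, not the authors' reference values.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, epsilon=8/255, alpha=2/255, num_steps=10):
    """Return adversarial examples within an epsilon L-inf ball around x."""
    x_adv = x + torch.empty_like(x).uniform_(-epsilon, epsilon)  # random start
    x_adv = x_adv.clamp(0.0, 1.0)
    for _ in range(num_steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Ascend the loss along the gradient sign, then project back to the ball.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon)
        x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv.detach()
```

Adversarial training then corresponds to the outer minimization: the model is trained on the perturbed inputs produced by this inner loop.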
Robustness May Be at Odds with Accuracy
TLDR
It is shown that there may exist an inherent tension between the goal of adversarial robustness and that of standard generalization, and it is argued that this phenomenon is a consequence of robust classifiers learning fundamentally different feature representations than standard classifiers.
Adversarial Examples Are Not Bugs, They Are Features
TLDR
It is demonstrated that adversarial examples can be directly attributed to the presence of non-robust features: features derived from patterns in the data distribution that are highly predictive, yet brittle and incomprehensible to humans.
Prior Convictions: Black-Box Adversarial Attacks with Bandits and Priors
TLDR
A framework that conceptually unifies much of the existing work on black-box attacks is introduced, and it is demonstrated that the current state-of-the-art methods are optimal in a natural sense.
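The unifying view in this paper casts black-box attacks as gradient estimation from queries, onto which time and data priors are added via bandit optimization. The sketch below shows only the basic query-based (finite-difference) gradient estimator that attacks in this family build on, not the full bandit algorithm; `loss_fn` and the sampling parameters are assumed names used for illustration.

```python
# Simplified sketch of query-based gradient estimation, the building block that
# black-box attacks in this family refine with time and data priors.
# Assumes `loss_fn(x)` returns a scalar loss obtained only through queries to
# the target model; the sample count and smoothing sigma are illustrative.
import torch

def estimate_gradient(loss_fn, x, num_samples=50, sigma=0.001):
    """Antithetic finite-difference estimate of the loss gradient at x."""
    grad = torch.zeros_like(x)
    for _ in range(num_samples):
        u = torch.randn_like(x)
        # Two queries per random direction: forward and backward perturbation.
        delta = (loss_fn(x + sigma * u) - loss_fn(x - sigma * u)) / (2 * sigma)
        grad += delta * u
    return grad / num_samples
```

In the bandit formulation, the estimate is not recomputed from scratch at every step: a latent prior vector is maintained and updated online from query feedback, so successive gradient estimates reuse information from earlier rounds.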
Adversarially Robust Generalization Requires More Data
TLDR
It is shown that, already in a simple and natural data model, the sample complexity of robust learning can be significantly larger than that of "standard" learning.
How Does Batch Normalization Help Optimization?
TLDR
It is demonstrated that the distributional stability of layer inputs has little to do with the success of BatchNorm; instead, BatchNorm makes the optimization landscape significantly smoother, which induces more predictive and stable gradient behavior and allows for faster training.
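One way to make the smoothness claim concrete is to probe how much the loss changes when the parameters are stepped along the current gradient direction. The sketch below is one such probe in the spirit of the paper's analysis, assuming a PyTorch `model`, a labeled batch `(x, y)`, and cross-entropy loss; the step sizes are illustrative.

```python
# Sketch of a smoothness probe: measure how much the loss varies as the
# parameters are stepped along the negative gradient direction. Smoother
# landscapes show smaller, more predictable variation.
import torch
import torch.nn.functional as F

def loss_along_gradient(model, x, y, step_sizes=(0.01, 0.05, 0.1)):
    params = [p for p in model.parameters() if p.requires_grad]
    loss = F.cross_entropy(model(x), y)
    grads = torch.autograd.grad(loss, params)
    losses = []
    with torch.no_grad():
        for eta in step_sizes:
            # Step along the negative gradient, record the loss, then restore.
            for p, g in zip(params, grads):
                p.sub_(eta * g)
            losses.append(F.cross_entropy(model(x), y).item())
            for p, g in zip(params, grads):
                p.add_(eta * g)
    return loss.item(), losses
```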
Spectral Signatures in Backdoor Attacks
TLDR
Spectral signatures are identified as a new property of all known backdoor attacks, allowing tools from robust statistics to thwart them; the efficacy of these signatures in detecting and removing poisoned examples is demonstrated on real image datasets and state-of-the-art neural network architectures.
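Concretely, the spectral signature is an outlier score computed from learned representations: examples whose centered features correlate strongly with the top singular direction of a class's representation matrix are flagged as likely poisoned. A minimal sketch, assuming `features` is an (n_examples, dim) array of penultimate-layer activations for a single class; the removal fraction is illustrative.

```python
# Minimal sketch of the spectral-signature outlier score for one class.
import numpy as np

def spectral_signature_scores(features):
    """Score each example by its squared projection onto the top singular direction."""
    centered = features - features.mean(axis=0, keepdims=True)
    # Top right singular vector of the centered representation matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    top_direction = vt[0]
    return (centered @ top_direction) ** 2

def filter_suspicious(features, remove_frac=0.05):
    """Return indices of examples to keep after dropping the highest-scoring fraction."""
    scores = spectral_signature_scores(features)
    keep = scores.argsort()[: int(len(scores) * (1 - remove_frac))]
    return keep
```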
On Evaluating Adversarial Robustness
TLDR
The methodological foundations are discussed, commonly accepted best practices are reviewed, and new methods for evaluating defenses to adversarial examples are suggested.
A Rotation and a Translation Suffice: Fooling CNNs with Simple Transformations
TLDR
It is shown that neural networks are vulnerable to significantly simpler, and more naturally occurring, transformations of the inputs, and that current neural network-based vision models may not be as reliable as commonly assumed.
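The attack studied here needs no gradients at all: a small grid search over rotations and translations is often enough to flip a model's prediction. Below is a minimal sketch of such a grid search, assuming a PyTorch classifier `model` that takes a single image as a (1, C, H, W) tensor and an integer true label `y`; the grid ranges are illustrative.

```python
# Minimal sketch of a grid search over rotations (degrees) and translations
# (pixels): return the first natural-looking transformation that changes the
# model's prediction, or None if the grid contains no such transformation.
import itertools
import torch
import torchvision.transforms.functional as TF

def rotation_translation_attack(model, x, y, angles=range(-30, 31, 5),
                                shifts=range(-3, 4, 3)):
    for angle, dx, dy in itertools.product(angles, shifts, shifts):
        x_t = TF.affine(x, angle=float(angle), translate=[dx, dy],
                        scale=1.0, shear=0.0)
        pred = model(x_t).argmax(dim=1)
        if pred.item() != y:
            return x_t, angle, (dx, dy)  # a simple transformation that fools the model
    return None
```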
On Adaptive Attacks to Adversarial Example Defenses
TLDR
It is demonstrated that thirteen defenses recently published at ICLR, ICML, and NeurIPS, chosen for illustrative and pedagogical purposes, can be circumvented despite their authors' attempts to evaluate them with adaptive attacks.