Corpus ID: 24029589

Adversarial Spheres

Justin Gilmer, Luke Metz, Fartash Faghri, Samuel S. Schoenholz, Maithra Raghu, Martin Wattenberg, Ian J. Goodfellow
State-of-the-art computer vision models have been shown to be vulnerable to small adversarial perturbations of the input. In other words, most images in the data distribution are both correctly classified by the model and very close to a visually similar misclassified image. Despite substantial research interest, the cause of this phenomenon is still poorly understood and remains unsolved. We hypothesize that this counterintuitive behavior is a naturally occurring result of the high…


Principal Component Adversarial Example

This paper proposes a new concept, called the adversarial region, which explains the existence of adversarial examples as perturbations perpendicular to the tangent plane of the data manifold, and proposes a novel target-free method for generating adversarial examples via principal component analysis.
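The geometric idea above, treating adversarial directions as perpendicular to the data manifold, can be sketched with plain PCA. The function below is an illustrative toy under that assumption, not the paper's actual target-free method:

```python
import numpy as np

def off_manifold_perturbation(X, x, k=10, eps=0.1):
    """Perturb x along a direction orthogonal to the top-k principal
    components of the data X, i.e. roughly perpendicular to the
    estimated data manifold (illustrative sketch only)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions of the data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V_top = Vt[:k]                       # spans the estimated tangent space
    r = np.random.randn(X.shape[1])
    # Project out the tangent components, leaving a normal direction.
    r -= V_top.T @ (V_top @ r)
    r /= np.linalg.norm(r)
    return x + eps * r
```

The returned point differs from `x` by exactly `eps` in Euclidean norm, along a direction orthogonal to the top `k` principal directions.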

Spatially Correlated Patterns in Adversarial Images

This work establishes a theoretical setup for formalising the segregation, isolation, and neutralization of regions within an input image that are particularly critical to classification (during inference), to adversarial vulnerability, or to both.

Robustness via Curvature Regularization, and Vice Versa

It is shown in particular that adversarial training leads to a significant decrease in the curvature of the loss surface with respect to inputs, leading to a drastically more "linear" behaviour of the network.

Adversarial Robustness Through Local Lipschitzness

The results show that having a small Lipschitz constant correlates with achieving high clean and robust accuracy, and therefore, the smoothness of the classifier is an important property to consider in the context of adversarial examples.
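The notion of a small local Lipschitz constant can be illustrated with a simple Monte-Carlo estimate around a single input. This is a sketch of the quantity being discussed, not the paper's evaluation procedure:

```python
import numpy as np

def local_lipschitz(f, x, radius=0.1, n_samples=256, seed=0):
    """Monte-Carlo lower-bound estimate of a local Lipschitz constant
    of f around x: the maximum of |f(x') - f(x)| / |x' - x| over random
    points x' in a ball of the given radius (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    fx = f(x)
    best = 0.0
    for _ in range(n_samples):
        d = rng.normal(size=x.shape)
        d *= radius * rng.uniform() / np.linalg.norm(d)
        ratio = np.linalg.norm(f(x + d) - fx) / np.linalg.norm(d)
        best = max(best, ratio)
    return best
```

For a linear map `f(x) = 3x` the estimate recovers the true constant 3; for a trained classifier, `f` would be the logit map and a large value flags local non-smoothness.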

Examining the Proximity of Adversarial Examples to Class Manifolds in Deep Networks

This work sheds light on the inner representations of adversarial examples (AEs) by analysing their activations on the hidden layers, and proposes two methods for comparing the distances to class-specific manifolds regardless of the changing dimensions throughout the network.

Robustness to adversarial examples can be improved with overfitting

It is argued that the error in adversarial examples is caused by high bias, i.e. by regularization that has local negative effects, which ties the phenomenon to the trade-off that exists in machine learning between fitting and generalization.

Adversarial Robustness via Fisher-Rao Regularization

This work proposes an information-geometric formulation of adversarial defense and introduces Fire, a new Fisher-Rao regularization for the categorical cross-entropy loss, which is based on the geodesic distance between the softmax outputs corresponding to natural and perturbed input features.
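The geodesic (Fisher-Rao) distance between two categorical distributions has a closed form via the Bhattacharyya coefficient. The snippet below computes the kind of quantity such a regularizer penalizes between clean and perturbed softmax outputs; it is the standard formula, not the paper's training code:

```python
import numpy as np

def fisher_rao_distance(p, q):
    """Fisher-Rao (geodesic) distance between categorical distributions:
    d(p, q) = 2 * arccos( sum_i sqrt(p_i * q_i) ).
    The sum is the Bhattacharyya coefficient, clipped for numerical safety."""
    bc = np.sum(np.sqrt(p * q))
    return 2.0 * np.arccos(np.clip(bc, 0.0, 1.0))
```

Identical distributions are at distance 0, and two point masses on different classes reach the maximum distance of pi.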

Adversarial Examples on Object Recognition

The hypotheses behind the existence of adversarial examples, the methods used to construct them or defend against them, and their capacity to transfer between different machine learning models are introduced.

Adversarially Robust Generalization Requires More Data

It is shown that already in a simple natural data model, the sample complexity of robust learning can be significantly larger than that of "standard" learning.

A Hamiltonian Monte Carlo Method for Probabilistic Adversarial Attack and Learning

To improve the efficiency of HMC, a new regime is proposed that automatically controls the length of trajectories, allowing the algorithm to move with adaptive step sizes along the search direction at different positions. The high computational cost of adversarial training is also revisited from the MCMC perspective, and a new generative method called Contrastive Adversarial Training (CAT) is designed.
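For reference, a single generic HMC step with a fixed-length leapfrog trajectory looks as follows. This is textbook HMC; the paper's contribution, adapting trajectory lengths and step sizes, is not reproduced in this sketch:

```python
import numpy as np

def hmc_step(x, grad_logp, logp, step=0.1, n_leapfrog=10, rng=None):
    """One Hamiltonian Monte Carlo step: sample a momentum, simulate
    Hamiltonian dynamics with the leapfrog integrator, then apply a
    Metropolis accept/reject correction (generic sketch)."""
    if rng is None:
        rng = np.random.default_rng()
    p = rng.normal(size=x.shape)
    x_new, p_new = x.copy(), p.copy()
    # Leapfrog integration: half step for momentum, full steps interleaved.
    p_new += 0.5 * step * grad_logp(x_new)
    for _ in range(n_leapfrog - 1):
        x_new += step * p_new
        p_new += step * grad_logp(x_new)
    x_new += step * p_new
    p_new += 0.5 * step * grad_logp(x_new)
    # Accept or reject based on the change in the Hamiltonian.
    h_old = -logp(x) + 0.5 * p @ p
    h_new = -logp(x_new) + 0.5 * p_new @ p_new
    if np.log(rng.uniform()) < h_old - h_new:
        return x_new
    return x
```

In the attack setting described above, `logp` would encode an adversarial objective over inputs rather than a model posterior.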



Detecting Adversarial Samples from Artifacts

This paper investigates model confidence on adversarial samples by looking at Bayesian uncertainty estimates, available in dropout neural networks, and by performing density estimation in the subspace of deep features learned by the model; the results yield a method for implicit adversarial detection that is oblivious to the attack algorithm.

Dense Associative Memory Is Robust to Adversarial Inputs

DAMs with higher-order energy functions are more robust to adversarial and rubbish inputs than DNNs with rectified linear units, and they open up the possibility of using higher-order models for detecting and stopping malicious adversarial attacks.

PixelDefend: Leveraging Generative Models to Understand and Defend against Adversarial Examples

Adversarial perturbations of normal images are usually imperceptible to humans, but they can seriously confuse state-of-the-art machine learning models. What makes them so special in the eyes of…

Adversarial vulnerability for any classifier

This paper derives fundamental upper bounds on the robustness to perturbation of any classification function, and proves the existence of adversarial perturbations that transfer well across different classifiers with small risk.

On Detecting Adversarial Perturbations

It is shown empirically that adversarial perturbations can be detected surprisingly well even though they are quasi-imperceptible to humans.

Foveation-based Mechanisms Alleviate Adversarial Examples

It is shown that adversarial examples, i.e., the visually imperceptible perturbations that cause Convolutional Neural Networks (CNNs) to fail, can be alleviated with a mechanism based on foveations, namely applying the CNN in different image regions. It is further corroborated that when the neural responses are linear, applying the foveation mechanism to the adversarial example tends to significantly reduce the effect of the perturbation.

Measuring the tendency of CNNs to Learn Surface Statistical Regularities

Deep CNNs are known to exhibit the following peculiarity: on the one hand they generalize extremely well to a test set, while on the other hand they are extremely sensitive to so-called adversarial…

Explaining and Harnessing Adversarial Examples

It is argued that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature, supported by new quantitative results while giving the first explanation of the most intriguing fact about them: their generalization across architectures and training sets.
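The linearity argument above motivates the fast gradient sign method introduced in that paper: perturb the input by the sign of the loss gradient. Below is a minimal sketch for a logistic-regression toy model; the hypothetical `fgsm` helper applies the same update rule the paper uses for deep networks:

```python
import numpy as np

def fgsm(x, y, w, b, eps=0.1):
    """Fast gradient sign method for a logistic model p = sigmoid(w.x + b):
    move x by eps in the sign direction of the input gradient of the
    cross-entropy loss (toy model for illustration)."""
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))
    grad_x = (p - y) * w          # d(cross-entropy) / d(x)
    return x + eps * np.sign(grad_x)
```

Because the model is linear in `x`, even a small `eps` moves every coordinate in the worst-case direction at once, which is exactly the linearity-based explanation the paper advances.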

Adversarial examples in the physical world

It is found that a large fraction of adversarial examples are classified incorrectly even when perceived through a camera, which shows that machine learning systems are vulnerable to adversarial examples even in physical-world scenarios.