Adversarially trained neural representations may already be as robust as corresponding biological neural representations

@article{Guo2022AdversariallyTN,
  title={Adversarially trained neural representations may already be as robust as corresponding biological neural representations},
  author={Chong Guo and Michael J. Lee and Guillaume Leclerc and Joel Dapello and Yug Rao and Aleksander Madry and James J. DiCarlo},
  journal={ArXiv},
  year={2022},
  volume={abs/2206.11228}
}
Visual systems of primates are the gold standard of robust perception. There is thus a general belief that mimicking the neural representations that underlie those systems will yield artificial visual systems that are adversarially robust. In this work, we develop a method for performing adversarial visual attacks directly on primate brain activity. We then leverage this method to demonstrate that the above-mentioned belief might not be well founded. Specifically, we report that the biological…
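
The attack itself can only be sketched here against an artificial stand-in, since the paper's contribution is running it against recorded primate neural activity. The sketch below (a PGD-style perturbation that maximally displaces a model's feature vector within a small L2 budget) is an assumption about the general scheme; the torchvision backbone, layer choice, and budget values are placeholders, not the authors' settings.

```python
# Hedged sketch: attack a model's *representation* (not its class label) under an L2 budget.
import torch
import torchvision.models as models

def representation_attack(feature_extractor, x, eps=1.0, step=0.25, n_steps=20):
    """Find delta with ||delta||_2 <= eps that maximally displaces the feature vector of x."""
    with torch.no_grad():
        clean_feats = feature_extractor(x)
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(n_steps):
        feats = feature_extractor(x + delta)
        loss = (feats - clean_feats).flatten(1).norm(dim=1).sum()
        loss.backward()
        with torch.no_grad():
            g = delta.grad
            g = g / (g.flatten(1).norm(dim=1).view(-1, 1, 1, 1) + 1e-12)  # normalized gradient step
            delta += step * g
            # project back onto the L2 ball of radius eps
            norms = delta.flatten(1).norm(dim=1).view(-1, 1, 1, 1)
            delta *= (eps / norms).clamp(max=1.0)
        delta.grad.zero_()
    return (x + delta).detach()

# Example: attack the penultimate-layer representation of a torchvision ResNet-50.
backbone = models.resnet50(weights=None).eval()
backbone.fc = torch.nn.Identity()          # expose the 2048-d representation
x = torch.rand(4, 3, 224, 224)             # placeholder images in [0, 1]
x_adv = representation_attack(backbone, x)
```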

Citations

Aligning Model and Macaque Inferior Temporal Cortex Representations Improves Model-to-Human Behavioral Alignment and Adversarial Robustness

The results demonstrate that building models that are more aligned with the primate brain leads to more robust and human-like behavior, and call for larger neural datasets to further augment these gains.
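
One simple way to picture the alignment idea described above is a joint objective that adds a neural-prediction penalty to the usual task loss. The sketch below assumes a linear readout trained to predict recorded IT responses from model features, with illustrative shapes and weights rather than the paper's exact recipe.

```python
# Hedged sketch of a neural-alignment penalty added to a standard classification loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NeuralAlignmentLoss(nn.Module):
    def __init__(self, feature_dim, n_neurons, weight=1.0):
        super().__init__()
        self.readout = nn.Linear(feature_dim, n_neurons)  # fit jointly with the model
        self.weight = weight

    def forward(self, features, neural_responses):
        # Penalize mismatch between predicted and recorded neural responses.
        return self.weight * F.mse_loss(self.readout(features), neural_responses)

# Usage inside a training step (assuming the model returns both logits and features):
# loss = F.cross_entropy(logits, labels) + align_loss(features, recorded_it_responses)
```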

References

Showing 1-10 of 43 references

Adversarial images for the primate brain

This work designed adversarial images to fool primate vision by modifying images so that their model-predicted neuronal responses matched a target category, such as monkey faces, showing that a model of neuronal activity can selectively direct primate visual behavior.

Adversarial Robustness as a Prior for Learned Representations

This work shows that robust optimization can be recast as a tool for enforcing priors on the features learned by deep neural networks, and points to adversarial robustness as a promising avenue for improving learned representations.
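
A sketch of the kind of experiment behind this claim, assuming the standard representation-inversion setup: optimize an image from noise so that its features match those of a target image. With robust features such inversions tend to look perceptually meaningful; the backbone and optimization settings below are placeholders.

```python
# Hedged sketch: invert a network's representation by matching features from a noise start.
import torch
import torchvision.models as models

def invert_representation(feature_extractor, x_target, n_steps=200, lr=0.1):
    with torch.no_grad():
        target_feats = feature_extractor(x_target)
    x = torch.rand_like(x_target, requires_grad=True)   # start from noise
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(n_steps):
        opt.zero_grad()
        loss = (feature_extractor(x) - target_feats).pow(2).mean()
        loss.backward()
        opt.step()
        with torch.no_grad():
            x.clamp_(0.0, 1.0)                           # keep a valid image
    return x.detach()

# Example with a torchvision ResNet-50 used as a feature extractor (placeholder weights).
backbone = models.resnet50(weights=None).eval()
backbone.fc = torch.nn.Identity()
x_target = torch.rand(1, 3, 224, 224)
x_inverted = invert_representation(backbone, x_target)
```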

Adversarial Examples that Fool both Computer Vision and Time-Limited Humans

It is found that adversarial examples that strongly transfer across computer vision models influence the classifications made by time-limited human observers.

Humans can decipher adversarial images

It is shown that humans can anticipate which objects CNNs will see in adversarial images, demonstrating that human and machine classification of adversarial images are robustly related.

Adversarially-Trained Deep Nets Transfer Better

It is demonstrated that adversarially trained models transfer better across new domains than naturally trained models, even though it is known that adversarially trained models do not generalize as well as naturally trained models on the source domain.
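
A minimal sketch of the transfer setup this summary describes, under the assumption of a full fine-tune on the target domain starting from an adversarially trained checkpoint; the checkpoint path, head size, and hyperparameters are placeholders.

```python
# Hedged sketch: fine-tune all weights of an adversarially trained model on a new domain.
import torch
import torchvision.models as models

model = models.resnet50(weights=None)
state = torch.load("adv_trained_resnet50.pt", map_location="cpu")   # hypothetical checkpoint
model.load_state_dict(state)
model.fc = torch.nn.Linear(model.fc.in_features, 10)                # new head for the target domain

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-4)
criterion = torch.nn.CrossEntropyLoss()

def finetune_step(images, labels):
    # Standard (non-adversarial) training on the target domain.
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```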

Simulating a Primary Visual Cortex at the Front of CNNs Improves Robustness to Image Perturbations

While current CNN architectures are arguably brain-inspired, the results presented here demonstrate that more precisely mimicking just one stage of the primate visual system leads to new gains in ImageNet-level computer vision applications.
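
A heavily simplified sketch of the idea, not the released VOneNet code: prepend a fixed, biologically inspired front end (a Gabor filter bank, a nonlinearity, and stochastic noise) to an otherwise standard CNN. The filter parameters, channel-mixing layer, and downstream backbone are illustrative assumptions.

```python
# Hedged sketch: a fixed Gabor front end stitched onto a standard backbone.
import math
import torch
import torch.nn as nn
import torchvision.models as models

def gabor_kernel(size=15, theta=0.0, sigma=3.0, lam=8.0, gamma=0.5, psi=0.0):
    """Return a single 2-D Gabor filter as a (size, size) tensor."""
    half = size // 2
    ys, xs = torch.meshgrid(torch.arange(-half, half + 1, dtype=torch.float32),
                            torch.arange(-half, half + 1, dtype=torch.float32),
                            indexing="ij")
    x_t = xs * math.cos(theta) + ys * math.sin(theta)
    y_t = -xs * math.sin(theta) + ys * math.cos(theta)
    g = torch.exp(-(x_t ** 2 + (gamma * y_t) ** 2) / (2 * sigma ** 2))
    return g * torch.cos(2 * math.pi * x_t / lam + psi)

class SimpleV1Front(nn.Module):
    def __init__(self, n_orientations=8, kernel_size=15, noise_std=0.1):
        super().__init__()
        kernels = torch.stack([gabor_kernel(kernel_size, theta=i * math.pi / n_orientations)
                               for i in range(n_orientations)])
        conv = nn.Conv2d(3, n_orientations, kernel_size, padding=kernel_size // 2, bias=False)
        conv.weight.data = kernels.unsqueeze(1).repeat(1, 3, 1, 1) / 3.0  # same filter per channel
        conv.weight.requires_grad_(False)                                  # fixed, not learned
        self.conv, self.noise_std = conv, noise_std

    def forward(self, x):
        out = torch.relu(self.conv(x))
        if self.training:                       # stochastic "neural" noise during training
            out = out + self.noise_std * torch.randn_like(out)
        return out

# Stitch the fixed front end onto a standard backbone (1x1 conv maps 8 channels back to 3).
model = nn.Sequential(SimpleV1Front(), nn.Conv2d(8, 3, 1), models.resnet50(weights=None))
```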

Explaining and Harnessing Adversarial Examples

It is argued that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature, supported by new quantitative results while giving the first explanation of the most intriguing fact about them: their generalization across architectures and training sets.
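
The one-step fast gradient sign method (FGSM) that this paper introduces can be written in a few lines; the epsilon value and model here are placeholders.

```python
# FGSM: perturb the input in the direction of the sign of the loss gradient.
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=8 / 255):
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()   # keep pixels in [0, 1]
```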

Adversarial Robustness through Local Linearization

A novel regularizer is introduced that encourages the loss to behave linearly in the vicinity of the training data, thereby penalizing gradient obfuscation while encouraging robustness; extensive experiments on CIFAR-10 and ImageNet show that models trained with this regularizer avoid gradient obfuscation and can be trained significantly faster than with adversarial training.
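
A simplified sketch of the regularizer described above: penalize how far the loss at a perturbed point deviates from its first-order Taylor expansion around the clean input. The actual method maximizes this violation over the epsilon-ball; here a single random perturbation stands in for that inner maximization, and the weights are placeholders.

```python
# Hedged sketch of a local-linearity penalty.
import torch
import torch.nn.functional as F

def local_linearity_penalty(model, x, y, eps=8 / 255):
    x = x.clone().requires_grad_(True)
    loss_clean = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss_clean, x, create_graph=True)[0]
    # Random perturbation inside the L-infinity ball (stand-in for the inner maximization).
    delta = (torch.rand_like(x) * 2 - 1) * eps
    loss_pert = F.cross_entropy(model(x + delta), y)
    taylor = loss_clean + (delta * grad).sum()
    return (loss_pert - taylor).abs()       # deviation from the linear approximation

# Training objective (weight is a placeholder):
# total_loss = F.cross_entropy(model(images), labels) + 4.0 * local_linearity_penalty(model, images, labels)
```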

Do Adversarially Robust ImageNet Models Transfer Better?

It is found that adversarially robust models, while less accurate, often perform better than their standard-trained counterparts when used for transfer learning; this work focuses on adversarially robust ImageNet classifiers.
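
A minimal fixed-feature version of the comparison this summary describes: freeze a pretrained backbone and train only a linear classifier on the target task, once from standard ImageNet weights and once from an adversarially trained checkpoint (the checkpoint file named below is hypothetical).

```python
# Hedged sketch: linear-probe transfer from a standard vs. an adversarially trained backbone.
import torch
import torchvision.models as models

def linear_probe(backbone, num_classes, feature_dim=2048):
    for p in backbone.parameters():
        p.requires_grad_(False)                                   # freeze the representation
    backbone.fc = torch.nn.Linear(feature_dim, num_classes)       # only this layer is trained
    return backbone

standard = linear_probe(models.resnet50(weights="IMAGENET1K_V1"), num_classes=10)

robust_net = models.resnet50(weights=None)
robust_net.load_state_dict(torch.load("robust_resnet50.pt", map_location="cpu"))  # hypothetical file
robust = linear_probe(robust_net, num_classes=10)

# Train each probe identically on the target task and compare accuracy.
opt = torch.optim.SGD(robust.fc.parameters(), lr=0.1, momentum=0.9)
```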

Towards Deep Learning Models Resistant to Adversarial Attacks

This work studies the adversarial robustness of neural networks through the lens of robust optimization, and suggests the notion of security against a first-order adversary as a natural and broad security guarantee.
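
The robust-optimization view comes with a concrete training recipe, PGD-based adversarial training: solve the inner maximization with projected gradient descent on the L-infinity ball, then take an outer minimization step on the resulting adversarial examples. The epsilon, step size, and step count below are common defaults, not values taken from this page.

```python
# Hedged sketch of PGD adversarial training.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 255, step=2 / 255, n_steps=10):
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(n_steps):
        loss = F.cross_entropy(model((x + delta).clamp(0, 1)), y)
        grad = torch.autograd.grad(loss, delta)[0]
        with torch.no_grad():
            delta += step * grad.sign()
            delta.clamp_(-eps, eps)         # project back onto the L-infinity ball
    return (x + delta).clamp(0, 1).detach()

def adversarial_training_step(model, optimizer, x, y):
    model.eval()                            # generate attacks with fixed BN statistics (a design choice)
    x_adv = pgd_attack(model, x, y)
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```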