# Improving the Adversarial Robustness and Interpretability of Deep Neural Networks by Regularizing their Input Gradients

```bibtex
@inproceedings{Ross2018ImprovingTA,
  title     = {Improving the Adversarial Robustness and Interpretability of Deep Neural Networks by Regularizing their Input Gradients},
  author    = {Andrew Slavin Ross and Finale Doshi-Velez},
  booktitle = {AAAI},
  year      = {2018}
}
```

Deep neural networks have proven remarkably effective at solving many classification problems, but have recently been criticized for two major weaknesses: the reasons behind their predictions are uninterpretable, and the predictions themselves can often be fooled by small adversarial perturbations. [...] **Key result:** Finally, we demonstrate that regularizing input gradients makes them more naturally interpretable as rationales for model predictions.
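The defense described in the abstract penalizes the norm of the loss gradient with respect to the *inputs* during training. The paper's actual method ("double backpropagation") applies this to deep networks via automatic differentiation; the following is only a minimal, hypothetical sketch on logistic regression, where the input gradient has a closed form, using a finite-difference optimizer in place of backprop. All names (`train`, `input_gradient`, the toy dataset) are assumptions for illustration, not from the paper's code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def input_gradient(w, x, y):
    # For logistic regression with cross-entropy loss, the gradient
    # of the per-example loss with respect to the input x is (p - y) * w.
    p = sigmoid(w @ x)
    return (p - y) * w

def objective(w, X, Y, lam):
    # Cross-entropy plus lam times the mean squared L2 norm of the
    # input gradients (the gradient-regularization penalty).
    P = sigmoid(X @ w)
    ce = -np.mean(Y * np.log(P + 1e-12) + (1 - Y) * np.log(1 - P + 1e-12))
    grad_penalty = np.mean((P - Y) ** 2) * np.sum(w ** 2)  # mean ||(p-y)w||^2
    return ce + lam * grad_penalty

def num_grad(f, w, eps=1e-5):
    # Central finite differences stand in for backprop in this toy example.
    g = np.zeros_like(w)
    for i in range(len(w)):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (f(w + e) - f(w - e)) / (2 * eps)
    return g

# Toy linearly separable data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
Y = (X[:, 0] + X[:, 1] > 0).astype(float)

def train(lam, steps=200, lr=0.5):
    w = np.zeros(2)
    for _ in range(steps):
        w = w - lr * num_grad(lambda v: objective(v, X, Y, lam), w)
    return w

w_reg = train(lam=0.1)
```

Shrinking the input gradients also shrinks the first-order effect of any small perturbation of the input on the loss, which is the mechanism the paper connects to adversarial robustness.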

#### Supplemental Code

GitHub repo (via Papers with Code): code for the AAAI 2018 accepted paper, "Improving the Adversarial Robustness and Interpretability of Deep Neural Networks by Regularizing their Input Gradients".


#### 277 Citations

- Towards Understanding and Improving the Transferability of Adversarial Examples in Deep Neural Networks. ACML, 2020.
- Towards Robust Training of Neural Networks by Regularizing Adversarial Gradients. arXiv, 2018.
- Improving Adversarial Robustness Requires Revisiting Misclassified Examples. ICLR, 2020.
- Towards Improving Robustness of Deep Neural Networks to Adversarial Perturbations. IEEE Transactions on Multimedia, 2020.
- Understanding and Enhancing the Transferability of Adversarial Examples. arXiv, 2018.
- Second Order Optimization for Adversarial Robustness and Interpretability. arXiv, 2020.

#### References

Showing 1-10 of 34 references

- Explaining and Harnessing Adversarial Examples. ICLR, 2015.
- The Limitations of Deep Learning in Adversarial Settings. IEEE European Symposium on Security and Privacy (EuroS&P), 2016.
- Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks. NDSS, 2018.
- Distillation as a Defense to Adversarial Perturbations Against Deep Neural Networks. IEEE Symposium on Security and Privacy (SP), 2016.
- Adversarial Machine Learning at Scale. ICLR, 2017.
- Biologically inspired protection of deep networks from adversarial attacks. arXiv, 2017.