Corpus ID: 19167025

Improving the Adversarial Robustness and Interpretability of Deep Neural Networks by Regularizing their Input Gradients

@inproceedings{Ross2018ImprovingTA,
  title={Improving the Adversarial Robustness and Interpretability of Deep Neural Networks by Regularizing their Input Gradients},
  author={Andrew Slavin Ross and Finale Doshi-Velez},
  booktitle={AAAI},
  year={2018}
}
Deep neural networks have proven remarkably effective at solving many classification problems, but have been criticized recently for two major weaknesses: the reasons behind their predictions are uninterpretable, and the predictions themselves can often be fooled by small adversarial perturbations. [...] Finally, we demonstrate that regularizing input gradients makes them more naturally interpretable as rationales for model predictions.
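The core idea described in the abstract is to penalize the norm of the loss gradient with respect to the inputs during training. The snippet below is a minimal sketch of that gradient-penalty objective in PyTorch; the function name, the weight lam, and the training setup are illustrative assumptions, not the authors' exact implementation.

import torch
import torch.nn.functional as F

def gradient_regularized_loss(model, x, y, lam=0.1):
    # Make the input a leaf tensor that tracks gradients.
    x = x.detach().clone().requires_grad_(True)

    # Standard cross-entropy on the (clean) inputs.
    ce = F.cross_entropy(model(x), y)

    # Gradient of the loss w.r.t. the input. create_graph=True keeps this
    # term differentiable so the penalty itself is trained ("double backprop").
    (grad_x,) = torch.autograd.grad(ce, x, create_graph=True)
    penalty = grad_x.pow(2).sum()

    # Fit the labels while keeping the loss surface flat around each input.
    return ce + lam * penalty

During training, calling gradient_regularized_loss(model, x, y).backward() would then update the weights both to classify correctly and to suppress sensitivity to small input perturbations, which is the intuition behind the robustness and interpretability claims.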
Towards Robust Training of Neural Networks by Regularizing Adversarial Gradients
Deep Defense: Training DNNs with Improved Adversarial Robustness
Jacobian Adversarially Regularized Networks for Robustness
Towards Improving Robustness of Deep Neural Networks to Adversarial Perturbations
Understanding and Enhancing the Transferability of Adversarial Examples
Adversarial Feature Desensitization
Robustness from Simple Classifiers
...

References (showing 1-10 of 34)
Explaining and Harnessing Adversarial Examples
The Limitations of Deep Learning in Adversarial Settings
Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks
Ensemble Adversarial Training: Attacks and Defenses
Distillation as a Defense to Adversarial Perturbations Against Deep Neural Networks
The Space of Transferable Adversarial Examples
Adversarial Machine Learning at Scale
Biologically inspired protection of deep networks from adversarial attacks
Synthesizing Robust Adversarial Examples
Improved Training of Wasserstein GANs
...
1
2
3
4
...