White-to-Black: Efficient Distillation of Black-Box Adversarial Attacks

@article{Gil2019WhitetoBlackED,
  title={White-to-Black: Efficient Distillation of Black-Box Adversarial Attacks},
  author={Yotam Gil and Yoav Chai and O. A. Gorodissky and Jonathan Berant},
  journal={CoRR},
  year={2019},
  volume={abs/1904.02405}
}
Adversarial examples are important for understanding the behavior of neural models, and can improve their robustness through adversarial training. Recent work in natural language processing generated adversarial examples by assuming white-box access to the attacked model, and optimizing the input directly against it (Ebrahimi et al., 2018). In this work, we show that the knowledge implicit in the optimization procedure can be distilled into another more efficient neural network. We train a…
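The distillation idea the abstract describes can be illustrated with a toy sketch. This is not the paper's implementation: here a hypothetical "teacher" stands in for an expensive white-box optimization step that scores input positions for attack effectiveness, and a linear "student" is trained by standard supervised imitation to reproduce the teacher's choices far more cheaply at inference time. All names and the synthetic data are illustrative assumptions.

```python
# Toy sketch of attack distillation (illustrative only, not the paper's code):
# an expensive "teacher" procedure labels which input position to perturb,
# and a cheap linear "student" learns to imitate that decision.
import numpy as np

rng = np.random.default_rng(0)

def teacher_best_position(x, w_true):
    # Stand-in for a costly white-box optimization step: pick the
    # position whose feature most influences a (toy) target model.
    return int(np.argmax(x * w_true))

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Generate supervision by running the "teacher" on random inputs.
d = 8
w_true = rng.normal(size=d)
X = rng.normal(size=(2000, d))
y = np.array([teacher_best_position(x, w_true) for x in X])

# Train the student (multinomial logistic regression) with SGD on
# cross-entropy to imitate the teacher's position choices.
W = np.zeros((d, d))  # maps input features -> per-position logits
lr = 0.1
for _ in range(30):
    for x, t in zip(X, y):
        p = softmax(W @ x)
        p[t] -= 1.0                 # gradient of CE w.r.t. the logits
        W -= lr * np.outer(p, x)    # SGD step

# How often the cheap student reproduces the expensive teacher's choice.
acc = float((np.argmax(X @ W.T, axis=1) == y).mean())
```

Once trained, the student replaces the teacher's per-input optimization with a single forward pass, which is the efficiency gain the abstract claims.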

References


HotFlip: White-Box Adversarial Examples for Text Classification

  • Javid Ebrahimi, Anyi Rao, Daniel Lowd, Dejing Dou
  • Proceedings of the 56th Annual Meeting of the…
  • 2018
