White-to-Black: Efficient Distillation of Black-Box Adversarial Attacks

@inproceedings{Gil2019WhitetoBlackED,
  title={White-to-Black: Efficient Distillation of Black-Box Adversarial Attacks},
  author={Yotam Gil and Yoav Chai and Or Gorodissky and Jonathan Berant},
  booktitle={NAACL-HLT},
  year={2019}
}
Adversarial examples are important for understanding the behavior of neural models, and can improve their robustness through adversarial training. Recent work in natural language processing has generated adversarial examples by assuming white-box access to the attacked model and optimizing the input directly against it (Ebrahimi et al., 2018). In this work, we show that the knowledge implicit in the optimization procedure can be distilled into another, more efficient neural network. We train a…
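The abstract describes distilling a white-box attack (HotFlip) into a network that imitates it. Below is a minimal PyTorch sketch of the HotFlip-style flip-scoring step being distilled, assuming a character-level classifier over one-hot inputs; model, loss_fn, and the tensor shapes are illustrative assumptions, not the paper's actual implementation.

import torch

def best_hotflip(model, loss_fn, one_hot, label):
    """Pick the single character flip with the largest first-order
    estimate of the loss increase (Ebrahimi et al., 2018).

    one_hot: (seq_len, vocab) one-hot character tensor (assumed encoding)
    label:   scalar gold-label tensor
    Returns (position, new_char_id).
    """
    x = one_hot.clone().float().requires_grad_(True)
    loss = loss_fn(model(x.unsqueeze(0)), label.unsqueeze(0))
    (grad,) = torch.autograd.grad(loss, x)           # (seq_len, vocab)
    # Flipping position i from char a to char b changes the loss by
    # roughly grad[i, b] - grad[i, a] (first-order Taylor estimate).
    current = (grad * x).sum(dim=1, keepdim=True)    # grad[i, a] per position
    scores = grad - current
    scores[one_hot.bool()] = float("-inf")           # rule out no-op flips
    flat = scores.argmax().item()
    return divmod(flat, scores.size(1))              # (position, new_char_id)

Distillation then amounts to ordinary supervised training: a student network is fit to the (input, flip) pairs produced by a procedure like the one above, so that at attack time it proposes flips in a single forward pass, without gradient access to the attacked model.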

References


HotFlip: White-Box Adversarial Examples for NLP (Ebrahimi et al., 2018)


Black-Box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers (Gao et al., 2018)
