• Computer Science
  • Published on arXiv, 2019

Universal Adversarial Triggers for NLP

@article{Wallace2019UniversalAT,
  title={Universal Adversarial Triggers for NLP},
  author={Eric Wallace and Shi Feng and Nikhil Kandpal and Matthew Gardner and Sameer Singh},
  journal={ArXiv},
  year={2019},
  volume={abs/1908.07125}
}
Adversarial examples highlight model vulnerabilities and are useful for evaluation and interpretation. We define universal adversarial triggers: input-agnostic sequences of tokens that trigger a model to produce a specific prediction when concatenated to any input from a dataset. We propose a gradient-guided search over tokens which finds short trigger sequences (e.g., one word for classification and four words for language modeling) that successfully trigger the target prediction. For example, triggers cause SNLI entailment accuracy to drop from 89.94% to 0.55%, 72% of "why" questions in SQuAD to be answered "to kill american people", and the GPT-2 language model to spew racist output even when conditioned on non-racial contexts.
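
The gradient-guided search the abstract describes is, at its core, a HotFlip-style first-order token substitution: each trigger token is swapped for the vocabulary token whose embedding most decreases the target loss under the linear approximation (e' - e)^T ∇_e L. Below is a minimal, self-contained PyTorch sketch of that update. The toy mean-pool classifier, the dimensions, and the greedy one-position-per-step schedule are illustrative assumptions, not the authors' released code; the paper additionally re-ranks top candidates with forward passes and beam search.

import torch
import torch.nn.functional as F

torch.manual_seed(0)
VOCAB, DIM, TRIG_LEN, TARGET = 100, 16, 3, 0

# Toy stand-in for the victim model: mean-pooled embeddings -> linear classifier.
embedding = torch.nn.Embedding(VOCAB, DIM)
classifier = torch.nn.Linear(DIM, 2)

# Freeze the model: only the trigger tokens change during the search.
for p in list(embedding.parameters()) + list(classifier.parameters()):
    p.requires_grad_(False)

def trigger_loss(trig_embeds, batch_embeds):
    # Prepend the same trigger to every input in the batch, then score the
    # attacker's target class; lower loss means a more successful trigger.
    trig = trig_embeds.unsqueeze(0).expand(batch_embeds.size(0), -1, -1)
    pooled = torch.cat([trig, batch_embeds], dim=1).mean(dim=1)
    target = torch.full((batch_embeds.size(0),), TARGET, dtype=torch.long)
    return F.cross_entropy(classifier(pooled), target)

# A batch of token ids standing in for "any input from the dataset".
batch_embeds = embedding(torch.randint(0, VOCAB, (8, 10))).detach()
trigger_ids = torch.randint(0, VOCAB, (TRIG_LEN,))  # arbitrary initialization

for step in range(20):
    trig_embeds = embedding(trigger_ids).detach().requires_grad_(True)
    trigger_loss(trig_embeds, batch_embeds).backward()
    # First-order score of every vocabulary token at every trigger position:
    # (e')^T grad. The current token's own term is constant per position, so
    # ranking by this dot product is equivalent to ranking by (e' - e)^T grad.
    scores = torch.einsum("td,vd->tv", trig_embeds.grad, embedding.weight)
    pos = step % TRIG_LEN            # greedily update one position per step
    best = scores[pos].argmin()      # most negative = largest predicted loss drop
    if scores[pos, best] < scores[pos, trigger_ids[pos]]:
        trigger_ids[pos] = best

print("trigger token ids:", trigger_ids.tolist())

Averaging the loss (and hence the gradient) over a batch of examples is what makes the resulting trigger universal: the same token sequence has to lower the target loss across the whole dataset rather than for a single input.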
