RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models
@inproceedings{Gehman2020RealToxicityPromptsEN,
  title={RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models},
  author={Samuel Gehman and Suchin Gururangan and Maarten Sap and Yejin Choi and Noah A. Smith},
  booktitle={EMNLP},
  year={2020}
}
Pretrained neural language models (LMs) are prone to generating racist, sexist, or otherwise toxic language, which hinders their safe deployment. We investigate the extent to which pretrained LMs can be prompted to generate toxic language, and the effectiveness of controllable text generation algorithms at preventing such toxic degeneration. We create and release RealToxicityPrompts, a dataset of 100K naturally occurring, sentence-level prompts derived from a large corpus of English web text…