RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models

@inproceedings{Gehman2020RealToxicityPromptsEN,
  title={RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models},
  author={Samuel Gehman and Suchin Gururangan and Maarten Sap and Yejin Choi and Noah A. Smith},
  booktitle={EMNLP},
  year={2020}
}
Pretrained neural language models (LMs) are prone to generating racist, sexist, or otherwise toxic language which hinders their safe deployment. We investigate the extent to which pretrained LMs can be prompted to generate toxic language, and the effectiveness of controllable text generation algorithms at preventing such toxic degeneration. We create and release RealToxicityPrompts, a dataset of 100K naturally occurring, sentence-level prompts derived from a large corpus of English web text… Expand
15 Citations
Challenges in Automated Debiasing for Toxic Language Detection
  • 2
  • PDF
Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP
  • Highly Influenced
  • PDF
Language Models have a Moral Dimension
  • Highly Influenced
  • PDF
Generating (Formulaic) Text by Splicing Together Nearest Neighbors
  • PDF
Recipes for Safety in Open-domain Chatbots
  • 4
  • Highly Influenced
  • PDF
Which *BERT? A Survey Organizing Contextualized Encoders
  • 6
  • PDF
Towards Ethics by Design in Online Abusive Content Detection
  • 1
  • PDF
...
1
2
...

References

SHOWING 1-10 OF 89 REFERENCES
The Curious Case of Neural Text Degeneration
  • 357
  • PDF
Universal Adversarial Triggers for Attacking and Analyzing NLP
  • 130
  • Highly Influential
  • PDF
The Radicalization Risks of GPT-3 and Advanced Neural Language Models
  • 6
  • PDF
Defending Against Neural Fake News
  • 207
  • PDF
Neural Text Generation with Unlikelihood Training
  • 79
  • PDF
The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks
  • 162
  • PDF
PowerTransformer: Unsupervised Controllable Revision for Biased Language Correction
  • 4
  • PDF
Language Models are Unsupervised Multitask Learners
  • 2,652
  • Highly Influential
  • PDF
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
  • 1,008
  • PDF
Plug and Play Language Models: A Simple Approach to Controlled Text Generation
  • 110
  • Highly Influential
  • PDF
...
1
2
3
4
5
...