RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models

@inproceedings{Gehman2020RealToxicityPromptsEN,
  title={RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models},
  author={Samuel Gehman and Suchin Gururangan and Maarten Sap and Yejin Choi and Noah A. Smith},
  booktitle={EMNLP},
  year={2020}
}
  • Samuel Gehman, Suchin Gururangan, Maarten Sap, Yejin Choi, Noah A. Smith
  • Published in EMNLP 2020
  • Computer Science
  • Pretrained neural language models (LMs) are prone to generating racist, sexist, or otherwise toxic language, which hinders their safe deployment. We investigate the extent to which pretrained LMs can be prompted to generate toxic language, and the effectiveness of controllable text generation algorithms at preventing such toxic degeneration. We create and release RealToxicityPrompts, a dataset of 100K naturally occurring, sentence-level prompts derived from a large corpus of English web text…
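
As a rough sketch of the evaluation setup the abstract describes, the Python snippet below draws prompts from the dataset, generates continuations with a pretrained LM, and scores them with a toxicity classifier. It assumes the dataset is available on the Hugging Face Hub as allenai/real-toxicity-prompts and substitutes the off-the-shelf classifier unitary/toxic-bert for the Perspective API toxicity scores used in the paper; both identifiers are assumptions, not taken from this page.

# Sketch of a RealToxicityPrompts-style evaluation loop: prompt a pretrained
# LM and score its continuations for toxicity. Assumptions: the dataset is
# hosted on the Hugging Face Hub as "allenai/real-toxicity-prompts", and
# "unitary/toxic-bert" stands in for the Perspective API scorer the paper uses.
from datasets import load_dataset
from transformers import pipeline

prompts = load_dataset("allenai/real-toxicity-prompts", split="train")
generator = pipeline("text-generation", model="gpt2")
scorer = pipeline("text-classification", model="unitary/toxic-bert")

for record in prompts.select(range(5)):  # a small sample, for illustration
    prompt_text = record["prompt"]["text"]
    generated = generator(prompt_text, max_new_tokens=20, do_sample=True, top_p=0.9)
    continuation = generated[0]["generated_text"][len(prompt_text):]
    toxicity = scorer(continuation)[0]  # {"label": ..., "score": ...}
    print(f"{prompt_text!r} -> {continuation!r} (toxicity={toxicity['score']:.2f})")

Note that the paper measures degeneration over many sampled continuations per prompt (e.g., expected maximum toxicity), so a full evaluation would sample repeatedly rather than once per prompt as above.
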
    10 Citations
    • Challenges in Automated Debiasing for Toxic Language Detection
    • Intrinsic Bias Metrics Do Not Correlate with Application Bias
    • Generating (Formulaic) Text by Splicing Together Nearest Neighbors
    • Recipes for Safety in Open-domain Chatbots
    • Which *BERT? A Survey Organizing Contextualized Encoders
    • Towards Ethics by Design in Online Abusive Content Detection
    • Machine-Assisted Script Curation
