RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models
- Samuel Gehman, Suchin Gururangan, Maarten Sap, Yejin Choi, Noah A. Smith
- Computer Science · Findings of EMNLP
- 24 September 2020
The authors find that pretrained LMs can degenerate into toxic text even from seemingly innocuous prompts. Empirically assessing several controllable generation methods, they find that while data- or compute-intensive methods are more effective at steering away from toxicity than simpler solutions, no current method is failsafe against neural toxic degeneration.