• Corpus ID: 245144648

Few-shot Instruction Prompts for Pretrained Language Models to Detect Social Biases

Shrimai Prabhumoye, Rafal Kocielnik, Mohammad Shoeybi, Anima Anandkumar, Bryan Catanzaro
Warning: this paper contains content that may be offensive or upsetting. Detecting social bias in text is challenging due to nuance, subjectivity, and the difficulty of obtaining good-quality labeled datasets at scale, especially given the evolving nature of social biases and society. To address these challenges, we propose a few-shot instruction-based method for prompting pre-trained language models (LMs). We select a few class-balanced exemplars from a small support repository that are closest to…
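The abstract describes selecting class-balanced exemplars from a support repository that are closest to the query. A minimal sketch of that idea, assuming bag-of-words cosine similarity as the closeness measure and an illustrative instruction template (the paper's actual embedding model, labels, and prompt format are not given here):

```python
# Hedged sketch, not the paper's code: class-balanced few-shot exemplar
# selection. For each class, pick the support examples whose bag-of-words
# cosine similarity to the query is highest, then assemble them into an
# instruction prompt. Support set, labels, and template are illustrative.
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def select_exemplars(query, support, k_per_class=2):
    """support: list of (text, label) pairs; returns class-balanced
    exemplars nearest to the query."""
    q = Counter(query.lower().split())
    by_class = {}
    for text, label in support:
        sim = cosine(q, Counter(text.lower().split()))
        by_class.setdefault(label, []).append((sim, text, label))
    picked = []
    for label in sorted(by_class):
        items = sorted(by_class[label], reverse=True)  # most similar first
        picked.extend((text, lab) for _, text, lab in items[:k_per_class])
    return picked


def build_prompt(query, exemplars,
                 instruction="Does the text contain social bias?"):
    lines = [instruction]
    for text, label in exemplars:
        lines.append(f"Text: {text}\nAnswer: {label}")
    lines.append(f"Text: {query}\nAnswer:")
    return "\n\n".join(lines)
```

Because one exemplar set per class is drawn, the prompt stays balanced even when the support repository is skewed toward one label.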
COLD: A Benchmark for Chinese Offensive Language Detection
The proposed COLDETECTOR is used to help detoxify Chinese online communities and evaluate the safety performance of generative language models; the authors find that CPM tends to generate more offensive output than CDialGPT, and that specific prompts can trigger offensive outputs more easily.
Predictability and Surprise in Large Generative Models
This paper highlights a counterintuitive property of large-scale generative models: an unusual combination of predictable loss on a broad training distribution and unpredictable specific capabilities, inputs, and outputs. It analyzes how these conflicting properties combine to give model developers various motivations for deploying these models, along with challenges that can hinder deployment.
Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP
This paper demonstrates a surprising finding: pretrained language models recognize, to a considerable degree, their own undesirable biases and the toxicity of the content they produce. Based on this, it proposes a decoding algorithm, called self-debiasing, that reduces the probability of a language model producing problematic text.
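A toy sketch of the self-debiasing idea summarized above (assumed mechanics for illustration, not the paper's implementation): compare next-token probabilities from the model on the plain input against those on a self-debiasing input that deliberately encourages undesirable text, scale down tokens the biased distribution favors, and renormalize.

```python
# Hedged toy sketch of self-debiased decoding. The decay constant and the
# exponential penalty are illustrative choices, not the paper's exact form.
import math


def self_debias(p_plain, p_biased, decay=50.0):
    """p_plain, p_biased: dicts mapping token -> probability.
    Returns a renormalized distribution in which tokens that are more
    likely under the biased prompt are down-weighted."""
    scaled = {}
    for tok, p in p_plain.items():
        delta = p_biased.get(tok, 0.0) - p
        # Penalize only tokens the biased prompt makes MORE likely.
        factor = math.exp(-decay * delta) if delta > 0 else 1.0
        scaled[tok] = p * factor
    z = sum(scaled.values())
    return {tok: q / z for tok, q in scaled.items()}
```

In a real decoder the two distributions would come from the same LM run on the plain input and on the input prefixed with a self-debiasing description; here plain dicts stand in for those model outputs.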
SOLID: A Large-Scale Semi-Supervised Dataset for Offensive Language Identification
This work creates the largest available dataset for this task, SOLID, which contains over nine million English tweets labeled in a semi-supervised manner, and demonstrates experimentally that using SOLID along with OLID yields improved performance on the OLID test set for two different models, especially for the lower levels of the taxonomy.
Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models
This work shows that finetuning LMs in the few-shot setting can considerably reduce the need for prompt engineering, and recommends finetuning for few-shot learning, as it is more accurate, more robust to different prompts, and can be made nearly as efficient as using frozen LMs.
Deeper Attention to Abusive User Content Moderation
A novel, deep, classification-specific attention mechanism improves the performance of the RNN further, and can also highlight suspicious words for free, without including highlighted words in the training data.
Language Models are Unsupervised Multitask Learners
It is demonstrated that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.
Abusive Language Detection in Online User Content
A machine-learning-based method to detect hate speech in online user comments from two domains, which outperforms a state-of-the-art deep learning approach, together with a corpus of user comments annotated for abusive language, the first of its kind.
Adversarial NLI: A New Benchmark for Natural Language Understanding
This work introduces a new large-scale NLI benchmark dataset, collected via an iterative, adversarial human-and-model-in-the-loop procedure, and shows that non-expert annotators are successful at finding model weaknesses.
CONAN - COunter NArratives through Nichesourcing: a Multilingual Dataset of Responses to Fight Online Hate Speech
This paper describes the creation of the first large-scale, multilingual, expert-based dataset of hate-speech/counter-narrative pairs, built with the effort of more than 100 operators from three different NGOs that applied their training and expertise to the task.
Hate Speech Detection with Comment Embeddings
This work proposes to learn distributed low-dimensional representations of comments using recently proposed neural language models, that can then be fed as inputs to a classification algorithm, resulting in highly efficient and effective hate speech detectors.
Learning Representations for Detecting Abusive Language
The approach is inspired by recent advances in transfer learning and word embeddings, and it is shown that learned representations do contain useful information that can be used to improve detection performance when training data is limited.