Few-shot Instruction Prompts for Pretrained Language Models to Detect Social Biases
@article{Prabhumoye2021FewshotIP,
  title   = {Few-shot Instruction Prompts for Pretrained Language Models to Detect Social Biases},
  author  = {Shrimai Prabhumoye and Rafal Kocielnik and Mohammad Shoeybi and Anima Anandkumar and Bryan Catanzaro},
  journal = {ArXiv},
  year    = {2021},
  volume  = {abs/2112.07868}
}
Warning: this paper contains content that may be offensive or upsetting. Detecting social bias in text is challenging due to nuance, subjectivity, and the difficulty of obtaining good-quality labeled datasets at scale, especially given the evolving nature of social biases and society. To address these challenges, we propose a few-shot instruction-based method for prompting pre-trained language models (LMs). We select a few class-balanced exemplars from a small support repository that are closest to…
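A minimal sketch of the exemplar-selection idea the abstract describes, assuming a generic sentence encoder (sentence-transformers here) rather than the authors' actual pipeline: embed a small labeled support repository, pick the most similar examples from each class so the prompt stays class-balanced, and splice them into an instruction prompt. `SUPPORT_REPO`, the yes/no verbalizers, and `build_prompt` are illustrative placeholders, not the paper's code.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

# Tiny stand-in support repository of (text, label) pairs; in practice this
# would hold a few dozen labeled examples drawn from existing bias datasets.
SUPPORT_REPO = [
    ("example of a biased statement ...", "yes"),
    ("example of a neutral statement ...", "no"),
]

def select_exemplars(query: str, k_per_class: int = 2):
    """Pick the k support examples per class closest to the query."""
    texts = [t for t, _ in SUPPORT_REPO]
    vecs = encoder.encode(texts + [query])
    support, q = vecs[:-1], vecs[-1]
    # Cosine similarity of every support example to the query.
    sims = support @ q / (np.linalg.norm(support, axis=1) * np.linalg.norm(q))
    picked = []
    for label in ("yes", "no"):  # class-balanced selection
        idx = [i for i, (_, lab) in enumerate(SUPPORT_REPO) if lab == label]
        idx.sort(key=lambda i: sims[i], reverse=True)
        picked += [SUPPORT_REPO[i] for i in idx[:k_per_class]]
    return picked

def build_prompt(query: str) -> str:
    lines = ["Is the following statement biased? Answer yes or no.", ""]
    for text, label in select_exemplars(query):
        lines += [f"Statement: {text}", f"Answer: {label}", ""]
    lines += [f"Statement: {query}", "Answer:"]
    return "\n".join(lines)
```

The returned prompt would then be fed to a pretrained LM, and its completion ("yes"/"no") read off as the bias label.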
7 Citations
Can You Label Less by Using Out-of-Domain Data? Active & Transfer Learning with Few-shot Instructions
- Computer Science · ArXiv
- 2022
It is shown that annotating just a few target-domain samples via active learning can benefit transfer, but the impact diminishes with more annotation effort; moreover, not all transfer scenarios yield a positive gain, which appears related to the PLM's initial performance on the target-domain task.
Prompt-and-Rerank: A Method for Zero-Shot and Few-Shot Arbitrary Textual Style Transfer with Small Language Models
- Computer Science · EMNLP
- 2022
This work proposes a method for arbitrary textual style transfer (TST), based on a mathematical formulation of the TST task, that enables small pre-trained language models to perform on par with state-of-the-art large-scale models while using two orders of magnitude less compute and memory.
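The generate-then-rerank pattern this entry describes can be sketched as follows, assuming a plain GPT-2 and model-scored fluency as the reranking criterion; the paper's actual templates and reranking combine several scoring terms, so a single fluency score stands in for the full criterion here.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def candidates(src: str, style: str, n: int = 5):
    """Sample n candidate rewrites from a zero-shot prompt."""
    prompt = f'Here is a more {style} rewrite of "{src}":'
    ids = tok(prompt, return_tensors="pt").input_ids
    outs = model.generate(ids, do_sample=True, num_return_sequences=n,
                          max_new_tokens=30, pad_token_id=tok.eos_token_id)
    # Keep only the newly generated continuation of each sample.
    return [tok.decode(o[ids.shape[1]:], skip_special_tokens=True) for o in outs]

def fluency_loss(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        return model(ids, labels=ids).loss.item()  # mean NLL; lower = more fluent

def transfer(src: str, style: str) -> str:
    return min(candidates(src, style), key=fluency_loss)
```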
Toxicity Detection with Generative Prompt-based Inference
- Computer Science · ArXiv
- 2022
This work explores the generative variant of zero-shot prompt-based toxicity detection with comprehensive trials on prompt engineering and highlights the strengths of its generative classification approach both quantitatively and qualitatively.
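A hedged sketch of what zero-shot prompt-based toxicity detection can look like: compare the log-likelihood the LM assigns to two answer words after a question-style prompt. The template and the yes/no verbalizers below are assumptions, not the prompts engineered in the paper.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def label_logprob(text: str, label: str) -> float:
    """Log-probability of the label tokens given the rest of the prompt."""
    prompt = f'"{text}"\nQuestion: is the text above toxic? Answer: {label}'
    ids = tok(prompt, return_tensors="pt").input_ids
    n_label = tok(f" {label}", return_tensors="pt").input_ids.shape[1]
    with torch.no_grad():
        logits = model(ids).logits
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = ids[0, 1:]
    token_lp = logprobs[torch.arange(targets.shape[0]), targets]
    return token_lp[-n_label:].sum().item()

def is_toxic(text: str) -> bool:
    return label_logprob(text, "yes") > label_logprob(text, "no")
```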
COLD: A Benchmark for Chinese Offensive Language Detection
- Computer Science · EMNLP
- 2022
The factors that influence offensive generations are investigated, and it is found that anti-bias content and keywords referring to certain groups or revealing negative attitudes more easily trigger offensive outputs.
Adding Instructions during Pretraining: Effective Way of Controlling Toxicity in Language Models
- Computer Science · ArXiv
- 2023
The results indicate that the best-performing strategy (INST) substantially reduces the toxicity probability by up to 61% while preserving accuracy on five benchmark NLP tasks and improving AUC scores on four bias detection tasks by 1.3%.
The Tail Wagging the Dog: Dataset Construction Biases of Social Bias Benchmarks
- Economics · ArXiv
- 2022
How reliably can we trust the scores obtained from social bias benchmarks as faithful indicators of problematic social biases in a given model? In this work, we study this question by contrasting…
Predictability and Surprise in Large Generative Models
- Computer Science · FAccT
- 2022
This paper highlights a counterintuitive property of large-scale generative models: a paradoxical combination of predictable loss on a broad training distribution and unpredictable specific capabilities, inputs, and outputs. It analyzes how these conflicting properties give model developers various motivations for deploying these models, along with challenges that can hinder deployment.
References
Showing 1–10 of 65 references
Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP
- Computer Science · Transactions of the Association for Computational Linguistics
- 2021
This paper demonstrates a surprising finding: pretrained language models recognize, to a considerable degree, their own undesirable biases and the toxicity of the content they produce. Building on this, it proposes self-debiasing, a decoding algorithm that reduces the probability of a language model producing problematic text.
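The self-debiasing decoding idea can be approximated in a few lines: compare the model's next-token distribution with and without a prefix declaring the text toxic, and suppress tokens the prefixed run makes more likely. The prefix wording and decay constant below are assumptions; the paper gives the exact scaling scheme.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
PREFIX = "The following text contains toxic language:\n"

def debiased_next_token_probs(context: str, lam: float = 50.0):
    def next_probs(text: str):
        ids = tok(text, return_tensors="pt").input_ids
        with torch.no_grad():
            return torch.softmax(model(ids).logits[0, -1], dim=-1)

    p_plain = next_probs(context)
    p_biased = next_probs(PREFIX + context)
    # Penalize tokens that become *more* likely under the toxic prefix.
    delta = p_plain - p_biased
    scale = torch.where(delta < 0, torch.exp(lam * delta),
                        torch.ones_like(delta))
    q = p_plain * scale
    return q / q.sum()  # renormalize to a proper distribution
```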
Making Pre-trained Language Models Better Few-shot Learners
- Computer Science · ACL
- 2021
The LM-BFF approach makes minimal assumptions about available task resources and domain expertise and hence constitutes a strong task-agnostic method for few-shot learning.
SOLID: A Large-Scale Semi-Supervised Dataset for Offensive Language Identification
- Computer Science · Findings of the ACL
- 2021
This work creates SOLID, the largest available dataset for this task, containing over nine million English tweets labeled in a semi-supervised manner, and demonstrates experimentally that using SOLID along with OLID improves performance on the OLID test set for two different models, especially at the lower levels of the taxonomy.
Language Models are Few-Shot Learners
- Computer Science · NeurIPS
- 2020
GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic.
Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models
- Computer Science · Findings of the ACL
- 2022
This work shows that finetuning LMs in the few-shot setting can considerably reduce the need for prompt engineering, and recommends finetuned LMs for few-shot learning, as this approach is more accurate, robust to different prompts, and can be made nearly as efficient as using frozen LMs.
Deeper Attention to Abusive User Content Moderation
- Computer Science · EMNLP
- 2017
A novel, deep, classification-specific attention mechanism further improves the performance of the RNN and can also highlight suspicious words for free, without including highlighted words in the training data.
Factual Probing Is [MASK]: Learning vs. Learning to Recall
- Computer Science · NAACL
- 2021
OptiPrompt is proposed, a novel and efficient method that directly optimizes prompts in continuous embedding space and predicts an additional 6.4% of facts in the LAMA benchmark.
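A rough sketch of the continuous-prompt optimization this entry describes, assuming a frozen BERT masked LM: learn k free vectors in the model's embedding space, splice them in before the [MASK] token, and train only those vectors to make the mask predict the target object. The relation template, k, and hyperparameters are placeholders, not the paper's settings.

```python
import torch
from transformers import BertForMaskedLM, BertTokenizer

tok = BertTokenizer.from_pretrained("bert-base-cased")
model = BertForMaskedLM.from_pretrained("bert-base-cased").eval()
for p in model.parameters():
    p.requires_grad = False  # only the soft prompt is trained

k = 5
emb = model.get_input_embeddings()
soft = torch.nn.Parameter(torch.randn(k, emb.embedding_dim) * 0.02)
opt = torch.optim.Adam([soft], lr=1e-3)

def loss_for(subject: str, obj: str) -> torch.Tensor:
    ids = tok(f"{subject} {tok.mask_token}", return_tensors="pt").input_ids
    e = emb(ids)  # (1, seq, dim)
    # Splice the k learned vectors in just before the [MASK] token.
    mask_pos = (ids[0] == tok.mask_token_id).nonzero().item()
    e = torch.cat([e[:, :mask_pos], soft.unsqueeze(0), e[:, mask_pos:]], dim=1)
    logits = model(inputs_embeds=e).logits
    obj_id = tok.convert_tokens_to_ids(obj)
    return torch.nn.functional.cross_entropy(
        logits[0, mask_pos + k][None], torch.tensor([obj_id]))

# One toy optimization step on a single fact; real training loops over a dataset.
opt.zero_grad()
loss_for("Paris is the capital of", "France").backward()
opt.step()
```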
Language Models are Unsupervised Multitask Learners
- Computer Science
- 2019
It is demonstrated that language models begin to learn NLP tasks without any explicit supervision when trained on WebText, a new dataset of millions of webpages, suggesting a promising path toward building language processing systems that learn to perform tasks from naturally occurring demonstrations.
Abusive Language Detection in Online User Content
- Computer Science · WWW
- 2016
This work presents a machine-learning-based method to detect hate speech in online user comments from two domains, which outperforms a state-of-the-art deep learning approach, along with a corpus of user comments annotated for abusive language, the first of its kind.
Adversarial NLI: A New Benchmark for Natural Language Understanding
- Computer Science · ACL
- 2020
This work introduces a new large-scale NLI benchmark dataset, collected via an iterative, adversarial human-and-model-in-the-loop procedure, and shows that non-expert annotators are successful at finding model weaknesses.