Corpus ID: 245144648

Few-shot Instruction Prompts for Pretrained Language Models to Detect Social Biases

@article{Prabhumoye2021FewshotIP,
  title={Few-shot Instruction Prompts for Pretrained Language Models to Detect Social Biases},
  author={Shrimai Prabhumoye and Rafal Kocielnik and Mohammad Shoeybi and Anima Anandkumar and Bryan Catanzaro},
  journal={ArXiv},
  year={2021},
  volume={abs/2112.07868}
}
Warning: this paper contains content that may be offensive or upsetting. Detecting social bias in text is challenging due to nuance, subjectivity, and difficulty in obtaining good quality labeled datasets at scale, especially given the evolving nature of social biases and society. To address these challenges, we propose a few-shot instruction-based method for prompting pre-trained language models (LMs). We select a few class-balanced exemplars from a small support repository that are closest to… 
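
A rough sketch of that exemplar-selection step is below; the cosine-similarity retrieval, the function names, and the prompt wording are illustrative assumptions, not the authors' implementation.

import numpy as np

def select_exemplars(query_emb, support_embs, support_labels, k_per_class=2):
    # Cosine similarity between the query embedding and every support example.
    sims = support_embs @ query_emb / (
        np.linalg.norm(support_embs, axis=1) * np.linalg.norm(query_emb) + 1e-9)
    chosen = []
    # Take the k most similar examples from each class so the prompt stays class-balanced.
    for label in np.unique(support_labels):
        idx = np.where(support_labels == label)[0]
        chosen.extend(idx[np.argsort(-sims[idx])[:k_per_class]].tolist())
    return chosen

def build_prompt(instruction, support_texts, support_labels, chosen, query_text):
    # Instruction first, then the selected exemplars with their labels, then the query.
    parts = [instruction]
    for i in chosen:
        parts.append(f'Statement: "{support_texts[i]}"\nBiased: {"yes" if support_labels[i] else "no"}')
    parts.append(f'Statement: "{query_text}"\nBiased:')
    return "\n\n".join(parts)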

Can You Label Less by Using Out-of-Domain Data? Active & Transfer Learning with Few-shot Instructions

It is shown that annotation of just a few target-domain samples via active learning can be beneficial for transfer, but the impact diminishes with more annotation effort, and that not all transfer scenarios yield a positive gain, which seems related to the PLM's initial performance on the target-domain task.

Prompt-and-Rerank: A Method for Zero-Shot and Few-Shot Arbitrary Textual Style Transfer with Small Language Models

This work proposes a method for arbitrary textual style transfer (TST), based on a mathematical formulation of the TST task, that enables small pre-trained language models to perform on par with state-of-the-art large-scale models while using two orders of magnitude less compute and memory.
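
The two-stage structure suggested by the title can be abstracted as in the sketch below; templates, generate_fn, and score_fn are placeholders standing in for a small pretrained LM's generation and scoring calls, not the paper's exact formulation.

def prompt_and_rerank(text, templates, generate_fn, score_fn):
    # Stage 1: prompt once per template to collect candidate rewrites.
    candidates = [generate_fn(template.format(text=text)) for template in templates]
    # Stage 2: rerank the candidates (e.g. by fluency or style strength under
    # the same small LM) and keep the best one.
    return max(candidates, key=score_fn)

# Toy usage with trivial stand-ins for the model calls:
templates = ['Rewrite formally: "{text}" ->', 'Make this polite: "{text}" ->']
best = prompt_and_rerank("gimme that", templates,
                         generate_fn=lambda p: p.upper(),
                         score_fn=len)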

Toxicity Detection with Generative Prompt-based Inference

This work explores the generative variant of zero-shot prompt-based toxicity detection with comprehensive trials on prompt engineering and highlights the strengths of its generative classification approach both quantitatively and qualitatively.
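
A common way to realize this kind of generative, prompt-based zero-shot classification is to compare the likelihoods of verbalized labels after a question-style prompt. The sketch below uses GPT-2 via Hugging Face transformers as a stand-in model; the prompt wording and label words are assumptions, not the paper's setup.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def label_logprob(prompt, label_word):
    # Log-probability the model assigns to label_word immediately after the prompt.
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    label_ids = tokenizer(" " + label_word, return_tensors="pt").input_ids
    ids = torch.cat([prompt_ids, label_ids], dim=1)
    with torch.no_grad():
        log_probs = torch.log_softmax(model(ids).logits, dim=-1)
    total = 0.0
    for i, tok in enumerate(label_ids[0]):
        # Logits at position p predict the token at position p + 1.
        total += log_probs[0, prompt_ids.shape[1] + i - 1, tok].item()
    return total

def classify(text):
    prompt = f'Comment: "{text}"\nQuestion: Is this comment toxic?\nAnswer:'
    scores = {word: label_logprob(prompt, word) for word in ("yes", "no")}
    return max(scores, key=scores.get)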

COLD: A Benchmark for Chinese Offensive Language Detection

The factors that influence the offensive generations are investigated, and it is found that anti-bias content and keywords referring to certain groups or revealing negative attitudes more easily trigger offensive outputs.

Adding Instructions during Pretraining: Effective Way of Controlling Toxicity in Language Models

The results indicate that the best performing strategy (INST) substantially reduces the toxicity probability by up to 61% while preserving accuracy on five benchmark NLP tasks and improving AUC scores on four bias detection tasks by 1.3%.

The Tail Wagging the Dog: Dataset Construction Biases of Social Bias Benchmarks

How reliably can we trust the scores obtained from social bias benchmarks as faithful indicators of problematic social biases in a given model? In this work, we study this question by contrasting…

Predictability and Surprise in Large Generative Models

This paper highlights a counterintuitive property of large-scale generative models: a paradoxical combination of predictable loss on a broad training distribution and unpredictable specific capabilities, inputs, and outputs. It analyzes how these conflicting properties give model developers various motivations for deploying these models, as well as challenges that can hinder deployment.

References

Showing 1-10 of 65 references

Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP

This paper demonstrates a surprising finding: pretrained language models recognize, to a considerable degree, their undesirable biases and the toxicity of the content they produce. Building on this, it proposes self-debiasing, a decoding algorithm that reduces the probability of a language model producing problematic text.
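
The decoding idea can be pictured as rescaling the next-token distribution: tokens that become more likely under a bias-encouraging "self-diagnosis" prompt are suppressed when generating from the plain prompt. The function below is a simplified, illustrative form of that rule, not the paper's exact algorithm.

import torch

def self_debias_logits(logits_plain, logits_biased, decay=50.0):
    # Next-token distributions from the plain prompt and from a prompt that
    # explicitly encourages the undesired behaviour.
    p_plain = torch.softmax(logits_plain, dim=-1)
    p_biased = torch.softmax(logits_biased, dim=-1)
    # How much the bias-encouraging prompt boosts each token.
    boost = torch.clamp(p_biased - p_plain, min=0.0)
    # Exponentially down-weight the boosted tokens and renormalise, returning
    # log-probabilities that can be sampled from.
    scaled = p_plain * torch.exp(-decay * boost)
    return torch.log(scaled / scaled.sum(dim=-1, keepdim=True) + 1e-12)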

Making Pre-trained Language Models Better Few-shot Learners

The LM-BFF approach makes minimal assumptions on task resources and domain expertise, and hence constitutes a strong task-agnostic method for few-shot learning.

SOLID: A Large-Scale Semi-Supervised Dataset for Offensive Language Identification

This work creates the largest available dataset for this task, SOLID, which contains over nine million English tweets labeled in a semi-supervised manner, and demonstrates experimentally that using SOLID along with OLID yields improved performance on the OLID test set for two different models, especially for the lower levels of the taxonomy.

Language Models are Few-Shot Learners

GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic.

Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models

This work shows that finetuning LMs in the few-shot setting can considerably reduce the need for prompt engineering, and recommends finetuned LMs for few-shot learning as they are more accurate, robust to different prompts, and can be made nearly as efficient as using frozen LMs.

Deeper Attention to Abusive User Content Moderation

A novel, deep, classification-specific attention mechanism improves the performance of the RNN further, and can also highlight suspicious words for free, without including highlighted words in the training data.

Factual Probing Is [MASK]: Learning vs. Learning to Recall

OptiPrompt is proposed, a novel and efficient method which directly optimizes in continuous embedding space and is able to predict an additional 6.4% of facts in the LAMA benchmark.
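
"Optimizing directly in continuous embedding space" amounts to training a handful of free prompt vectors while the language model itself stays frozen. The module below is a generic illustration of that setup (the dimensions and optimizer settings are assumptions), not OptiPrompt's released code.

import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    def __init__(self, n_tokens=5, dim=768):
        super().__init__()
        # Free prompt vectors, optimized directly in continuous embedding space.
        self.vectors = nn.Parameter(torch.randn(n_tokens, dim) * 0.02)

    def forward(self, input_embeds):
        # input_embeds: (batch, seq, dim); prepend the soft prompt to every example.
        prompt = self.vectors.unsqueeze(0).expand(input_embeds.size(0), -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)

# Only the prompt vectors receive gradients; the pretrained LM is kept frozen
# and the prompt is trained so the LM fills in the correct fact.
soft_prompt = SoftPrompt()
optimizer = torch.optim.Adam(soft_prompt.parameters(), lr=3e-3)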

Language Models are Unsupervised Multitask Learners

It is demonstrated that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.

Abusive Language Detection in Online User Content

A machine learning based method to detect hate speech in online user comments from two domains, which outperforms a state-of-the-art deep learning approach, along with a corpus of user comments annotated for abusive language, the first of its kind.

Adversarial NLI: A New Benchmark for Natural Language Understanding

This work introduces a new large-scale NLI benchmark dataset, collected via an iterative, adversarial human-and-model-in-the-loop procedure, and shows that non-expert annotators are successful at finding the models' weaknesses.
...