WARP: Word-level Adversarial ReProgramming

@inproceedings{Hambardzumyan2021WARPWA,
  title={WARP: Word-level Adversarial ReProgramming},
  author={Karen Hambardzumyan and H. Khachatrian and Jonathan May},
  booktitle={ACL},
  year={2021}
}
Transfer learning from pretrained language models has recently become the dominant approach for solving many NLP tasks. A common approach to transfer learning for multiple tasks that maximizes parameter sharing trains one or more task-specific layers on top of the language model. In this paper, we present an alternative approach based on adversarial reprogramming, which extends earlier work on automatic prompt generation. Adversarial reprogramming attempts to learn task-specific word embeddings that…
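
The core idea is compact enough to sketch: a small set of continuous "prompt" embeddings is trained by gradient descent while the pretrained language model itself stays frozen. The following is a minimal, hypothetical PyTorch illustration of that idea rather than the authors' implementation; SoftPromptClassifier, the stand-in encoder, and the linear classification head are assumptions made for the sketch (WARP itself inserts prompt embeddings around a masked input and scores classes with learned verbalizer embeddings over the MLM head).

    import torch
    import torch.nn as nn

    class SoftPromptClassifier(nn.Module):
        """Prepend trainable prompt embeddings to a frozen language model (sketch)."""
        def __init__(self, frozen_lm: nn.Module, embed_dim: int,
                     n_prompt_tokens: int = 20, n_classes: int = 2):
            super().__init__()
            self.frozen_lm = frozen_lm
            for p in self.frozen_lm.parameters():
                p.requires_grad = False                    # only the prompt and head are trained
            # task-specific "word" embeddings learned by gradient descent
            self.prompt = nn.Parameter(torch.randn(n_prompt_tokens, embed_dim) * 0.02)
            self.head = nn.Linear(embed_dim, n_classes)

        def forward(self, token_embeds: torch.Tensor) -> torch.Tensor:
            # token_embeds: (batch, seq_len, embed_dim) embeddings of the input text
            prompt = self.prompt.unsqueeze(0).expand(token_embeds.size(0), -1, -1)
            x = torch.cat([prompt, token_embeds], dim=1)   # concatenate prompt and input
            h = self.frozen_lm(x)                          # frozen pretrained encoder
            return self.head(h[:, 0])                      # classify from the first position

    # Stand-in for a pretrained encoder; in practice this would be a frozen RoBERTa/BERT body.
    stub_lm = nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True), num_layers=2)
    model = SoftPromptClassifier(stub_lm, embed_dim=768)
    logits = model(torch.randn(4, 16, 768))                # a fake batch of token embeddings

With 20 prompt vectors of dimension 768, the trainable prompt amounts to roughly 15K parameters, orders of magnitude fewer than fine-tuning the full model.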

Citations

Making Pre-trained Language Models End-to-end Few-shot Learners with Contrastive Prompt Tuning

TLDR
CP-Tuning, the first end-to-end Contrastive Prompt Tuning framework for fine-tuning PLMs without any manual engineering of task-specific prompts and verbalizers, is presented; it integrates a task-invariant continuous prompt encoding technique with fully trainable prompt parameters.

Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing

TLDR
A unified set of mathematical notations that can cover a wide variety of existing work is described, and existing work is organized along several dimensions, e.g., the choice of pre-trained language models, prompts, and tuning strategies.

Meta-Learning the Difference: Preparing Large Language Models for Efficient Adaptation

TLDR
Experiments on few-shot dialogue completion, low-resource abstractive summarization, and multi-domain language modeling show improvements in adaptation time and performance over direct finetuning or preparation via domain-adaptive pretraining.

Contrastive Demonstration Tuning for Pre-trained Language Models

TLDR
Experimental results illustrate that the proposed pluggable, extensible, and efficient approach, contrastive demonstration tuning, which is free of demonstration sampling, can yield better performance when integrated with the previous approaches LM-BFF and P-tuning.

Exploring Visual Prompts for Adapting Large-Scale Models

TLDR
The surprising effectiveness of visual prompting provides a new perspective on adapting pre-trained models in vision; it is particularly effective for CLIP, robust to distribution shift, and achieves performance competitive with standard linear probes.

Prototypical Verbalizer for Prompt-based Few-shot Tuning

TLDR
This work proposes the prototypical verbalizer (ProtoVerb) which is built directly from training data and demonstrates that ProtoVerb significantly outperforms current automatic verbalizers, especially when training data is extremely scarce.

Zero-shot Cross-lingual Transfer of Prompt-based Tuning with a Unified Multilingual Prompt

TLDR
A novel model called UniPrompt, which uses a unified prompt for all languages, is proposed; it is model-based and language-agnostic, and can significantly outperform strong baselines across different languages.

Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey

TLDR
A survey of recent work that uses large, pre-trained transformer-based language models to solve NLP tasks via pre-training then fine-tuning, prompting, or text generation approaches.

Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning

TLDR
An analysis framework is proposed that links the pretraining and downstream tasks with an underlying latent variable generative model of text — the downstream classifier must recover a function of the posterior distribution over the latent variables.

The Power of Scale for Parameter-Efficient Prompt Tuning

TLDR
This work explores “prompt tuning”, a simple yet effective mechanism for learning “soft prompts” to condition frozen language models to perform specific downstream tasks, and shows that conditioning a frozen model with soft prompts confers benefits in robustness to domain transfer, as compared to full model tuning.
...

References

Showing 1-10 of 40 references

Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank

TLDR
A Sentiment Treebank that includes fine-grained sentiment labels for 215,154 phrases in the parse trees of 11,855 sentences is introduced; it presents new challenges for sentiment compositionality, and the Recursive Neural Tensor Network is introduced.

AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts

  Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020

It’s Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners

TLDR
This work shows that performance similar to GPT-3 can be obtained with language models that are much “greener” in that their parameter count is several orders of magnitude smaller, and identifies key factors required for successful natural language understanding with small language models.

RoBERTa: A Robustly Optimized BERT Pretraining Approach

TLDR
It is found that BERT was significantly undertrained and, with an improved pretraining recipe, can match or exceed the performance of every model published after it; the best model achieves state-of-the-art results on GLUE, RACE and SQuAD.

Adversarial Reprogramming of Neural Networks

TLDR
This paper demonstrates adversarial reprogramming on six ImageNet classification models, repurposing these models to perform a counting task, as well as classification tasks: classification of MNIST and CIFAR-10 examples presented as inputs to the ImageNet model.
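
For context on the image-domain technique that WARP transfers to text: a single learned input perturbation (the "program") is added around a small task image so that a frozen ImageNet classifier's outputs can be reinterpreted as predictions for the new task, e.g., the first ten ImageNet logits standing in for the ten MNIST digits. The snippet below is a minimal, hypothetical PyTorch sketch under that reading; ReprogrammingLayer, the stub network, and the label mapping are illustrative assumptions, not the paper's code.

    import torch
    import torch.nn as nn

    class ReprogrammingLayer(nn.Module):
        """Place a small task image inside a larger canvas and add a learned 'program'."""
        def __init__(self, canvas_size: int = 224, inner_size: int = 28):
            super().__init__()
            self.program = nn.Parameter(torch.zeros(3, canvas_size, canvas_size))
            mask = torch.ones(1, canvas_size, canvas_size)
            pad = (canvas_size - inner_size) // 2
            mask[:, pad:pad + inner_size, pad:pad + inner_size] = 0.0  # keep the data region untouched
            self.register_buffer("mask", mask)
            self.pad, self.inner_size = pad, inner_size

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, 1, 28, 28) grayscale task images in [0, 1]
            canvas = x.new_zeros(x.size(0), 3, self.mask.size(1), self.mask.size(2))
            canvas[:, :, self.pad:self.pad + self.inner_size,
                         self.pad:self.pad + self.inner_size] = x.expand(-1, 3, -1, -1)
            return canvas + torch.tanh(self.program) * self.mask      # only the program is trainable

    # Stand-in for a frozen pretrained ImageNet classifier (in practice, e.g., a torchvision ResNet).
    frozen_net = nn.Sequential(nn.Conv2d(3, 8, 7, stride=4), nn.ReLU(),
                               nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 1000))
    for p in frozen_net.parameters():
        p.requires_grad = False
    reprogram = ReprogrammingLayer()
    digit_logits = frozen_net(reprogram(torch.rand(4, 1, 28, 28)))[:, :10]  # reuse first 10 classes as digits

WARP transplants this idea to text: instead of a pixel-space program around an image, it learns a handful of prompt embeddings placed around the input tokens of a frozen language model.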

The Sixth PASCAL Recognizing Textual Entailment Challenge

TLDR
This paper presents the Sixth Recognizing Textual Entailment (RTE-6) challenge, in which the traditional Main Task was replaced by a new task, similar to the RTE-5 Search Pilot, where Textual Entailment is performed on a real corpus in the Update Summarization scenario.

AdapterFusion: Non-Destructive Task Composition for Transfer Learning

TLDR
This work proposes AdapterFusion, a new two-stage learning algorithm that leverages knowledge from multiple tasks by separating the two stages, i.e., knowledge extraction and knowledge composition, so that the classifier can effectively exploit the representations learned from multiple tasks in a non-destructive manner.

TinyBERT: Distilling BERT for Natural Language Understanding

TLDR
A novel Transformer distillation method specially designed for knowledge distillation (KD) of Transformer-based models is proposed; by leveraging this new KD method, the rich knowledge encoded in a large “teacher” BERT can be effectively transferred to a small “student” TinyBERT.
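
For readers unfamiliar with knowledge distillation, the prediction-layer part of the objective is easy to sketch: the student is trained to match the teacher's temperature-softened output distribution. The function below is a generic, hypothetical illustration of that soft-label loss, not TinyBERT's full procedure, which additionally matches embeddings, hidden states, and attention distributions layer by layer.

    import torch
    import torch.nn.functional as F

    def soft_label_distillation_loss(student_logits: torch.Tensor,
                                     teacher_logits: torch.Tensor,
                                     temperature: float = 2.0) -> torch.Tensor:
        """Generic soft-label KD loss: match the teacher's temperature-softened distribution."""
        log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
        p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
        # scale by T^2 so gradient magnitudes stay comparable across temperatures
        return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

    # toy usage: random logits for a batch of 4 examples and 3 classes
    loss = soft_label_distillation_loss(torch.randn(4, 3), torch.randn(4, 3))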

Language Models are Unsupervised Multitask Learners

TLDR
It is demonstrated that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.