WARP: Word-level Adversarial ReProgramming

@inproceedings{Hambardzumyan2021WARPWA,
  title={WARP: Word-level Adversarial ReProgramming},
  author={Karen Hambardzumyan and H. Khachatrian and Jonathan May},
  booktitle={ACL},
  year={2021}
}
Transfer learning from pretrained language models has recently become the dominant approach for solving many NLP tasks. A common approach to transfer learning for multiple tasks, one that maximizes parameter sharing, trains one or more task-specific layers on top of the language model. In this paper, we present an alternative approach based on adversarial reprogramming, which extends earlier work on automatic prompt generation. Adversarial reprogramming attempts to learn task-specific word embeddings that…
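As a reading aid, the sketch below illustrates the kind of setup the abstract describes: a small number of trainable prompt embeddings are concatenated to the input of a frozen masked language model, and a verbalizer maps the tokens predicted at a [MASK] position to class labels. It is a minimal illustration rather than the authors' implementation; the choice of roberta-base, the prompt length K, the verbalizer words, and the toy training step are assumptions made for the example.

# Minimal sketch of WARP-style prompt learning (not the authors' code):
# K trainable prompt embeddings are concatenated to the frozen model's
# input embeddings, and only those K vectors are optimized for the task.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_name = "roberta-base"              # assumption: any masked LM would do
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)
model.eval()
for p in model.parameters():             # freeze every pretrained weight
    p.requires_grad = False

embed = model.get_input_embeddings()     # token-embedding lookup table
hidden = embed.embedding_dim
K = 8                                    # number of learnable prompt tokens (hyperparameter)
prompt = torch.nn.Parameter(torch.randn(K, hidden) * 0.02)

def forward(texts):
    """Prepend the K learned prompt vectors and a [MASK] slot to each input."""
    enc = tokenizer(texts, return_tensors="pt", padding=True)
    tok_emb = embed(enc.input_ids)                          # (B, T, H)
    B = tok_emb.size(0)
    mask_ids = torch.full((B, 1), tokenizer.mask_token_id, dtype=torch.long)
    inputs_embeds = torch.cat(
        [prompt.unsqueeze(0).expand(B, -1, -1), embed(mask_ids), tok_emb], dim=1)
    attn = torch.cat(
        [torch.ones(B, K + 1, dtype=enc.attention_mask.dtype),
         enc.attention_mask], dim=1)
    logits = model(inputs_embeds=inputs_embeds, attention_mask=attn).logits
    return logits[:, K, :]                                  # logits at the [MASK] position

# Verbalizer: map class labels to vocabulary tokens (illustrative choice).
verbalizer = [tokenizer.encode(w, add_special_tokens=False)[0]
              for w in [" terrible", " great"]]
opt = torch.optim.Adam([prompt], lr=1e-3)

# One illustrative training step on a toy example.
texts, labels = ["a gripping, beautiful film ."], torch.tensor([1])
class_logits = forward(texts)[:, verbalizer]
loss = torch.nn.functional.cross_entropy(class_logits, labels)
loss.backward()
opt.step()

Only the K x hidden prompt matrix receives gradient updates, which is what makes the approach parameter-efficient: for roberta-base and K = 8 that is 8 x 768, roughly 6K trainable values against about 125M frozen ones.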

Citations

Making Pre-trained Language Models End-to-end Few-shot Learners with Contrastive Prompt Tuning
TLDR
CP-Tuning is presented, the first end-to-end Contrastive Prompt Tuning framework for fine-tuning PLMs without any manual engineering of task-specific prompts and verbalizers; it is integrated with a task-invariant continuous prompt encoding technique with fully trainable prompt parameters.
Contrastive Demonstration Tuning for Pre-trained Language Models
TLDR
Experimental results illustrate that the proposed pluggable, extensible, and efficient approach, contrastive demonstration tuning, which is free of demonstration sampling, yields better performance when integrated with the previous approaches LM-BFF and P-tuning.
Prototypical Verbalizer for Prompt-based Few-shot Tuning
TLDR
This work proposes the prototypical verbalizer (ProtoVerb) which is built directly from training data and demonstrates that ProtoVerb significantly outperforms current automatic verbalizers, especially when training data is extremely scarce.
Zero-shot Cross-lingual Transfer of Prompt-based Tuning with a Unified Multilingual Prompt
TLDR
A novel model that uses a unified prompt for all languages, called UniPrompt, which is model-based and language-agnostic, and can significantly outperform strong baselines across different languages.
Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey
TLDR
A survey of recent work that uses large, pre-trained transformer-based language models to solve NLP tasks via pre-training then fine-tuning, prompting, or text generation approaches.
The Power of Scale for Parameter-Efficient Prompt Tuning
TLDR
This work explores “prompt tuning”, a simple yet effective mechanism for learning “soft prompts” to condition frozen language models to perform specific downstream tasks, and shows that conditioning a frozen model with soft prompts confers benefits in robustness to domain transfer, as compared to full model tuning.
Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning
TLDR
An analysis framework is proposed that links the pretraining and downstream tasks via an underlying latent variable generative model of text: the downstream classifier must recover a function of the posterior distribution over the latent variables, which yields downstream guarantees under weaker non-degeneracy conditions.
$\mathcal{Y}$-Tuning: An Efficient Tuning Paradigm for Large-Scale Pre-Trained Models via Label Representation Learning
TLDR
Y-Tuning is proposed, an efficient yet effective paradigm for adapting frozen large-scale PTMs to specific downstream tasks; it achieves more than 96% of the performance of full fine-tuning on the GLUE benchmark with only 2% tunable parameters and much lower training cost.
ASCM: An Answer Space Clustered Prompting Method without Answer Engineering
TLDR
This work proposes an answer space clustered prompting model (ASCM), together with a synonym initialization method (SI), which automatically categorizes all answer tokens in a semantically clustered embedding space, and proposes a stable semi-supervised method named stair learning (SL) that distills knowledge from stronger models to weaker ones in an ordered fashion.
Black-Box Tuning for Language-Model-as-a-Service
TLDR
The experimental results show that black-box tuning with RoBERTa on a few labeled samples not only outperforms manual prompts and GPT-3's in-context learning, but also surpasses the gradient-based counterparts, i.e., prompt tuning and full model tuning.
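Since gradients are unavailable in the Language-Model-as-a-Service setting the entry above describes, the prompt has to be optimized with a derivative-free search. The toy sketch below conveys that idea only: a low-dimensional vector is projected up to a prompt and improved by simple hill climbing against a stand-in loss, whereas the paper itself queries a hosted PLM and uses a more sophisticated evolutionary optimizer. The dimensions, the random projection, and the quadratic loss are assumptions for illustration.

# Toy sketch of the black-box tuning idea (not the paper's implementation):
# the prompt is optimized with a derivative-free search, since a hosted LM
# exposes only its outputs, never gradients. A quadratic toy loss stands in
# for the real "query the API, score the predictions" step.
import numpy as np

rng = np.random.default_rng(0)
d, D = 16, 1024                              # low-dim search space, prompt-embedding dim
A = rng.normal(size=(D, d)) / np.sqrt(d)     # fixed random projection (assumption)
target = rng.normal(size=D)                  # pretend optimum, used only by the toy loss

def black_box_loss(z):
    """Stand-in for: send prompt A @ z to the LM service, score its outputs."""
    prompt = A @ z
    return float(np.sum((prompt - target) ** 2))

# Simple (1+1) hill climbing: keep one candidate, accept Gaussian
# perturbations whenever they lower the black-box loss.
z = np.zeros(d)
best = black_box_loss(z)
sigma = 0.5
for step in range(2000):
    cand = z + sigma * rng.normal(size=d)
    loss = black_box_loss(cand)
    if loss < best:
        z, best = cand, loss
print(f"loss after search: {best:.3f}")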

References

Showing 1-10 of 40 references
AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts
  • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
  • 2020
Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
TLDR
A Sentiment Treebank that includes fine-grained sentiment labels for 215,154 phrases in the parse trees of 11,855 sentences presents new challenges for sentiment compositionality, and the Recursive Neural Tensor Network is introduced.
It’s Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners
TLDR
This work shows that performance similar to GPT-3 can be obtained with language models that are much “greener” in that their parameter count is several orders of magnitude smaller, and identifies key factors required for successful natural language understanding with small language models.
Adversarial Reprogramming of Neural Networks
TLDR
This paper demonstrates adversarial reprogramming on six ImageNet classification models, repurposing these models to perform a counting task, as well as classification tasks: classification of MNIST and CIFAR-10 examples presented as inputs to the ImageNet model.
RoBERTa: A Robustly Optimized BERT Pretraining Approach
TLDR
It is found that BERT was significantly undertrained, and can match or exceed the performance of every model published after it, and the best model achieves state-of-the-art results on GLUE, RACE and SQuAD.
The Sixth PASCAL Recognizing Textual Entailment Challenge
TLDR
This paper presents the Sixth Recognizing Textual Entailment (RTE-6) challenge, in which the traditional Main Task was replaced by a new task, similar to the RTE-5 Search Pilot, where Textual Entailment is performed on a real corpus in the Update Summarization scenario.
AdapterFusion: Non-Destructive Task Composition for Transfer Learning
TLDR
This work proposes AdapterFusion, a new two-stage learning algorithm that leverages knowledge from multiple tasks by separating the two stages, i.e., knowledge extraction and knowledge composition, so that the classifier can effectively exploit the representations learned from multiple tasks in a non-destructive manner.
TinyBERT: Distilling BERT for Natural Language Understanding
TLDR
A novel Transformer distillation method specially designed for knowledge distillation (KD) of Transformer-based models is proposed; by leveraging this new KD method, the rich knowledge encoded in a large “teacher” BERT can be effectively transferred to a small “student” TinyBERT.