Prompt Injection: Parameterization of Fixed Inputs

@article{Choi2022PromptIP,
  title={Prompt Injection: Parameterization of Fixed Inputs},
  author={Eunbi Choi and Yongrae Jo and Joel Jang and Minjoon Seo},
  journal={ArXiv},
  year={2022},
  volume={abs/2206.11349}
}
Recent works have shown that attaching prompts to the input is effective at conditioning language models (LMs) to perform specific tasks. However, prompts are always included in the input text during inference, incurring substantial computational and memory overhead. Also, there is currently no straightforward method of utilizing prompts that are longer than the maximum input length of the LM without incurring additional costs during inference. We propose Prompt Injection (PI), a novel… 
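
The truncated abstract does not show the exact training objective, so the following is only a minimal sketch of one plausible way to "inject" a fixed prompt into model parameters: a distillation-style setup in which a student model run without the prompt is trained to match a frozen teacher run with the prompt prepended. The gpt2 checkpoint, the prompt string, and the single-example training step are illustrative assumptions, not the paper's recipe.

# Sketch: distill a fixed prompt into a model's parameters so that inference
# no longer needs the prompt in the input. The student (no prompt) is trained
# to match a frozen teacher (prompt prepended).
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small stand-in model; any causal LM works
tok = AutoTokenizer.from_pretrained(model_name)
teacher = AutoModelForCausalLM.from_pretrained(model_name).eval()
student = AutoModelForCausalLM.from_pretrained(model_name)  # parameters to be updated

prompt = "Translate English to French:"  # the fixed prompt to be injected (illustrative)
prompt_ids = tok(prompt, return_tensors="pt").input_ids
P = prompt_ids.shape[1]

optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)

def distill_step(text: str) -> float:
    """One distillation step on a single raw-text example (illustrative only)."""
    x = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # Teacher sees prompt + input; keep only the logits that predict input tokens.
        t_logits = teacher(torch.cat([prompt_ids, x], dim=1)).logits[:, P:-1]
    s_logits = student(x).logits[:, :-1]  # student sees the input alone
    loss = F.kl_div(
        F.log_softmax(s_logits, dim=-1),
        F.softmax(t_logits, dim=-1),
        reduction="batchmean",
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

print(distill_step("The cat sat on the mat."))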


Learning by Distilling Context

This work shows that context distillation is a general method for training language models, that it can effectively internalize three types of training signals, and that it can internalize step-by-step reasoning for complex tasks.

Retrieval of Soft Prompt Enhances Zero-Shot Task Generalization

This paper explores how the retrieval of soft prompts obtained through prompt tuning can assist hard prompts in zero-shot task generalization, and finds that retrieving source embeddings trained on similar answer-choice formats matters more than retrieving those trained on similar task types.
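
For context on the soft prompts being retrieved, here is a minimal sketch of prompt tuning: a small matrix of trainable embeddings is prepended to the input embeddings while the backbone LM stays frozen. The gpt2 checkpoint, prompt length, and learning rate are illustrative assumptions; the retrieval step itself is not shown.

# Sketch: prompt tuning with a trainable soft prompt and a frozen backbone.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
for p in model.parameters():
    p.requires_grad_(False)  # freeze the backbone; only the soft prompt is trained

n_soft, hidden = 20, model.config.n_embd
soft_prompt = torch.nn.Parameter(torch.randn(1, n_soft, hidden) * 0.02)
optimizer = torch.optim.AdamW([soft_prompt], lr=3e-3)

def step(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    tok_embeds = model.get_input_embeddings()(ids)
    inputs_embeds = torch.cat([soft_prompt, tok_embeds], dim=1)
    # Ignore the loss on the soft-prompt positions (-100 is the ignore index).
    labels = torch.cat([torch.full((1, n_soft), -100), ids], dim=1)
    loss = model(inputs_embeds=inputs_embeds, labels=labels).loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

print(step("Review: great movie! Sentiment: positive"))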

References

Showing 1-10 of 42 references

Transformer-XL: Attentive Language Models beyond a Fixed-Length Context

This work proposes Transformer-XL, a novel neural architecture consisting of a segment-level recurrence mechanism and a novel positional encoding scheme, which enables learning dependencies beyond a fixed length without disrupting temporal coherence.
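
To make the segment-level recurrence concrete, here is a minimal sketch in which hidden states from the previous segment are cached without gradient and reused as extra attention context for the current segment. The single attention layer, the dimensions, and the omission of causal masking are simplifications, and the relative positional encoding scheme is not shown.

# Sketch: segment-level recurrence via a cached memory of previous hidden states.
import torch
import torch.nn.functional as F

d_model, seg_len = 64, 8
wq = torch.nn.Linear(d_model, d_model)
wk = torch.nn.Linear(d_model, d_model)
wv = torch.nn.Linear(d_model, d_model)

def attend_with_memory(segment, memory):
    """Attend from the current segment over [memory + segment] keys/values."""
    context = torch.cat([memory, segment], dim=1) if memory is not None else segment
    q, k, v = wq(segment), wk(context), wv(context)
    scores = q @ k.transpose(-2, -1) / d_model ** 0.5  # causal masking omitted for brevity
    out = F.softmax(scores, dim=-1) @ v
    new_memory = segment.detach()  # cached states are not backpropagated through
    return out, new_memory

memory = None
for _ in range(3):  # process a long sequence segment by segment
    segment = torch.randn(1, seg_len, d_model)
    out, memory = attend_with_memory(segment, memory)
print(out.shape)  # torch.Size([1, 8, 64])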

What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers

This work introduces HyperCLOVA Studio, an interactive prompt-engineering interface, discusses the possibility of materializing the No Code AI paradigm by providing AI prototyping capabilities to non-experts of ML, and shows the performance benefits of prompt-based learning and how it can be integrated into the prompt engineering pipeline.

Recipes for Building an Open-Domain Chatbot

Human evaluations show that the best models outperform existing approaches in multi-turn dialogue on engagingness and humanness measurements; the limitations of this work are discussed through an analysis of the models' failure cases.

Big Bird: Transformers for Longer Sequences

It is shown that BigBird is a universal approximator of sequence functions and is Turing complete, thereby preserving these properties of the quadratic, full attention model.

Finetuned Language Models Are Zero-Shot Learners

It is shown that instruction tuning (finetuning language models on a collection of datasets described via instructions) substantially improves zero-shot performance on unseen tasks, and that the resulting model outperforms few-shot GPT-3 by a large margin on ANLI, RTE, BoolQ, AI2-ARC, OpenbookQA, and StoryCloze.

Training Millions of Personalized Dialogue Agents

A new dataset providing 5 million personas and 700 million persona-based dialogues is introduced and it is shown that, at this scale, training using personas still improves the performance of end-to-end systems.

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.

Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning

This work proposes (IA)³, a new parameter-efficient fine-tuning method that scales activations by learned vectors, attaining stronger performance while introducing only a relatively tiny number of new parameters.
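
To illustrate "scaling activations by learned vectors", here is a minimal sketch of the idea applied to a feed-forward block: the pretrained weights are frozen and only an element-wise scaling vector, initialized to ones, is trained. In (IA)³ such vectors also rescale the attention keys and values; sizes here are illustrative.

# Sketch: rescale intermediate activations with a learned vector, (IA)^3-style.
import torch

d_model, d_ff = 64, 256
w1 = torch.nn.Linear(d_model, d_ff)
w2 = torch.nn.Linear(d_ff, d_model)
# The only new trainable parameters: one scaling vector, initialized to ones so
# the modified block starts out identical to the frozen pretrained block.
l_ff = torch.nn.Parameter(torch.ones(d_ff))
for p in list(w1.parameters()) + list(w2.parameters()):
    p.requires_grad_(False)  # pretrained weights stay frozen

def ffn_with_ia3(x: torch.Tensor) -> torch.Tensor:
    h = torch.nn.functional.gelu(w1(x))
    return w2(l_ff * h)  # element-wise rescaling by the learned vector

x = torch.randn(2, 10, d_model)
print(ffn_with_ia3(x).shape)  # torch.Size([2, 10, 64])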

Language Models are Few-Shot Learners

GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic.

Language Models are Unsupervised Multitask Learners

It is demonstrated that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.