Prompt Injection: Parameterization of Fixed Inputs

@article{Choi2022PromptIP,
  title={Prompt Injection: Parameterization of Fixed Inputs},
  author={Eunbi Choi and Yongrae Jo and Joel Jang and Minjoon Seo},
  journal={ArXiv},
  year={2022},
  volume={abs/2206.11349}
}
Recent works have shown that attaching prompts to the input is effective at conditioning Language Models (LM) to perform specific tasks. However, prompts are always included in the input text during inference, thus incurring substantial computational and memory overhead. Also, there is currently no straightforward method of utilizing prompts that are longer than the maximum input length of the LMs without incurring additional costs during inference. We propose Prompt Injection (PI), a novel… 
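The rest of the abstract is truncated on this page, but the core idea is to make a fixed prompt part of the model's parameters so that it no longer has to be attached to every input at inference time. The snippet below is a minimal, hypothetical sketch of one way to realize that (distilling a prompt-conditioned teacher into a prompt-free student); the checkpoint, prompt, and training loop are illustrative placeholders, not the paper's actual method.

```python
# Hypothetical sketch of "injecting" a fixed prompt into model parameters by
# distillation: a frozen teacher sees (prompt + input), a trainable student
# sees only the input, and the student is trained to match the teacher's
# output distribution. Model name and prompt are placeholders.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_NAME = "t5-small"                           # placeholder checkpoint
FIXED_PROMPT = "Translate English to German: "    # the prompt to internalize

tok = AutoTokenizer.from_pretrained(MODEL_NAME)
teacher = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME).eval()
student = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME).train()
optim = torch.optim.AdamW(student.parameters(), lr=1e-4)

def distill_step(inputs: list[str], targets: list[str]) -> float:
    """One step: push the prompt-free student toward the prompted teacher."""
    with torch.no_grad():
        t_batch = tok([FIXED_PROMPT + x for x in inputs],
                      return_tensors="pt", padding=True)
        labels = tok(targets, return_tensors="pt", padding=True).input_ids
        t_logits = teacher(**t_batch, labels=labels).logits

    s_batch = tok(inputs, return_tensors="pt", padding=True)
    s_logits = student(**s_batch, labels=labels).logits

    # Decoder positions align because both models decode the same labels.
    loss = F.kl_div(F.log_softmax(s_logits, dim=-1),
                    F.softmax(t_logits, dim=-1),
                    reduction="batchmean")
    loss.backward()
    optim.step()
    optim.zero_grad()
    return loss.item()

print(distill_step(["The house is small."], ["Das Haus ist klein."]))
```

At inference only the student is run on the bare input, so the prompt tokens never occupy the context window.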


Learning by Distilling Context

Language models significantly benefit from context tokens, such as prompts or scratchpads. They perform better when prompted with informative instructions, and they acquire new reasoning capabilities…

References

Showing 1-10 of 42 references.

LongT5: Efficient Text-To-Text Transformer for Long Sequences

TLDR
This paper presents LongT5, a new model that explores the effects of scaling both the input length and model size at the same time, and creates a new attention mechanism called Transient Global (TGlobal), which mimics ETC’s local/global attention mechanism, but without requiring additional side-inputs.
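As a rough, single-head illustration of the transient global idea (not LongT5's actual implementation), the sketch below mean-pools fixed-size blocks of the input into on-the-fly global tokens and lets each position attend to all global tokens plus a local window; multi-head projections, relative position biases, and the efficient banded computation are omitted.

```python
# Sketch of "transient global" attention: global tokens are built on the fly
# by pooling fixed-size blocks of the input, so no side-inputs are needed.
import torch
import torch.nn.functional as F

def tglobal_attention(x: torch.Tensor, block: int = 16, window: int = 64):
    """x: (batch, seq_len, dim); seq_len is assumed divisible by `block`."""
    b, n, d = x.shape
    q = k = v = x  # learned projections omitted for brevity

    # Transient global tokens: mean-pool each block of `block` tokens.
    g = x.view(b, n // block, block, d).mean(dim=2)        # (b, n/block, d)

    # Scores against every global token.
    scores_g = q @ g.transpose(1, 2) / d ** 0.5            # (b, n, n/block)

    # Scores against a local window of keys. A real implementation computes
    # the banded local part without materializing the full n x n matrix.
    scores_l = q @ k.transpose(1, 2) / d ** 0.5            # (b, n, n)
    idx = torch.arange(n)
    local_mask = (idx[None, :] - idx[:, None]).abs() <= window
    scores_l = scores_l.masked_fill(~local_mask, float("-inf"))

    attn = F.softmax(torch.cat([scores_l, scores_g], dim=-1), dim=-1)
    return attn[..., :n] @ v + attn[..., n:] @ g            # (b, n, d)

out = tglobal_attention(torch.randn(2, 128, 32))
print(out.shape)  # torch.Size([2, 128, 32])
```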

What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers

TLDR
HyperCLOVA Studio, an interactive prompt engineering interface, is introduced; the possibility of materializing the No Code AI paradigm by providing AI prototyping capabilities to non-experts of ML is discussed; and the performance benefits of prompt-based learning and how it can be integrated into the prompt engineering pipeline are shown.

Recipes for Building an Open-Domain Chatbot

TLDR
Human evaluations show the best models outperform existing approaches in multi-turn dialogue on engagingness and humanness measurements, and the limitations of this work are discussed by analyzing failure cases of the models.

Big Bird: Transformers for Longer Sequences

TLDR
It is shown that BigBird is a universal approximator of sequence functions and is Turing complete, thereby preserving these properties of the quadratic, full attention model.

Finetuned Language Models Are Zero-Shot Learners

TLDR
It is shown that instruction tuning (finetuning language models on a collection of datasets described via instructions) substantially improves zero-shot performance on unseen tasks and outperforms few-shot GPT-3 by a large margin on ANLI, RTE, BoolQ, AI2-ARC, OpenbookQA, and StoryCloze.
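For illustration only, the snippet below shows what "describing a dataset via instructions" can look like for an NLI example; the template wording and label mapping are hypothetical and not taken from the paper.

```python
# Illustrative only: rendering a labeled NLI example into a natural-language
# instruction for instruction tuning. Template text is hypothetical.
NLI_TEMPLATE = (
    "Premise: {premise}\n"
    "Hypothesis: {hypothesis}\n"
    "Does the premise entail the hypothesis? Answer yes, no, or maybe."
)

def to_instruction(example: dict) -> tuple[str, str]:
    """Map a raw dataset example to an (input, target) text pair."""
    source = NLI_TEMPLATE.format(**example)
    target = {0: "yes", 1: "maybe", 2: "no"}[example["label"]]
    return source, target

src, tgt = to_instruction({
    "premise": "A dog is running in the park.",
    "hypothesis": "An animal is outside.",
    "label": 0,
})
print(src, "->", tgt)
```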

Training Millions of Personalized Dialogue Agents

TLDR
A new dataset providing 5 million personas and 700 million persona-based dialogues is introduced and it is shown that, at this scale, training using personas still improves the performance of end-to-end systems.

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TLDR
This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.

Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning

TLDR
A new parameter-efficient fine-tuning method called (IA)³ is proposed that scales activations by learned vectors, attaining stronger performance while introducing only a relatively tiny number of new parameters.
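A minimal sketch of that idea, assuming a single-head attention layer: only the element-wise rescaling vectors are trainable, while the base projections stay frozen (the paper's third vector, which rescales the intermediate feed-forward activations, is omitted here).

```python
# Minimal sketch of the (IA)^3 idea: freeze the base weights and train only
# small vectors that rescale keys and values. Shapes and placement are
# simplified relative to the paper.
import torch
import torch.nn as nn

class IA3Attention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim, bias=False)
        self.k = nn.Linear(dim, dim, bias=False)
        self.v = nn.Linear(dim, dim, bias=False)
        # The only trainable parameters: element-wise rescaling vectors.
        self.l_k = nn.Parameter(torch.ones(dim))
        self.l_v = nn.Parameter(torch.ones(dim))
        for proj in (self.q, self.k, self.v):
            proj.weight.requires_grad_(False)   # base model stays frozen

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q = self.q(x)
        k = self.k(x) * self.l_k                # inhibit/amplify keys
        v = self.v(x) * self.l_v                # inhibit/amplify values
        attn = torch.softmax(q @ k.transpose(-2, -1) / x.shape[-1] ** 0.5, -1)
        return attn @ v

layer = IA3Attention(64)
print([n for n, p in layer.named_parameters() if p.requires_grad])  # only l_k, l_v
print(layer(torch.randn(1, 8, 64)).shape)  # torch.Size([1, 8, 64])
```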

K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters

TLDR
K-Adapter is proposed, which keeps the original parameters of the pre-trained model fixed, supports continual knowledge infusion, and captures richer factual and commonsense knowledge than RoBERTa.
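The sketch below illustrates the general adapter pattern under that constraint (frozen backbone, small trainable bottleneck over its hidden states); K-Adapter's actual layer placement, parallel adapters, and knowledge-specific pre-training tasks are more involved.

```python
# Hedged sketch of the adapter pattern: the pretrained encoder is frozen and a
# small residual bottleneck module reads its hidden states.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class KnowledgeAdapter(nn.Module):
    def __init__(self, hidden: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden, bottleneck)
        self.up = nn.Linear(bottleneck, hidden)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return h + self.up(torch.relu(self.down(h)))   # residual bottleneck

tok = AutoTokenizer.from_pretrained("roberta-base")
backbone = AutoModel.from_pretrained("roberta-base")
for p in backbone.parameters():
    p.requires_grad_(False)                            # keep RoBERTa fixed

adapter = KnowledgeAdapter()                           # only this is trained
batch = tok(["K-Adapter keeps the backbone frozen."], return_tensors="pt")
with torch.no_grad():
    hidden = backbone(**batch).last_hidden_state       # (1, seq_len, 768)
print(adapter(hidden).shape)
```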

Multitask Prompted Training Enables Zero-Shot Task Generalization

TLDR
A system for easily mapping any natural language task into a human-readable prompted form is developed, and a pretrained encoder-decoder model is fine-tuned on this multitask mixture covering a wide variety of tasks.