Hyperdecoders: Instance-specific decoders for multi-task NLP

Hamish Ivison and Matthew E. Peters
We investigate input-conditioned hypernetworks for multi-tasking in NLP, generating parameter-efficient adaptations for a decoder using a hypernetwork conditioned on the output of an encoder. This approach produces a unique decoder for every input instance, allowing the network a larger degree of flexibility than prior work that specializes the decoder for each task. We apply our method to sequence classification tasks, extractive QA, and summarisation and find that it surpasses previous parameter-efficient approaches.
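As a rough illustration (not the paper's actual architecture), the core idea can be sketched in a few lines of pure Python: a hypernetwork maps a pooled encoder representation to the weights of a small bottleneck adapter in the decoder, so each input instance gets its own adapter. All dimensions, the random initialization, and the single-linear-layer hypernetwork are toy assumptions.

```python
import random

random.seed(0)

D_ENC, D_DEC, D_ADAPT = 8, 8, 4  # toy dimensions (assumptions)

def rand_matrix(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

# Hypernetwork: a single linear map from a pooled encoder representation
# to the flattened weights of a bottleneck adapter (down- and up-projection).
N_ADAPTER = D_DEC * D_ADAPT + D_ADAPT * D_DEC
W_hyper = rand_matrix(N_ADAPTER, D_ENC)

def generate_adapter(enc_repr):
    """Produce instance-specific adapter weights from the encoder output."""
    flat = matvec(W_hyper, enc_repr)
    down = [flat[i * D_ADAPT:(i + 1) * D_ADAPT] for i in range(D_DEC)]
    off = D_DEC * D_ADAPT
    up = [flat[off + j * D_DEC:off + (j + 1) * D_DEC] for j in range(D_ADAPT)]
    return down, up

def adapter_forward(h, down, up):
    """Bottleneck adapter with a residual connection: h + up(relu(down(h)))."""
    z = [max(0.0, sum(h[i] * down[i][j] for i in range(D_DEC))) for j in range(D_ADAPT)]
    return [h[k] + sum(z[j] * up[j][k] for j in range(D_ADAPT)) for k in range(D_DEC)]

# Two different encoder outputs yield two different decoder adapters.
enc_a = [random.random() for _ in range(D_ENC)]
enc_b = [random.random() for _ in range(D_ENC)]
h = [1.0] * D_DEC
out_a = adapter_forward(h, *generate_adapter(enc_a))
out_b = adapter_forward(h, *generate_adapter(enc_b))
```

Because the adapter weights are a function of the encoder output rather than a per-task lookup, the "decoder" effectively changes with every input instance.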

ATTEMPT: Parameter-Efficient Multi-task Tuning via Attentional Mixtures of Soft Prompts

ATTEMPT is highly parameter-efficient (e.g., it updates 2,300 times fewer parameters than full fine-tuning) while achieving high task performance by drawing on knowledge from high-resource tasks; it is also modular, reusing pre-trained soft prompts and adding or removing source prompts for effective knowledge transfer.

Weight-Specific-Decoder Attention Model to Solve Multiobjective Combinatorial Optimization Problems

A Weight-Specific-Decoder Attention Model (WSDAM) is proposed to better approximate the whole Pareto set and outperforms current state-of-the-art learning-based methods in both solution quality and generalization ability.

Boosting Natural Language Generation from Instructions with Meta-Learning

This paper proposes to adapt meta-learning to MTIL in three directions: 1) Model-Agnostic Meta-Learning (MAML), 2) Hyper-Network (HNet) based adaptation to generate task-specific parameters conditioned on instructions, and 3) an approach combining HNet and MAML.

Adversarial NLI: A New Benchmark for Natural Language Understanding

This work introduces a new large-scale NLI benchmark dataset, collected via an iterative, adversarial human-and-model-in-the-loop procedure, and shows that non-expert annotators are successful at finding model weaknesses.

Towards a Unified View of Parameter-Efficient Transfer Learning

This paper re-frames state-of-the-art parameter-efficient transfer learning methods as modifications to specific hidden states in pretrained models, defines a set of design dimensions along which the methods vary, and instantiates new variants that achieve results comparable to fine-tuning all parameters on all four tasks studied.

Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks

This paper shows that one can learn adapter parameters for all layers and tasks by generating them using shared hypernetworks, which condition on task, adapter position, and layer id in a transformer model.
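A minimal pure-Python sketch of that conditioning scheme (toy sizes and embedding values are assumptions, not the paper's): a single shared projection turns concatenated task, layer, and adapter-position embeddings into adapter parameters, so one hypernetwork serves every combination.

```python
import random

random.seed(0)

D_EMB, N_PARAMS = 2, 8  # toy sizes (assumptions)

# One hypernetwork, shared across all tasks, layers, and adapter positions.
W_shared = [[random.uniform(-0.1, 0.1) for _ in range(3 * D_EMB)]
            for _ in range(N_PARAMS)]

# Learned embeddings for the conditioning inputs (toy values).
task_emb = {"mnli": [0.1, 0.2], "squad": [0.3, 0.4]}
layer_emb = {0: [0.5, 0.1], 1: [0.2, 0.6]}
pos_emb = {"attn": [0.0, 1.0], "ffn": [1.0, 0.0]}

def generate(task, layer, position):
    """Concatenate the three embeddings and project to adapter parameters."""
    x = task_emb[task] + layer_emb[layer] + pos_emb[position]
    return [sum(w * xi for w, xi in zip(row, x)) for row in W_shared]

params_mnli = generate("mnli", 0, "ffn")
params_squad = generate("squad", 0, "ffn")
```

The per-task cost is then just a few small embedding vectors rather than a full set of adapter weights per task and layer.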

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding

A benchmark of nine diverse NLU tasks, an auxiliary dataset for probing models for understanding of specific linguistic phenomena, and an online platform for evaluating and comparing models, which favors models that can represent linguistic knowledge in a way that facilitates sample-efficient learning and effective knowledge-transfer across tasks.

SQuAD: 100,000+ Questions for Machine Comprehension of Text

A strong logistic regression model is built, which achieves an F1 score of 51.0%, a significant improvement over a simple baseline (20%).

ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning

ExMix (Extreme Mixture), a massive collection of 107 supervised NLP tasks across diverse domains and task families, is introduced, along with ExT5, a model pre-trained using a multi-task objective of self-supervised span denoising and supervised ExMix.

HyperGrid Transformers: Towards A Single Model for Multiple Tasks

This paper proposes HyperGrid Transformers, a new Transformer architecture that leverages task-conditioned hypernetworks to control its feed-forward layers, using a decomposable hypernetwork that learns grid-wise projections to specialize regions of the weight matrices for different tasks.

The Power of Scale for Parameter-Efficient Prompt Tuning

This work explores “prompt tuning”, a simple yet effective mechanism for learning “soft prompts” to condition frozen language models to perform specific downstream tasks, and shows that conditioning a frozen model with soft prompts confers benefits in robustness to domain transfer, as compared to full model tuning.
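The mechanism itself is simple enough to sketch in a few lines (a toy stand-in, not the paper's implementation; sizes and the embedding lookup are assumptions): the soft prompt is a small trainable matrix of embeddings prepended to the frozen model's input embeddings.

```python
import random

random.seed(0)

PROMPT_LEN, D_MODEL = 3, 4  # toy sizes (assumptions)

# The soft prompt is the ONLY trainable parameter set; the language model
# (embedding table and all transformer weights) stays frozen.
soft_prompt = [[random.uniform(-0.5, 0.5) for _ in range(D_MODEL)]
               for _ in range(PROMPT_LEN)]

def embed(tokens):
    """Stand-in for the frozen embedding lookup (hypothetical toy hashing)."""
    return [[(sum(ord(c) for c in t) * (j + 1)) % 100 / 100.0
             for j in range(D_MODEL)] for t in tokens]

def prepend_prompt(token_embeddings):
    """Prompt tuning: soft prompt vectors are concatenated before the input."""
    return soft_prompt + token_embeddings

seq = prepend_prompt(embed(["summarize", ":", "some", "text"]))
print(len(seq))  # PROMPT_LEN + 4 = 7
```

At training time, gradients flow only into `soft_prompt`, which is why the method updates so few parameters.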

Conditionally Adaptive Multi-Task Learning: Improving Transfer Learning in NLP Using Fewer Parameters & Less Data

A novel transformer-based architecture is proposed, consisting of a new conditional attention mechanism and a set of task-conditioned modules that facilitate weight sharing; it surpasses single-task fine-tuning methods while being parameter- and data-efficient.