Learning to Generate Task-Specific Adapters from Task Description

@inproceedings{Ye2021LearningTG,
  title={Learning to Generate Task-Specific Adapters from Task Description},
  author={Qinyuan Ye and Xiang Ren},
  booktitle={ACL},
  year={2021}
}
Pre-trained text-to-text transformers such as BART have achieved impressive performance across a range of NLP tasks. Recent studies further show that they can learn to generalize to novel tasks by including task descriptions as part of the source sequence and training the model with (source, target) examples. At test time, these fine-tuned models can make inferences on new tasks using the new task descriptions as part of the input. However, this approach has potential limitations, as the model…
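To make the setup concrete, here is a minimal sketch of the input format the abstract describes: the task description is concatenated with the instance input to form the source sequence of a text-to-text model, which is fine-tuned on (source, target) pairs and then queried with descriptions of unseen tasks. It assumes Hugging Face Transformers with facebook/bart-base; the separator token, field layout, and example tasks are illustrative assumptions, not the paper's exact preprocessing.

```python
# Sketch of instruction-conditioned fine-tuning with a text-to-text model.
# Assumes Hugging Face Transformers; separator and fields are illustrative.
from transformers import BartTokenizer, BartForConditionalGeneration

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

def build_source(task_description: str, instance_input: str) -> str:
    # Task description first, then the instance input for this example.
    return f"{task_description} </s> {instance_input}"

# Training-style example: one (source, target) pair for a seen task.
source = build_source(
    "Answer the question using a span from the passage.",
    "question: Who introduced BART? passage: BART was introduced by Lewis et al.",
)
target = "Lewis et al."

batch = tokenizer(source, return_tensors="pt")
labels = tokenizer(target, return_tensors="pt").input_ids
loss = model(**batch, labels=labels).loss  # standard seq2seq fine-tuning loss

# At test time, a *new* task is specified only through its description.
test_source = build_source(
    "Classify the sentiment of the review as positive or negative.",
    "review: The movie was a delight from start to finish.",
)
generated = model.generate(**tokenizer(test_source, return_tensors="pt"), max_length=16)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```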

Citations

Benchmarking Generalization via In-Context Instructions on 1,600+ Language Tasks
TLDR
This work introduces NATURAL-INSTRUCTIONS v2, a collection of 1,600+ diverse language tasks and their expert-written instructions, covering 70+ distinct task types such as tagging, in-filling, and rewriting.
Exploiting Adapters for Cross-Lingual Low-Resource Speech Recognition
TLDR
This paper proposes two algorithms, MetaAdapter and SimAdapter, for parameter-efficient cross-lingual speech adaptation, with SimAdapter explicitly learning knowledge from existing adapters, and shows that the two can be integrated for better performance, with up to 3.55% relative WER reduction.
HyperTransformer: Model Generation for Supervised and Semi-Supervised Few-Shot Learning
TLDR
This work proposes HyperTransformer, a transformer-based model for few-shot learning that generates the weights of a convolutional neural network (CNN) directly from support samples, and finds that generating the last layer alone produces results competitive with or better than state-of-the-art methods while remaining end-to-end differentiable.
Hyperdecoders: Instance-specific decoders for multi-task NLP
TLDR
An analysis of the embeddings produced by the model suggests that a large benefit of the approach is giving the encoder more effective control over the decoder, allowing a mapping from hidden representations to a final text-based label without interference from other tasks' output formats or labels.
In-BoXBART: Get Instructions into Biomedical Multi-Task Learning
TLDR
This is the first attempt to propose a unified model in the biomedical domain and to use instructions to achieve generalization across several biomedical tasks; the results indicate that there is room for improvement across tasks in the BoX, implying scope for future research.
Multilingual Machine Translation with Hyper-Adapters
Multilingual machine translation suffers from negative interference across languages. A common solution is to relax parameter sharing with language-specific modules like adapters. However, adapters of…

References

(showing 1-10 of 30 references)
Learning from Task Descriptions
TLDR
This work introduces a framework for developing NLP systems that solve new tasks after reading their descriptions, synthesizing prior work in this area, and instantiates it with a new English-language dataset, ZEST, structured for task-oriented evaluation on unseen tasks.
NewsQA: A Machine Comprehension Dataset
TLDR
NewsQA, a challenging machine comprehension dataset of over 100,000 human-generated question-answer pairs, is presented and analysis confirms that NewsQA demands abilities beyond simple word matching and recognizing textual entailment.
SQuAD: 100,000+ Questions for Machine Comprehension of Text
TLDR
A strong logistic regression model is built, which achieves an F1 score of 51.0%, a significant improvement over a simple baseline (20%).
Parameter-Efficient Transfer Learning for NLP
TLDR
To demonstrate adapters' effectiveness, the recently proposed BERT Transformer model is transferred to 26 diverse text classification tasks, including the GLUE benchmark, and adapters attain near state-of-the-art performance while adding only a few parameters per task (see the sketch after this reference list).
HyperNetworks
This work explores hypernetworks: an approach of using one network, also known as a hypernetwork, to generate the weights for another network. We apply hypernetworks to generate adaptive weights for… (see the sketch after this reference list).
Natural Instructions: Benchmarking Generalization to New Tasks from Natural Language Instructions
TLDR
This work uses existing NLP datasets and the instructions used to crowdsource them to create NATURAL INSTRUCTIONS, a dataset of instructions paired with task-specific input/output data; results indicate that existing models indeed benefit from instructions and hence show improved generalization to new tasks.
A Unified MRC Framework for Named Entity Recognition
TLDR
This paper proposes to formulate the task of NER as a machine reading comprehension (MRC) task, and naturally tackles the entity overlapping issue in nested NER: the extraction of two overlapping entities with different categories requires answering two independent questions.
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
TLDR
BART is presented, a denoising autoencoder for pretraining sequence-to-sequence models, which matches the performance of RoBERTa on GLUE and SQuAD, and achieves new state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks.
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
TLDR
This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.
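The title of this paper combines two ideas from the references above: bottleneck adapters (Parameter-Efficient Transfer Learning for NLP) and a hypernetwork that generates another network's weights (HyperNetworks). The sketch below, referenced from those two entries, shows the combination in plain PyTorch: a small generator maps a task-description embedding to the weights of one bottleneck adapter. Module names, dimensions, and the use of a pre-computed description embedding are illustrative assumptions, not the authors' implementation.

```python
# Sketch: a hypernetwork that produces the weights of one bottleneck adapter
# from an embedding of the task description. Names and sizes are illustrative.
import torch
import torch.nn as nn

class AdapterHypernetwork(nn.Module):
    """Maps a task-description embedding to the weights of one bottleneck adapter."""

    def __init__(self, desc_dim: int, hidden_dim: int, bottleneck: int):
        super().__init__()
        self.hidden_dim, self.bottleneck = hidden_dim, bottleneck
        # Total adapter parameters: down-projection + up-projection (biases omitted).
        n_params = hidden_dim * bottleneck + bottleneck * hidden_dim
        self.generator = nn.Linear(desc_dim, n_params)

    def forward(self, desc_embedding: torch.Tensor):
        flat = self.generator(desc_embedding)
        down, up = flat.split(self.hidden_dim * self.bottleneck)
        w_down = down.view(self.bottleneck, self.hidden_dim)
        w_up = up.view(self.hidden_dim, self.bottleneck)
        return w_down, w_up

def apply_adapter(hidden_states, w_down, w_up):
    # Standard bottleneck adapter with a residual connection:
    # h + W_up * relu(W_down * h).
    return hidden_states + torch.relu(hidden_states @ w_down.T) @ w_up.T

# Usage: one forward pass with adapter weights generated for a hypothetical task.
hidden_dim, bottleneck, desc_dim = 768, 64, 768
hypernet = AdapterHypernetwork(desc_dim, hidden_dim, bottleneck)
desc_embedding = torch.randn(desc_dim)       # stand-in for an encoded task description
w_down, w_up = hypernet(desc_embedding)
hidden_states = torch.randn(4, hidden_dim)   # stand-in for transformer hidden states
adapted = apply_adapter(hidden_states, w_down, w_up)
print(adapted.shape)  # torch.Size([4, 768])
```

The appeal of this arrangement is that the backbone and the generator are shared across tasks, so a new task costs only a forward pass of the generator rather than a separately fine-tuned copy of the model.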