Corpus ID: 230433959

Zero-shot Learning by Generating Task-specific Adapters

@article{Ye2021ZeroshotLB,
  title={Zero-shot Learning by Generating Task-specific Adapters},
  author={Qinyuan Ye and Xiang Ren},
  journal={ArXiv},
  year={2021},
  volume={abs/2101.00420}
}
Pre-trained text-to-text transformers achieve impressive performance across a wide range of NLP tasks, and they naturally support zero-shot learning (ZSL) by using the task description as a prompt in the input. However, this approach has potential limitations, as it learns from input-output pairs at the instance level instead of learning to solve tasks at the task level. Meanwhile, applying existing ZSL methods to text-to-text transformers is non-trivial due to their text generation objective and…
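The truncated abstract above describes the gap the paper targets: prompting a text-to-text transformer teaches it individual input-output pairs rather than tasks as such. The title's proposal, generating task-specific adapters, can be pictured as a hypernetwork that reads an encoding of the task description and emits the weights of a lightweight adapter inserted into the frozen transformer. The following is a minimal PyTorch sketch under that assumption; the class names, dimensions, and pooling choice are illustrative and not the authors' released implementation.

```python
# Illustrative sketch only: a hypernetwork maps a pooled task-description embedding
# to the parameters of one bottleneck adapter, which is then applied to activations
# of a frozen text-to-text transformer layer. All names and sizes are assumptions.
import torch
import torch.nn as nn


class AdapterHyperNetwork(nn.Module):
    """Maps a task-description embedding to the parameters of one bottleneck adapter."""

    def __init__(self, task_emb_dim: int, hidden_dim: int, bottleneck_dim: int):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.bottleneck_dim = bottleneck_dim
        # One linear head per generated parameter tensor.
        self.down_w = nn.Linear(task_emb_dim, hidden_dim * bottleneck_dim)
        self.down_b = nn.Linear(task_emb_dim, bottleneck_dim)
        self.up_w = nn.Linear(task_emb_dim, bottleneck_dim * hidden_dim)
        self.up_b = nn.Linear(task_emb_dim, hidden_dim)

    def forward(self, task_emb: torch.Tensor) -> dict:
        # task_emb: (task_emb_dim,) pooled encoding of the task description.
        return {
            "down_w": self.down_w(task_emb).view(self.bottleneck_dim, self.hidden_dim),
            "down_b": self.down_b(task_emb),
            "up_w": self.up_w(task_emb).view(self.hidden_dim, self.bottleneck_dim),
            "up_b": self.up_b(task_emb),
        }


def apply_generated_adapter(hidden_states: torch.Tensor, params: dict) -> torch.Tensor:
    """Bottleneck adapter with a residual connection, using the generated weights."""
    down = torch.relu(hidden_states @ params["down_w"].T + params["down_b"])
    up = down @ params["up_w"].T + params["up_b"]
    return hidden_states + up


# Usage: generate adapter weights for one unseen task, then run them on activations
# taken from a frozen transformer layer.
hypernet = AdapterHyperNetwork(task_emb_dim=768, hidden_dim=768, bottleneck_dim=16)
task_emb = torch.randn(768)               # stand-in for an encoded task description
adapter_params = hypernet(task_emb)
hidden = torch.randn(4, 10, 768)          # (batch, seq_len, hidden) from a frozen layer
out = apply_generated_adapter(hidden, adapter_params)
print(out.shape)                          # torch.Size([4, 10, 768])
```

Because only the hypernetwork is trained, the base transformer stays frozen and a new adapter can be produced for an unseen task from its description alone, which is the task-level (rather than instance-level) learning the abstract contrasts with prompting.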

Citations

Cross-Task Generalization via Natural Language Crowdsourcing Instructions
TLDR
This work introduces NATURAL INSTRUCTIONS, a dataset of 61 distinct tasks, their human-authored instructions, and 193k task instances, and adopts generative pre-trained language models to encode task-specific instructions along with the input and generate the task output.
Natural Instructions: Benchmarking Generalization to New Tasks from Natural Language Instructions
TLDR
This work uses existing NLP datasets and the instructions used to crowdsource them to create NATURAL INSTRUCTIONS, a dataset of instructions and task-specific input/output data, and finds that existing models indeed benefit from instructions and hence show improved generalization to new tasks.
Learn Continually, Generalize Rapidly: Lifelong Knowledge Accumulation for Few-shot Learning
TLDR
It is found that catastrophic forgetting affects generalization ability to a lesser degree than performance on seen tasks, while continual learning algorithms can still bring considerable benefit to generalization ability.
How Many Data Samples is an Additional Instruction Worth?
TLDR
A subset of tasks in the expanded version of NATURAL INSTRUCTIONS is augmented with additional instructions and it is found that these significantly improve model performance, especially in the low-data regime.
Exploiting Adapters for Cross-Lingual Low-Resource Speech Recognition
TLDR
This paper proposes SimAdapter, a novel algorithm for explicitly learning knowledge from adapters for parameter-efficient cross-lingual speech adaptation, and shows that the two proposed algorithms can be integrated for better performance, with up to 3.55% relative WER reduction.
Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm
TLDR
It is suggested that the function of few-shot examples in these cases is better described as locating an already learned task rather than meta-learning, which motivates rethinking the role of prompts in controlling and evaluating powerful language models.
In-BoXBART: Get Instructions into Biomedical Multi-Task Learning
TLDR
This is the first attempt to propose a unified model in the biomedical domain that uses instructions to achieve generalization across several biomedical tasks; the results indicate that there is room for improvement across tasks in the BoX, pointing to directions for future research.
Ethical-Advice Taker: Do Language Models Understand Natural Language Interventions?
TLDR
This work proposes a new language understanding task, Linguistic Ethical Interventions (LEI), where the goal is to amend a question-answering (QA) model’s unethical behavior by communicating context-specific principles of ethics and equity to it.

References

SHOWING 1-10 OF 30 REFERENCES
Parameter-Efficient Transfer Learning for NLP
TLDR
To demonstrate the adapters' effectiveness, the recently proposed BERT Transformer model is transferred to 26 diverse text classification tasks, including the GLUE benchmark, and adapters attain near state-of-the-art performance whilst adding only a few parameters per task (a minimal sketch of such an adapter module appears after this reference list).
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
TLDR
This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.
Language Models are Few-Shot Learners
TLDR
GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic.
MAD-X: An Adapter-based Framework for Multi-task Cross-lingual Transfer
TLDR
MAD-X, an adapter-based framework that enables high portability and parameter-efficient transfer to arbitrary tasks and languages by learning modular language and task representations, is proposed, along with a novel invertible adapter architecture and a strong baseline method for adapting a pretrained multilingual model to a new language.
Zero-Shot Cross-Lingual Transfer with Meta Learning
TLDR
This work considers the setting of training models on multiple different languages at the same time, when little or no data is available for languages other than English, and demonstrates the consistent effectiveness of meta-learning for a total of 15 languages.
Learning from Task Descriptions
TLDR
This work introduces a framework for developing NLP systems that solve new tasks after reading their descriptions, synthesizing prior work in this area, and instantiates it with a new English language dataset, ZEST, structured for task-oriented evaluation on unseen tasks.
Language to Network: Conditional Parameter Adaptation with Natural Language Descriptions
TLDR
This work proposes N3 (Neural Networks from Natural Language), a new paradigm of synthesizing task-specific neural networks from language descriptions and a generic pre-trained model, effectively “fine-tuning” the network for a new task using only language descriptions as input.
Monolingual Adapters for Zero-Shot Neural Machine Translation
We propose a novel adapter layer formalism for adapting multilingual models. They are more parameter-efficient than existing adapter layers while obtaining as good or better performance. The layers…
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
TLDR
BART is presented, a denoising autoencoder for pretraining sequence-to-sequence models, which matches the performance of RoBERTa on GLUE and SQuAD, and achieves new state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks.
Transformers: State-of-the-Art Natural Language Processing
TLDR
Transformers is an open-source library that consists of carefully engineered state-of-the-art Transformer architectures under a unified API and a curated collection of pretrained models made by and available for the community.
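As noted in the Houlsby et al. entry above, a bottleneck adapter adds only a small down-projection/up-projection module with a residual connection inside each transformer layer, so only a few parameters are trained per task while the pretrained weights stay frozen. Below is a minimal sketch of that module under assumed dimensions (768 hidden, 64 bottleneck); the names and sizes are illustrative, not the paper's code.

```python
# Minimal bottleneck adapter sketch: only this module is trained per task,
# while the surrounding pretrained transformer stays frozen.
import torch
import torch.nn as nn


class Adapter(nn.Module):
    def __init__(self, hidden_dim: int = 768, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)  # project down
        self.up = nn.Linear(bottleneck_dim, hidden_dim)    # project back up
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the frozen model's representation intact at init.
        return hidden_states + self.up(self.act(self.down(hidden_states)))


adapter = Adapter()
n_params = sum(p.numel() for p in adapter.parameters())
print(n_params)  # ~100k trainable parameters per adapter, versus ~110M for BERT-base
```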