Corpus ID: 204838007

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

@article{Raffel2020ExploringTL,
  title={Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer},
  author={Colin Raffel and Noam M. Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and W. Li and Peter J. Liu},
  journal={ArXiv},
  year={2020},
  volume={abs/1910.10683}
}
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic…
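Concretely, the text-to-text framing means that every task, whether translation, summarization, or classification, is expressed as feeding the model input text and training it to generate target text, so one architecture, objective, and decoding procedure covers them all. Below is a minimal sketch of this interface using a public T5 checkpoint as exposed through the Hugging Face Transformers library; the library, checkpoint name, and decoding settings are illustrative assumptions, not the authors' original codebase.

```python
# Minimal illustration of the unified text-to-text interface, run through the
# Hugging Face Transformers port of T5 (not the authors' original TensorFlow code).
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Each task is phrased as plain text with a task prefix; the model always
# answers with text, even for classification-style tasks.
examples = [
    "translate English to German: The house is wonderful.",
    "summarize: state authorities dispatched emergency crews tuesday to survey the damage ...",
    "cola sentence: The course is jumping well.",  # acceptability judgment, answered as text
]

for text in examples:
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_length=64)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Because inputs and outputs are always strings, a new task is added by choosing a textual encoding for it rather than by changing the model or loss.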
Studying the Usage of Text-To-Text Transfer Transformer to Support Code-Related Tasks
TLDR
This paper empirically investigated how the T5 model performs when pre-trained and fine-tuned to support code-related tasks, and compared the performance of this single model with the results reported in the four original papers proposing DL-based solutions for those four tasks.
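As a rough sketch of what fine-tuning T5 on a downstream task looks like in practice, the loop below trains on toy (input, target) text pairs via the Hugging Face Transformers implementation; the data, task prefix, and hyperparameters are placeholders and do not reproduce the cited paper's code-task setup.

```python
# Hypothetical fine-tuning loop for T5 on (input, target) text pairs.
# Data, prefix, and hyperparameters are placeholders, not the paper's setup.
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Toy "code-related" pairs, e.g. mapping a buggy snippet to its fixed version.
pairs = [
    ("fix bug: if (x = 1) { return true; }", "if (x == 1) { return true; }"),
    ("fix bug: for (int i = 0; i <= n; i++) s += a[i];", "for (int i = 0; i < n; i++) s += a[i];"),
]

model.train()
for epoch in range(3):
    for source, target in pairs:
        inputs = tokenizer(source, return_tensors="pt")
        labels = tokenizer(target, return_tensors="pt").input_ids
        loss = model(**inputs, labels=labels).loss  # teacher-forced cross-entropy
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```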
TransBERT: A Three-Stage Pre-training Technology for Story-Ending Prediction
  • Zhongyang Li, Xiao Ding, Ting Liu
  • Computer Science
  • ACM Trans. Asian Low Resour. Lang. Inf. Process.
  • 2021
TLDR
This study investigates a transferable BERT (TransBERT) training framework, which can transfer not only general language knowledge from large-scale unlabeled data but also specific kinds of knowledge from various semantically related supervised tasks, for a target task.
KGPT: Knowledge-Grounded Pre-Training for Data-to-Text Generation
TLDR
A knowledge-grounded pre-training (KGPT) method is proposed, which consists of two parts: 1) a general knowledge-grounded generation model to generate knowledge-enriched text and 2) a pre-training paradigm on a massive knowledge-grounded text corpus crawled from the web.
Improving Text-to-Text Pre-trained Models for the Graph-to-Text Task
Converting a knowledge graph or sub-graph to natural text is useful when answering questions based on a knowledge base. High-capacity language models pre-trained on large-scale text corpora have…
Multi-task learning for natural language processing in the 2020s: where are we going?
TLDR
This paper strives to provide a comprehensive survey of the numerous recent multi-task learning (MTL) contributions to the field of natural language processing and to provide a forum for focusing efforts on the hardest unsolved problems of the next decade.
Exploring and Predicting Transferability across NLP Tasks
TLDR
The results show that transfer learning is more beneficial than previously thought, especially when target task data is scarce, and can improve performance even when the source task is small or differs substantially from the target task.
An Investigation of Fine-tuning Pre-trained Model for MR-to-Text Generation
  • Ting Hu, C. Meinel
  • Computer Science
  • 2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA)
  • 2020
TLDR
Different methods of organizing the meaning representations (MRs) are explored, and it is shown that simply linearizing the information in the MRs achieves decent results, while the complex annotation process can be omitted.
Zero-shot Text Classification With Generative Language Models
TLDR
This work investigates the use of natural language to enable zero-shot model adaptation to new tasks, using text and metadata from social commenting platforms as a source for a simple pretraining task and shows that natural language can serve as simple and powerful descriptors for task adaptation.
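One simple way to realize such natural-language task adaptation is sketched below: the task description, the input, and each candidate label are concatenated, and the label the language model finds most likely is returned. The GPT-2 checkpoint, prompt wording, and likelihood-based scoring rule are illustrative assumptions, not the authors' exact method.

```python
# Hypothetical sketch of zero-shot classification with a generative LM:
# describe the task in natural language and pick the candidate label with the
# lowest average next-token negative log-likelihood under the model.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def avg_nll(text: str) -> float:
    # With labels == input_ids, the model returns the mean next-token
    # cross-entropy over the sequence (lower means more likely).
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        return model(ids, labels=ids).loss.item()

comment = "This movie was a complete waste of time."
template = ("Classify the sentiment of the comment as positive or negative.\n"
            "Comment: {c}\nSentiment: {l}")
prediction = min(["positive", "negative"],
                 key=lambda label: avg_nll(template.format(c=comment, l=label)))
print(prediction)
```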
CrossFit: A Few-shot Learning Challenge for Cross-task Generalization in NLP
TLDR
This paper introduces CROSSFIT, a task setup for studying cross-task few-shot learning ability, which standardizes seen/unseen task splits, data access during different learning stages, and evaluation protocols, and presents NLP Few-shot Gym, a repository of 160 few-shot tasks covering diverse task categories and applications, all converted to a unified text-to-text format.
Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning
TLDR
This paper introduces a new scoring method that casts a plausibility ranking task in a full-text format and leverages the masked language modeling head tuned during the pre-training phase; it requires less annotated data than the standard classifier approach to reach equivalent performance.

References

SHOWING 1-10 OF 137 REFERENCES
StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding
TLDR
Inspired by the linearization exploration work of Elman, BERT is extended to a new model, StructBERT, by incorporating language structures into pre-training, and the new model is adapted to different levels of language understanding required by downstream tasks.
Universal Language Model Fine-tuning for Text Classification
TLDR
This work proposes Universal Language Model Fine-tuning (ULMFiT), an effective transfer learning method that can be applied to any task in NLP, and introduces techniques that are key for fine-tuning a language model.
Transfer Learning in Natural Language Processing
TLDR
This work presents an overview of modern transfer learning methods in NLP, how models are pre-trained, and what information the learned representations capture, and reviews examples and case studies on how these models can be integrated and adapted in downstream NLP tasks.
Unified Language Model Pre-training for Natural Language Understanding and Generation
TLDR
A new Unified pre-trained Language Model (UniLM) is presented that can be fine-tuned for both natural language understanding and generation tasks and compares favorably with BERT on the GLUE benchmark and on the SQuAD 2.0 and CoQA question answering tasks.
Language Models are Unsupervised Multitask Learners
TLDR
It is demonstrated that language models begin to learn tasks such as question answering, machine translation, reading comprehension, and summarization without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems that learn to perform tasks from their naturally occurring demonstrations.
Improving Language Understanding by Generative Pre-Training
TLDR
The general task-agnostic model outperforms discriminatively trained models that use architectures specifically crafted for each task, significantly improving upon the state of the art in 9 out of the 12 tasks studied.
Multi-Task Deep Neural Networks for Natural Language Understanding
TLDR
A Multi-Task Deep Neural Network (MT-DNN) is proposed for learning representations across multiple natural language understanding (NLU) tasks; it allows domain adaptation with substantially fewer in-domain labels than pre-trained BERT representations.
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
TLDR
GLUE provides a benchmark of nine diverse NLU tasks, an auxiliary dataset for probing models' understanding of specific linguistic phenomena, and an online platform for evaluating and comparing models; it favors models that can represent linguistic knowledge in a way that facilitates sample-efficient learning and effective knowledge transfer across tasks.
Unsupervised Pretraining for Sequence to Sequence Learning
TLDR
This work presents a general unsupervised learning method to improve the accuracy of sequence-to-sequence (seq2seq) models: the weights of the encoder and decoder are initialized with the pretrained weights of two language models and then fine-tuned with labeled data.
Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning
TLDR
This work presents a simple, effective multi-task learning framework for sentence representations that combines the inductive biases of diverse training objectives in a single model and demonstrates that sharing a single recurrent sentence encoder across weakly related tasks leads to consistent improvements over previous methods.