• Corpus ID: 204838007

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

  title={Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer},
  author={Colin Raffel and Noam M. Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu},
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic… 

Studying the Usage of Text-To-Text Transfer Transformer to Support Code-Related Tasks

This paper empirically investigated how the T5 model performs when pre-trained and fine-tuned to support code-related tasks, and compared the performance of this single model with the results reported in the four original papers proposing DL-based solutions for those four tasks.

TransBERT: A Three-Stage Pre-training Technology for Story-Ending Prediction

This study investigates a transferable BERT (TransBERT) training framework, which can transfer not only general language knowledge from large-scale unlabeled data but also specific kinds of knowledge from various semantically related supervised tasks, for a target task.

KGPT: Knowledge-Grounded Pre-Training for Data-to-Text Generation

A knowledge-grounded pre-training (KGPT) is proposed, which consists of two parts, 1) a general knowledge-Grounded generation model to generate knowledge-enriched text and 2) a pre- training paradigm on a massive knowledge- grounded text corpus crawled from the web.

Multi-task learning for natural language processing in the 2020s: where are we going?

Improving Text-to-Text Pre-trained Models for the Graph-to-Text Task

This paper proposes two classes of methods to improve highcapacity language models pre-trained on largescale text corpora for the knowledge-graph-to-text (KG- to-text) task by improving the structure awareness of the model as well as learning optimal ordering via multitask learning.

Exploring and Predicting Transferability across NLP Tasks

The results show that transfer learning is more beneficial than previously thought, especially when target task data is scarce, and can improve performance even when the source task is small or differs substantially from the target task.

SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization

A new learning framework for robust and efficient fine-tuning for pre-trained models to attain better generalization performance and outperforms the state-of-the-art T5 model, which is the largest pre- trained model containing 11 billion parameters, on GLUE.

An Investigation of Fine-tuning Pre-trained Model for MR-to-Text Generation

  • Ting HuC. Meinel
  • Computer Science
    2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA)
  • 2020
Different methods to organize the MRs are explored and it is shown that just linearizing the information in MRs achieve decent results, while complex annotation process can be omitted.

Zero-shot Text Classification With Generative Language Models

This work investigates the use of natural language to enable zero-shot model adaptation to new tasks, using text and metadata from social commenting platforms as a source for a simple pretraining task and shows that natural language can serve as simple and powerful descriptors for task adaptation.

Pre-training Text-to-Text Transformers for Concept-centric Common Sense

It is shown that while only incrementally pre-trained on a relatively small corpus for a few steps, CALM outperforms baseline methods by a consistent margin and even comparable with some larger PTLMs, which suggests that CALM can serve as a general, plug-and-play method for improving the commonsense reasoning ability of a PTLM.



Universal Language Model Fine-tuning for Text Classification

This work proposes Universal Language Model Fine-tuning (ULMFiT), an effective transfer learning method that can be applied to any task in NLP, and introduces techniques that are key for fine- Tuning a language model.

StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding

Inspired by the linearization exploration work of Elman, BERT is extended to a new model, StructBERT, by incorporating language structures into pre-training, and the new model is adapted to different levels of language understanding required by downstream tasks.

Transfer Learning in Natural Language Processing

An overview of modern transfer learning methods in NLP, how models are pre-trained, what information the representations they learn capture, and review examples and case studies on how these models can be integrated and adapted in downstream NLP tasks are presented.

Language Models are Unsupervised Multitask Learners

It is demonstrated that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.

Unified Language Model Pre-training for Natural Language Understanding and Generation

A new Unified pre-trained Language Model (UniLM) that can be fine-tuned for both natural language understanding and generation tasks that compares favorably with BERT on the GLUE benchmark, and the SQuAD 2.0 and CoQA question answering tasks.

Improving Language Understanding by Generative Pre-Training

The general task-agnostic model outperforms discriminatively trained models that use architectures specifically crafted for each task, improving upon the state of the art in 9 out of the 12 tasks studied.

Multi-Task Deep Neural Networks for Natural Language Understanding

A Multi-Task Deep Neural Network (MT-DNN) for learning representations across multiple natural language understanding (NLU) tasks that allows domain adaptation with substantially fewer in-domain labels than the pre-trained BERT representations.

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding

A benchmark of nine diverse NLU tasks, an auxiliary dataset for probing models for understanding of specific linguistic phenomena, and an online platform for evaluating and comparing models, which favors models that can represent linguistic knowledge in a way that facilitates sample-efficient learning and effective knowledge-transfer across tasks.

Unsupervised Pretraining for Sequence to Sequence Learning

This work presents a general unsupervised learning method to improve the accuracy of sequence to sequence (seq2seq) models by pretraining the weights of the encoder and decoder with the pretrained weights of two language models and then fine-tuned with labeled data.

Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning

This work presents a simple, effective multi-task learning framework for sentence representations that combines the inductive biases of diverse training objectives in a single model and demonstrates that sharing a single recurrent sentence encoder across weakly related tasks leads to consistent improvements over previous methods.