Corpus ID: 195346829

Fine-tuned Language Models for Text Classification

@article{Howard2018FinetunedLM,
  title={Fine-tuned Language Models for Text Classification},
  author={Jeremy Howard and Sebastian Ruder},
  journal={ArXiv},
  year={2018},
  volume={abs/1801.06146}
}
Transfer learning has revolutionized computer vision, but existing approaches in NLP still require task-specific modifications and training from scratch. We propose Fine-tuned Language Models (FitLaM), an effective transfer learning method that can be applied to any task in NLP, and introduce techniques that are key for fine-tuning a state-of-the-art language model. Our method significantly outperforms the state-of-the-art on five text classification tasks, reducing the error by 18-24% on the…
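The recipe described in the abstract, pretraining a language model and then fine-tuning it on a target classification task, can be sketched in a few lines of PyTorch-style code. Everything below (module names, dimensions, learning-rate ratios, checkpoint path) is an illustrative assumption rather than the authors' released implementation; the sketch only shows a pretrained encoder reused under a new classifier head with discriminative, per-layer-group learning rates.

```python
# Minimal sketch of LM-pretraining-then-classifier fine-tuning with
# discriminative (per-layer-group) learning rates. Hypothetical modules and
# sizes; not the authors' released code.
import torch
import torch.nn as nn

class LMClassifier(nn.Module):
    def __init__(self, vocab_size, emb_dim=400, hidden=1150, n_layers=3, n_classes=5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hidden, num_layers=n_layers, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)    # task-specific classifier head

    def forward(self, tokens):
        h, _ = self.rnn(self.embedding(tokens))
        return self.head(h[:, -1])                  # last hidden state -> class logits

model = LMClassifier(vocab_size=30000)
# model.load_state_dict(torch.load("pretrained_lm.pt"), strict=False)  # hypothetical LM checkpoint

# Discriminative learning rates: lower layers, which hold general language
# features, get smaller learning rates than the freshly initialised head.
base_lr = 1e-3
optimizer = torch.optim.Adam([
    {"params": model.embedding.parameters(), "lr": base_lr / 9},
    {"params": model.rnn.parameters(),       "lr": base_lr / 3},
    {"params": model.head.parameters(),      "lr": base_lr},
])
```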
Transfer Learning for Textual Topic Classification
The recent developments in language modeling have led to advances in transfer learning methods in Natural Language Processing. Language models pretrained on large general datasets achieved…
A Survey on Transfer Learning in Natural Language Processing
This survey features the recent transfer learning advances in the field of NLP and provides a taxonomy for categorizing different transfer learning approaches from the literature.
An Empirical Study on Pre-trained Embeddings and Language Models for Bot Detection
This paper focuses on bot detection in Twitter as an evaluation task and tests the performance of fine-tuning approaches based on language models against popular neural architectures such as LSTMs and CNNs combined with pre-trained and contextualized embeddings.
Contextualized Word Representations for Self-Attention Network
It is demonstrated that an RNN/CNN-free self-attention model used for sentiment analysis can be improved by 2.53% by using contextualized word representations learned in a language modeling task.
Large-Scale Transfer Learning for Natural Language Generation
This work focuses in particular on open-domain dialog as a typical high-entropy generation task, presenting and comparing different architectures for adapting pretrained models, with state-of-the-art results.
A Comparison of LSTM and BERT for Small Corpus
The experimental results show that bidirectional LSTM models can achieve significantly higher results than a BERT model on a small dataset, and that these simple models train in much less time than tuning the pre-trained counterparts.
DIET: Lightweight Language Understanding for Dialogue Systems
Large-scale pre-trained language models have shown impressive results on language understanding benchmarks like GLUE and SuperGLUE, improving considerably over other pre-training methods like…
Large Scale Legal Text Classification Using Transformer Models
This work studies the performance of various recent transformer-based models in combination with strategies such as generative pretraining, gradual unfreezing and discriminative learning rates in order to reach competitive classification performance, and presents new state-of-the-art results.
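Gradual unfreezing, one of the fine-tuning strategies mentioned in this entry, can be sketched independently of any particular model. The helper below assumes the model's parameters are already split into ordered layer groups and that a train_one_epoch callable exists; both are hypothetical stand-ins rather than an API from a specific library.

```python
# Minimal sketch of gradual unfreezing: only the classifier head is trainable
# at first, and one additional layer group is unfrozen per epoch, from the top
# of the network downwards (hypothetical `model`, `layer_groups`, and
# `train_one_epoch`).
def gradual_unfreezing(model, layer_groups, train_one_epoch, n_epochs):
    # Freeze every group to start with.
    for group in layer_groups:
        for p in group.parameters():
            p.requires_grad = False

    for epoch in range(n_epochs):
        # Unfreeze the top (epoch + 1) groups; the last group is the head.
        for group in layer_groups[-(epoch + 1):]:
            for p in group.parameters():
                p.requires_grad = True
        train_one_epoch(model)
```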
Self-Attentive Model for Headline Generation
This work applies the recent Universal Transformer architecture paired with a byte-pair encoding technique and achieves new state-of-the-art results on the New York Times Annotated Corpus, presenting the new RIA corpus and reaching a ROUGE-L F1-score of 36.81 and a ROUGE-2 F1-score of 22.15.
Saagie at Semeval-2019 Task 5: From Universal Text Embeddings and Classical Features to Domain-specific Text Classification
This paper proposes an approach based on a feature-level meta-embedding that lets the model choose which features to keep and how to use them, and investigates how a domain-specific text classification task can benefit from pretrained state-of-the-art language models.

References

SHOWING 1-10 OF 58 REFERENCES
Semi-supervised sequence tagging with bidirectional language models
A general semi-supervised approach for adding pre-trained context embeddings from bidirectional language models to NLP systems is presented and applied to sequence labeling tasks, surpassing previous systems that use other forms of transfer or joint learning with additional labeled data and task-specific gazetteers.
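The idea summarized above, feeding pre-trained bidirectional LM embeddings into a sequence tagger, amounts to concatenating an extra feature vector per token. A minimal PyTorch-style sketch follows, with hypothetical dimensions and a frozen external bi-LM assumed to supply lm_embeddings; it is not the paper's exact architecture.

```python
# Minimal sketch of augmenting a sequence tagger with pre-trained bidirectional
# LM embeddings: the LM representation of each token is concatenated with the
# task's own token embedding before the tagging layers.
import torch
import torch.nn as nn

class LMAugmentedTagger(nn.Module):
    def __init__(self, vocab_size, n_tags, emb_dim=100, lm_dim=1024, hidden=256):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim + lm_dim, hidden, batch_first=True,
                               bidirectional=True)
        self.tagger = nn.Linear(2 * hidden, n_tags)

    def forward(self, tokens, lm_embeddings):
        # lm_embeddings: (batch, seq_len, lm_dim), produced by a frozen bi-LM.
        x = torch.cat([self.token_emb(tokens), lm_embeddings], dim=-1)
        h, _ = self.encoder(x)
        return self.tagger(h)   # per-token tag logits
```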
Semi-supervised Multitask Learning for Sequence Labeling
A sequence labeling framework with a secondary training objective of predicting surrounding words for every word in the dataset, which incentivises the system to learn general-purpose patterns of semantic and syntactic composition that are useful for improving accuracy on different sequence labeling tasks.
Question Answering through Transfer Learning from Large Fine-grained Supervision Data
It is shown that the task of question answering (QA) can significantly benefit from the transfer learning of models trained on a different large, fine-grained QA dataset, and that finer supervision provides better guidance for learning lexical and syntactic information than coarser supervision.
Empower Sequence Labeling with Task-Aware Neural Language Model
A novel neural framework is developed that extracts knowledge hidden in raw texts to empower sequence labeling, leveraging character-level knowledge from the self-contained order information of the training sequences.
Learned in Translation: Contextualized Word Vectors
Adding context vectors, produced by a deep LSTM encoder from an attentional sequence-to-sequence model trained for machine translation, to word vectors improves performance over using only unsupervised word and character vectors on a wide variety of common NLP tasks.
Supervised Learning of Universal Sentence Representations from Natural Language Inference Data
It is shown how universal sentence representations trained using the supervised data of the Stanford Natural Language Inference datasets can consistently outperform unsupervised methods like SkipThought vectors on a wide range of transfer tasks.
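The supervised setup behind such universal sentence representations can be sketched as a shared encoder applied to premise and hypothesis, with the pair classified from a few simple feature combinations. The encoder, pooling choice, and sizes below are assumptions for illustration, not the paper's exact model.

```python
# Minimal sketch of learning sentence representations with NLI supervision:
# a shared encoder embeds premise and hypothesis, and the pair is classified
# from the combined features [u, v, |u - v|, u * v].
import torch
import torch.nn as nn

class NLISentenceModel(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, hidden=512, n_classes=3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(4 * 2 * hidden, n_classes)  # 4 combined features

    def encode(self, tokens):
        h, _ = self.encoder(self.embedding(tokens))
        return h.max(dim=1).values          # max pooling over time -> sentence vector

    def forward(self, premise, hypothesis):
        u, v = self.encode(premise), self.encode(hypothesis)
        features = torch.cat([u, v, (u - v).abs(), u * v], dim=-1)
        return self.classifier(features)    # entailment / contradiction / neutral
```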
Supervised and Semi-Supervised Text Categorization using LSTM for Region Embeddings
It is found that embeddings of text regions, which can convey complex concepts, are more useful than embeddings of single words in isolation on this task, and results exceeding the previous best on four benchmark datasets are reported.
Convolutional Neural Networks for Sentence Classification
The CNN models discussed herein improve upon the state of the art on 4 out of 7 tasks, which include sentiment analysis and question classification, and a modification to the architecture is proposed to allow for the use of both task-specific and static vectors.
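A minimal sketch of this kind of sentence-classification CNN: parallel 1-D convolutions with several filter widths over word embeddings, max-over-time pooling, and a linear classifier. The handling of the static and fine-tuned embedding channels (summed here for brevity) and all sizes are illustrative assumptions, not the paper's code.

```python
# Minimal sketch of a CNN for sentence classification: parallel 1-D convolutions
# over word embeddings with several filter widths, max-over-time pooling, and a
# linear classifier over the concatenated pooled features.
import torch
import torch.nn as nn

class SentenceCNN(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, n_filters=100,
                 widths=(3, 4, 5), n_classes=2):
        super().__init__()
        self.static_emb = nn.Embedding(vocab_size, emb_dim)
        self.static_emb.weight.requires_grad = False       # frozen channel
        self.tuned_emb = nn.Embedding(vocab_size, emb_dim)  # fine-tuned channel
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, n_filters, kernel_size=w) for w in widths)
        self.classifier = nn.Linear(n_filters * len(widths), n_classes)

    def forward(self, tokens):
        x = (self.static_emb(tokens) + self.tuned_emb(tokens)).transpose(1, 2)
        pooled = [conv(x).relu().max(dim=-1).values for conv in self.convs]
        return self.classifier(torch.cat(pooled, dim=-1))
```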
Improving Neural Machine Translation Models with Monolingual Data
This work pairs monolingual training data with an automatic back-translation and treats it as additional parallel training data, obtaining substantial improvements on the WMT 15 English→German task and the low-resource IWSLT 14 Turkish→English task.
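Back-translation itself is a data-level trick and can be sketched without any model code: target-language monolingual sentences are translated into the source language by a reverse-direction model and mixed into the parallel corpus as synthetic pairs. The translate callable and data shapes below are assumptions for illustration.

```python
# Minimal sketch of back-translation for NMT data augmentation.
def build_backtranslated_corpus(parallel_pairs, target_monolingual, translate):
    """parallel_pairs: list of (src, tgt) sentence pairs;
    target_monolingual: list of target-language sentences;
    translate: a target->source translation function (e.g., a reverse NMT model)."""
    synthetic_pairs = [(translate(tgt), tgt) for tgt in target_monolingual]
    # Synthetic sources may be noisy, but the human-written targets keep the
    # decoder's training signal clean.
    return parallel_pairs + synthetic_pairs
```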
Character-level Convolutional Networks for Text Classification
This article constructs several large-scale datasets to show that character-level convolutional networks can achieve state-of-the-art or competitive results in text classification.
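The character-level input encoding these networks rely on is easy to sketch: each document becomes a fixed-length sequence of one-hot character vectors over a small alphabet (1014 characters long in the original setup), which 1-D convolutions then consume. The alphabet below is an illustrative assumption, not the paper's exact character set.

```python
# Minimal sketch of character quantization for a character-level CNN classifier.
import torch

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789 .,;:!?'\"-()"
CHAR_INDEX = {c: i for i, c in enumerate(ALPHABET)}

def quantize(text, max_len=1014):
    """Return a (len(ALPHABET), max_len) one-hot matrix; unknown characters
    stay all-zero, and longer texts are truncated."""
    x = torch.zeros(len(ALPHABET), max_len)
    for pos, ch in enumerate(text.lower()[:max_len]):
        idx = CHAR_INDEX.get(ch)
        if idx is not None:
            x[idx, pos] = 1.0
    return x
```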