How to Fine-Tune BERT for Text Classification?

@article{Sun2019HowTF,
  title={How to Fine-Tune BERT for Text Classification?},
  author={Chi Sun and Xipeng Qiu and Yige Xu and Xuanjing Huang},
  journal={ArXiv},
  year={2019},
  volume={abs/1905.05583}
}
Language model pre-training has proven to be useful in learning universal language representations. [...] Finally, the proposed solution obtains new state-of-the-art results on eight widely-studied text classification datasets.
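For orientation, a minimal sketch of plain BERT fine-tuning for text classification with the Hugging Face transformers library is given below. It is not the authors' implementation; the model name, toy examples, and hyperparameters are placeholders (2e-5 is merely a commonly used fine-tuning learning rate).

# Minimal sketch of fine-tuning BERT for text classification with Hugging Face
# transformers. Not the paper's code; data and hyperparameters are illustrative.
import torch
from torch.optim import AdamW
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

texts = ["a gripping, well-acted thriller", "flat characters and a dull plot"]  # toy data
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, truncation=True, max_length=128, return_tensors="pt")

optimizer = AdamW(model.parameters(), lr=2e-5)  # small learning rate, typical for BERT
model.train()
for _ in range(3):  # a few passes over the toy batch
    optimizer.zero_grad()
    outputs = model(**batch, labels=labels)  # loss is returned when labels are supplied
    outputs.loss.backward()
    optimizer.step()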
Citations

Improving BERT-Based Text Classification With Auxiliary Sentence and Domain Knowledge
TLDR
A BERT-based text classification model BERT4TC is proposed via constructing auxiliary sentence to turn the classification task into a binary sentence-pair one, aiming to address the limited training data problem and task-awareness problem.
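To illustrate the auxiliary-sentence idea in general terms (this is not the BERT4TC code; the aspect labels and sentence template are hypothetical), each candidate label can be verbalized as a second sentence, so that BERT scores every (text, auxiliary sentence) pair as a binary match/no-match problem:

# Sketch of turning single-sentence classification into binary sentence-pair
# classification via auxiliary sentences. Labels and template are hypothetical.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

text = "The battery lasts two days on a single charge."
labels = ["battery", "screen", "price"]  # hypothetical aspect labels

# One (text, auxiliary sentence) pair per candidate label; a binary classifier
# then predicts whether the pair matches.
pairs = [(text, f"The sentence is about {label}.") for label in labels]
batch = tokenizer(
    [t for t, _ in pairs],
    [aux for _, aux in pairs],
    padding=True,
    truncation=True,
    return_tensors="pt",
)
# token_type_ids separate the original text (segment 0) from the auxiliary
# sentence (segment 1), exactly as in standard BERT sentence-pair input.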
Investigating the Performance of Fine-tuned Text Classification Models Based-on Bert
  • Samin Mohammadi, Mathieu Chapon
  • Computer Science
  • 2020 IEEE 22nd International Conference on High Performance Computing and Communications; IEEE 18th International Conference on Smart City; IEEE 6th International Conference on Data Science and Systems (HPCC/SmartCity/DSS)
  • 2020
TLDR
It is discovered that adding a simple dense layer to the pre-trained Bert model, as a classifier, surpasses other types of deep neural network layers in the investigated tasks.
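A minimal PyTorch sketch of that setup, assuming the Hugging Face BertModel API rather than the paper's own code, wraps the pre-trained encoder with a single dense layer over the pooled output:

# Pre-trained BERT encoder with one dense layer as the classification head.
# Illustrative sketch, not the paper's implementation.
import torch.nn as nn
from transformers import BertModel

class BertDenseClassifier(nn.Module):
    def __init__(self, num_labels: int, model_name: str = "bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)
        self.dropout = nn.Dropout(0.1)
        # Single dense layer mapping the pooled representation to class logits.
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask=None, token_type_ids=None):
        outputs = self.bert(input_ids=input_ids,
                            attention_mask=attention_mask,
                            token_type_ids=token_type_ids)
        pooled = outputs.pooler_output  # [CLS]-based pooled representation
        return self.classifier(self.dropout(pooled))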
Efficient Task Adaptation with Normalization
  • Wenxuan Zhou
  • 2019
Large pre-trained text encoders like BERT start a new chapter in natural language processing. A common practice to apply pre-trained encoders to sequence classification tasks (e.g., classification of [...]
SimpleTran: Transferring Pre-Trained Sentence Embeddings for Low Resource Text Classification
TLDR
This work proposes an alternative transfer learning approach, SimpleTran, which is simple and effective for low-resource text classification characterized by small datasets; it outperforms fine-tuning on small and medium-sized datasets with negligible computational overhead.
Go Simple and Pre-Train on Domain-Specific Corpora: On the Role of Training Data for Text Classification
TLDR
This paper compares the performance of a light-weight linear classifier based on word embeddings versus a pre-trained language model, i.e., BERT, across a wide range of datasets and classification tasks, and shows the importance of domain-specific unlabeled data.
A Comparison of Pre-Trained Language Models for Multi-Class Text Classification in the Financial Domain
TLDR
This study investigates the case of multi-class text classification, a task that is relatively less studied in the literature evaluating pre-trained language models, and finds that the FinBERT model, even with an adapted vocabulary, does not lead to improvements compared to the generic BERT models.
Adjusting BERT's Pooling Layer for Large-Scale Multi-Label Text Classification
TLDR
A pooling layer architecture on top of BERT models is proposed, which improves the quality of classification by using information from the standard [CLS] token in combination with pooled sequence output.
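A hedged approximation of such a head (the paper's exact pooling architecture may differ; mean pooling is used here as the sequence summary) concatenates the [CLS] vector with a pooled summary of the full sequence output before classification:

# Pooling head that combines the [CLS] token with a masked mean over the
# sequence output, e.g. for multi-label classification. Illustrative only.
import torch
import torch.nn as nn
from transformers import BertModel

class ClsPlusMeanPoolClassifier(nn.Module):
    def __init__(self, num_labels: int, model_name: str = "bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)
        hidden = self.bert.config.hidden_size
        # [CLS] vector concatenated with the mean-pooled token vectors.
        self.classifier = nn.Linear(2 * hidden, num_labels)

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        hidden_states = out.last_hidden_state          # (batch, seq_len, hidden)
        cls_vec = hidden_states[:, 0]                  # standard [CLS] token
        mask = attention_mask.unsqueeze(-1).float()
        mean_vec = (hidden_states * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
        logits = self.classifier(torch.cat([cls_vec, mean_vec], dim=-1))
        return logits  # use sigmoid + binary cross-entropy for multi-label training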
Advances of Transformer-Based Models for News Headline Generation
TLDR
Two pretrained Transformer-based models (mBART and BertSumAbs) are fine-tuned for headline generation and achieve new state-of-the-art results on the RIA and Lenta datasets of Russian news.
Exploring Large Language Models in a Limited Resource Scenario
Generative Pre-trained Transformers (GPT) have gained a lot of popularity in the domain of Natural Language Processing (NLP). Lately, GPTs have been fine-tuned for tasks like sentiment analysis and [...]
Fine-tuning BERT for Low-Resource Natural Language Understanding via Active Learning
TLDR
This work explores fine-tuning methods for BERT, a pre-trained Transformer-based language model, by utilizing pool-based active learning to speed up training while keeping the cost of labeling new data constant, and demonstrates and analyzes the benefits of freezing layers of the language model during fine-tuning to reduce the number of trainable parameters.
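The layer-freezing part can be sketched as below (illustrative only: the number of frozen layers is a hypothetical choice, and the pool-based active-learning loop itself is omitted):

# Freeze the embeddings and the lower encoder layers of BERT so that only the
# top layers and the classifier are updated during fine-tuning. How many
# layers to freeze is a hyperparameter; 8 of 12 is an arbitrary example.
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

for param in model.bert.embeddings.parameters():
    param.requires_grad = False

for layer in model.bert.encoder.layer[:8]:
    for param in layer.parameters():
        param.requires_grad = False

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable:,}")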

References

SHOWING 1-10 OF 39 REFERENCES
Universal Language Model Fine-tuning for Text Classification
TLDR
This work proposes Universal Language Model Fine-tuning (ULMFiT), an effective transfer learning method that can be applied to any task in NLP, and introduces techniques that are key for fine-tuning a language model.
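One of those techniques, discriminative fine-tuning, assigns a separate learning rate to each layer. A minimal sketch using optimizer parameter groups, applied here to a BERT-style encoder for illustration rather than ULMFiT's original LSTM (the base rate and decay factor are assumed values):

# Discriminative (layer-wise) learning rates via optimizer parameter groups.
# Applied to BERT for illustration; ULMFiT itself fine-tunes an AWD-LSTM.
from torch.optim import AdamW
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")
base_lr, decay = 2e-5, 0.95  # assumed values

layers = [model.embeddings] + list(model.encoder.layer)
param_groups = [{"params": model.pooler.parameters(), "lr": base_lr}]
for depth, layer in enumerate(reversed(layers)):
    # The top encoder layer gets base_lr; each lower layer gets a smaller rate.
    param_groups.append({"params": layer.parameters(), "lr": base_lr * decay ** depth})

optimizer = AdamW(param_groups)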
Improving Language Understanding by Generative Pre-Training
TLDR
The general task-agnostic model outperforms discriminatively trained models that use architectures specifically crafted for each task, significantly improving upon the state of the art in 9 out of the 12 tasks studied.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TLDR
A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Empower Sequence Labeling with Task-Aware Neural Language Model
TLDR
A novel neural framework is developed that extracts abundant knowledge hidden in raw texts to empower sequence labeling, leveraging character-level knowledge from the self-contained order information of training sequences.
Semi-supervised Multitask Learning for Sequence Labeling
TLDR
A sequence labeling framework with a secondary training objective, learning to predict surrounding words for every word in the dataset, which incentivises the system to learn general-purpose patterns of semantic and syntactic composition, useful for improving accuracy on different sequence labeling tasks.
Recurrent Neural Network for Text Classification with Multi-Task Learning
TLDR
This paper uses a multi-task learning framework based on recurrent neural networks to jointly learn across multiple related tasks, proposing three different mechanisms for sharing information to model text with task-specific and shared layers.
Learned in Translation: Contextualized Word Vectors
TLDR
Context vectors from a deep LSTM encoder of an attentional sequence-to-sequence model trained for machine translation are added to contextualize word vectors, improving performance over using only unsupervised word and character vectors on a wide variety of common NLP tasks.
Supervised Learning of Universal Sentence Representations from Natural Language Inference Data
TLDR
It is shown how universal sentence representations trained using the supervised data of the Stanford Natural Language Inference datasets can consistently outperform unsupervised methods like SkipThought vectors on a wide range of transfer tasks.
Character-level Convolutional Networks for Text Classification
TLDR
This article constructed several large-scale datasets to show that character-level convolutional networks could achieve state-of-the-art or competitive results in text classification.
Skip-Thought Vectors
We describe an approach for unsupervised learning of a generic, distributed sentence encoder. Using the continuity of text from books, we train an encoder-decoder model that tries to reconstruct the [...]