Cluster & Tune: Boost Cold Start Performance in Text Classification

Eyal Shnarch, Ariel Gera, Alon Halfon, Lena Dankin, Leshem Choshen, Ranit Aharonov, Noam Slonim
In real-world scenarios, a text classification task often begins with a cold start, when labeled data is scarce. In such cases, the common practice of fine-tuning pre-trained models, such as BERT, for a target classification task, is prone to produce poor performance. We suggest a method to boost the performance of such models by adding an intermediate unsupervised classification task, between the pre-training and fine-tuning phases. As such an intermediate task, we perform clustering and train… 
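The intermediate step described in the abstract — clustering unlabeled texts and using the cluster IDs as pseudo-labels for a classification task before the final fine-tuning — can be sketched as follows. This is an illustrative sketch, not the authors' code: a tiny pure-Python k-means over bag-of-words vectors stands in for the clustering step, whereas in practice one would cluster representations from the pre-trained model itself.

```python
# Sketch of the "cluster & tune" idea: derive pseudo-labels from unsupervised
# clustering, to be used as targets for an intermediate fine-tuning pass.
import random
from collections import Counter

def bow_vector(text, vocab):
    """Bag-of-words count vector over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts.get(w, 0) for w in vocab]

def kmeans(vectors, k, iters=20, seed=0):
    """Minimal k-means; returns a pseudo-label (cluster id) per vector."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = []
        for vec in vectors:
            dists = [sum((v - m) ** 2 for v, m in zip(vec, centroids[c]))
                     for c in range(k)]
            labels.append(dists.index(min(dists)))
        # Update step: each centroid becomes the mean of its cluster members.
        for c in range(k):
            members = [vec for vec, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members)
                                for dim in zip(*members)]
    return labels

texts = ["cheap flights to rome", "book flights online",
         "new phone battery life", "phone screen repair"]
vocab = sorted({w for t in texts for w in t.split()})
vectors = [bow_vector(t, vocab) for t in texts]
pseudo_labels = kmeans(vectors, k=2)
# `pseudo_labels` would now serve as the targets of an intermediate
# classification task for the pre-trained model, before the final
# fine-tuning on the (scarce) real labels.
print(pseudo_labels)
```

The point of the intermediate task is that predicting cluster membership forces the model to adapt its representations to the target domain without requiring any human labels.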


Cold-Start Data Selection for Few-shot Language Model Fine-tuning: A Prompt-Based Uncertainty Propagation Approach

A prompt-based uncertainty propagation approach to estimate the importance of data points and a partition-then-rewrite (PTR) strategy to promote sample diversity when querying for annotations are designed.

Where to start? Analyzing the potential value of intermediate models

A systematic analysis of this intertraining scheme, over a wide range of English classification tasks, suggests that the potential intertraining gain can be analyzed independently for the target dataset under consideration, and for a base model being considered as a starting point.

Will It Blend? Mixing Training Paradigms & Prompting for Argument Quality Prediction

This paper performs prompt engineering using GPT-3, and investigates the training paradigms multi-task learning, contrastive learning, and intermediate-task training to find that a mixed prediction setup outperforms single models.

Induction Networks for Few-Shot Text Classification

This paper proposes a novel Induction Network to learn a generalized class-wise representation of each class in the support set, by innovatively leveraging the dynamic routing algorithm in meta-learning and finds the model is able to induce and generalize better.

Combining Unsupervised Pre-training and Annotator Rationales to Improve Low-shot Text Classification

This work improves low-shot text classification by combining unsupervised pre-training with annotator rationales, via two novel methods: a simple bag-of-words embedding approach, and a more complex context-aware method based on the BERT model.

How to Fine-Tune BERT for Text Classification?

A general solution for BERT fine-tuning is provided and new state-of-the-art results on eight widely-studied text classification datasets are obtained.

Diverse Few-Shot Text Classification with Multiple Metrics

This work proposes an adaptive metric learning approach that automatically determines the best weighted combination from a set of metrics obtained from meta-training tasks for a newly seen few-shot task.

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.

Domain Adaptive Training BERT for Response Selection

The powerful pre-trained language model Bidirectional Encoder Representations from Transformers (BERT) is utilized for a multi-turn dialog system, and a highly effective post-training method on a domain-specific corpus is proposed.

Do Not Have Enough Data? Deep Learning to the Rescue!

This work uses a powerful pre-trained neural network model to artificially synthesize new labeled data for supervised learning, and shows that the proposed LAMBADA method improves classifiers' performance on a variety of datasets.

Adaptive Self-training for Few-shot Neural Sequence Labeling

Self-training and meta-learning techniques for few-shot training of neural sequence taggers, namely MetaST, are developed; they enable adaptive sample re-weighting to mitigate error propagation from noisy pseudo-labels.

Few-shot relation classification by context attention-based prototypical networks with BERT

This paper applies few-shot learning to a relation classification task and designs context attention to highlight the crucial instances in the support set and generate a satisfactory prototype; the resulting model outperforms state-of-the-art models and converges faster.

CL-Aff Deep semisupervised clustering

A semi-supervised neural architecture for multi-label settings is introduced, combining deep learning representations with k-means clustering; it can leverage large-scale unlabeled data and achieves better results than both unsupervised and supervised baselines.