Few-shot Text Classification with Dual Contrastive Consistency

@article{Sun2022FewshotTC,
  title={Few-shot Text Classification with Dual Contrastive Consistency},
  author={Liwen Sun and Jiawei Han},
  journal={ArXiv},
  year={2022},
  volume={abs/2209.15069}
}
In this paper, we explore how to utilize a pre-trained language model to perform few-shot text classification, where only a few annotated examples are given for each class. Since using the traditional cross-entropy loss to fine-tune the language model in this scenario causes serious overfitting and leads to sub-optimal generalization, we adopt supervised contrastive learning on the few labeled data and consistency regularization on a large amount of unlabeled data. Moreover, we propose a novel contrastive…
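
The abstract names two training signals: a supervised contrastive term on the few labeled examples and a consistency term on unlabeled examples. Below is a minimal PyTorch sketch of how such a combined objective could look; the encoder/classifier interfaces, the temperature, and the weighting lambda_u are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F


def supervised_contrastive_loss(features, labels, temperature=0.1):
    """SupCon-style loss: pull together embeddings of examples that share a label."""
    features = F.normalize(features, dim=1)
    sim = features @ features.T / temperature                     # (B, B) similarities
    mask_pos = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()
    mask_pos.fill_diagonal_(0)                                     # exclude self-pairs
    logits_mask = torch.ones_like(sim).fill_diagonal_(0)
    exp_sim = torch.exp(sim) * logits_mask
    log_prob = sim - torch.log(exp_sim.sum(dim=1, keepdim=True) + 1e-12)
    mean_log_prob_pos = (mask_pos * log_prob).sum(1) / mask_pos.sum(1).clamp(min=1)
    return -mean_log_prob_pos.mean()


def consistency_loss(logits_orig, logits_aug):
    """KL divergence between predictions on an unlabeled text and its augmented view."""
    p = F.softmax(logits_orig.detach(), dim=-1)                    # stop-gradient on the target
    log_q = F.log_softmax(logits_aug, dim=-1)
    return F.kl_div(log_q, p, reduction="batchmean")


def training_step(encoder, classifier, labeled_batch, unlabeled_batch, lambda_u=1.0):
    x_l, y_l = labeled_batch                   # few labeled examples per class
    x_u, x_u_aug = unlabeled_batch             # unlabeled text and an augmented view
    z_l = encoder(x_l)                         # sentence embeddings, shape (B, d)
    loss_scl = supervised_contrastive_loss(z_l, y_l)
    loss_cons = consistency_loss(classifier(encoder(x_u)), classifier(encoder(x_u_aug)))
    return loss_scl + lambda_u * loss_cons
```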


References

Showing 1-10 of 35 references

Supervised Contrastive Learning for Pre-trained Language Model Fine-tuning

This work proposes a supervised contrastive learning (SCL) objective for the fine-tuning stage of natural language understanding classification models and demonstrates that the new objective leads to models that are more robust to different levels of noise in the training data, and can generalize better to related tasks with limited labeled task data.
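
A hedged reconstruction of the joint objective this reference describes, written with assumed notation: cross-entropy combined with a supervised contrastive term, weighted by a hyperparameter lambda, with Phi the encoder and tau a temperature.

```latex
\mathcal{L} = (1-\lambda)\,\mathcal{L}_{\mathrm{CE}} + \lambda\,\mathcal{L}_{\mathrm{SCL}},
\qquad
\mathcal{L}_{\mathrm{SCL}} = \sum_{i=1}^{N} \frac{-1}{N_{y_i}-1}
\sum_{\substack{j \neq i \\ y_j = y_i}}
\log \frac{\exp\!\big(\Phi(x_i)\cdot\Phi(x_j)/\tau\big)}
          {\sum_{k \neq i} \exp\!\big(\Phi(x_i)\cdot\Phi(x_k)/\tau\big)}
```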

Dual Contrastive Learning: Text Classification via Label-Aware Data Augmentation

Introduces a dual contrastive learning (DualCL) framework that simultaneously learns the features of input samples and the parameters of the classifier in the same space, exploiting contrastive learning between the input samples and their label-aware augmented samples.

MixText: Linguistically-Informed Interpolation of Hidden Space for Semi-Supervised Text Classification

By mixing labeled, unlabeled and augmented data, MixText significantly outperformed current pre-trained and fine-tuned models and other state-of-the-art semi-supervised learning methods on several text classification benchmarks.
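
The core mechanism is interpolation in the encoder's hidden space (TMix): run two texts through the lower layers separately, mix their hidden states at one layer, and continue with the mixed stream. A minimal sketch with a HuggingFace BERT model, assuming unpadded inputs; the mixing layer and alpha are illustrative assumptions.

```python
import numpy as np
from transformers import BertModel  # bert = BertModel.from_pretrained("bert-base-uncased")


def tmix_forward(bert, input_ids_a, input_ids_b, mix_layer=7, alpha=0.75):
    lam = np.random.beta(alpha, alpha)
    lam = max(lam, 1 - lam)                      # keep the first example dominant
    h_a = bert.embeddings(input_ids_a)
    h_b = bert.embeddings(input_ids_b)
    # run both sequences separately through the lower layers
    for layer in bert.encoder.layer[:mix_layer]:
        h_a = layer(h_a)[0]
        h_b = layer(h_b)[0]
    # interpolate hidden states at the chosen layer, then continue with one stream;
    # the two labels would be interpolated with the same weight lam
    h = lam * h_a + (1 - lam) * h_b
    for layer in bert.encoder.layer[mix_layer:]:
        h = layer(h)[0]
    return h[:, 0], lam                          # mixed [CLS] representation and mixing weight
```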

ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer

ConSERT is presented, a Contrastive Framework for Self-Supervised SEntence Representation Transfer that adopts contrastive learning to fine-tune BERT in an unsupervised and effective way and achieves new state-of-the-art performance on STS tasks.

Unsupervised Data Augmentation for Consistency Training

A new perspective on how to effectively noise unlabeled examples is presented and it is argued that the quality of noising, specifically those produced by advanced data augmentation methods, plays a crucial role in semi-supervised learning.
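
A minimal sketch of the consistency-training idea in this reference, assuming a strong augmentation (e.g. back-translation) has already produced x_augmented; the sharpening temperature and confidence threshold mirror the UDA recipe, but the exact values and the model interface here are assumptions.

```python
import torch
import torch.nn.functional as F


def uda_consistency_loss(model, x_unlabeled, x_augmented, temperature=0.4, threshold=0.8):
    with torch.no_grad():
        p = F.softmax(model(x_unlabeled) / temperature, dim=-1)   # sharpened target distribution
        mask = (p.max(dim=-1).values > threshold).float()         # keep only confident examples
    log_q = F.log_softmax(model(x_augmented), dim=-1)
    per_example_kl = F.kl_div(log_q, p, reduction="none").sum(-1)
    return (per_example_kl * mask).sum() / mask.sum().clamp(min=1.0)
```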

Making Pre-trained Language Models Better Few-shot Learners

The LM-BFF approach makes minimal assumptions on task resources and domain expertise, and hence constitutes a strong task-agnostic method for few-shot learning.
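
In the spirit of this reference, few-shot classification can be cast as masked-language-model prediction over a prompt template with a verbalizer. The sketch below uses a hand-written template and label words as assumptions; LM-BFF itself searches for templates and label words automatically.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")
label_words = {"positive": " great", "negative": " terrible"}    # assumed verbalizer
label_ids = {k: tokenizer.convert_tokens_to_ids(tokenizer.tokenize(v))[0]
             for k, v in label_words.items()}

def classify(sentence):
    prompt = f"{sentence} It was {tokenizer.mask_token}."         # assumed template
    inputs = tokenizer(prompt, return_tensors="pt")
    mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]
    # compare the MLM scores of the label words at the [MASK] position
    return max(label_ids, key=lambda lab: logits[label_ids[lab]].item())

print(classify("A gripping, beautifully shot film."))
```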

SimCSE: Simple Contrastive Learning of Sentence Embeddings

SimCSE is presented, a simple contrastive learning framework that greatly advances state-of-the-art sentence embeddings, regularizes pre-trained embeddings’ anisotropic space to be more uniform, and better aligns positive pairs when supervised signals are available.
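
The unsupervised variant of this idea is very compact: encode the same sentences twice in train mode so that dropout yields two "views", then apply an InfoNCE loss where the second view of the same sentence is the positive. A minimal sketch; the encoder interface and temperature are assumptions.

```python
import torch
import torch.nn.functional as F


def simcse_loss(encoder, input_ids, attention_mask, temperature=0.05):
    # two forward passes in train mode -> different dropout masks -> two views
    z1 = encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state[:, 0]
    z2 = encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state[:, 0]
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    sim = z1 @ z2.T / temperature                     # (B, B) cosine similarities
    targets = torch.arange(sim.size(0), device=sim.device)
    return F.cross_entropy(sim, targets)              # diagonal entries are the positives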

Supervised Contrastive Learning

Proposes a novel training methodology that consistently outperforms cross-entropy on supervised learning tasks across different architectures and data augmentations, modifying the batch contrastive loss, which has recently been shown to be very effective at learning powerful representations in the self-supervised setting.

A Theoretical Analysis of Contrastive Unsupervised Representation Learning

This framework allows provable guarantees on the performance of the learned representations on average classification tasks composed of subsets of the same latent classes, and shows that learned representations can reduce (labeled) sample complexity on downstream tasks.

Adversarial Training Methods for Semi-Supervised Text Classification

This work extends adversarial and virtual adversarial training to the text domain by applying perturbations to the word embeddings in a recurrent neural network rather than to the original input itself.
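
A minimal sketch of the embedding-space adversarial term from this reference, transplanted onto a generic PyTorch text classifier rather than their recurrent network; classify_from_embeddings is a hypothetical helper that runs the model from embedding output (shape (B, L, d)) to logits, and epsilon is an assumption.

```python
import torch
import torch.nn.functional as F


def adversarial_loss(model, embeddings, labels, epsilon=1.0):
    embeddings = embeddings.detach().requires_grad_(True)
    loss = F.cross_entropy(model.classify_from_embeddings(embeddings), labels)
    grad, = torch.autograd.grad(loss, embeddings)
    # normalized gradient direction gives the worst-case perturbation within an L2 ball
    r_adv = epsilon * grad / (grad.norm(dim=(1, 2), keepdim=True) + 1e-12)
    adv_logits = model.classify_from_embeddings(embeddings + r_adv.detach())
    return F.cross_entropy(adv_logits, labels)
```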