Distance Metric Learning Loss Functions in Few-Shot Scenarios of Supervised Language Models Fine-Tuning

Witold Sosnowski, Karolina Seweryn, Anna Wróblewska, Piotr Gawrysiak

Learning to Few-Shot Learn Across Diverse Natural Language Classification Tasks

LEOPARD is trained with a state-of-the-art transformer architecture and, with as few as 4 examples per label, generalizes better to tasks entirely unseen during training than self-supervised pre-training or multi-task training.

Supervised Contrastive Learning for Pre-trained Language Model Fine-tuning

This work proposes a supervised contrastive learning (SCL) objective for the fine-tuning stage of natural language understanding classification models and demonstrates that the new objective yields models that are more robust to different levels of noise in the training data and generalize better to related tasks with limited labeled data.

Diversity With Cooperation: Ensemble Methods for Few-Shot Classification

This work shows that by addressing the fundamental high-variance issue of few-shot learning classifiers, it is possible to significantly outperform current meta-learning techniques.

Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss

This work proposes a theoretically principled label-distribution-aware margin (LDAM) loss, motivated by minimizing a margin-based generalization bound, that replaces the standard cross-entropy objective during training and can be combined with prior strategies for class imbalance such as re-weighting or re-sampling.
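
The LDAM idea, enforcing a larger margin for rarer classes via Δ_j = C / n_j^(1/4), can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation; the function name and the default values of the constant C and scale s are assumptions:

```python
import numpy as np

def ldam_loss(logits, labels, class_counts, C=0.5, s=1.0):
    """Sketch of Label-Distribution-Aware Margin loss: subtract a
    per-class margin Delta_j = C / n_j**0.25 from the true-class
    logit, then apply softmax cross-entropy to the scaled logits."""
    margins = C / np.power(class_counts, 0.25)        # larger margin for rarer classes
    rows = np.arange(len(labels))
    adjusted = logits.copy()
    adjusted[rows, labels] -= margins[labels]         # penalize the ground-truth logit
    adjusted *= s
    adjusted -= adjusted.max(axis=1, keepdims=True)   # numerical stability
    log_probs = adjusted - np.log(np.exp(adjusted).sum(axis=1, keepdims=True))
    return -log_probs[rows, labels].mean()
```

With C = 0 the margin vanishes and this reduces to the plain cross-entropy that LDAM replaces.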

SoftTriple Loss: Deep Metric Learning Without Triplet Sampling

The SoftTriple loss extends the SoftMax loss with multiple centers for each class; since the SoftMax loss is equivalent to a smoothed triplet loss in which each class has a single center, this enables deep metric learning without explicit triplet sampling.
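
A minimal NumPy sketch of the multiple-centers construction: each class keeps K centers, the similarities to a class's centers are combined with a softmax weighting, and cross-entropy with a small margin is applied on top. Function names and the hyperparameter defaults (la, gamma, delta) are illustrative assumptions, not taken from this summary:

```python
import numpy as np

def softtriple_logits(x, centers, gamma=10.0):
    """x: (d,) embedding; centers: (C, K, d), K centers per class.
    Class similarity = softmax-weighted sum over that class's centers."""
    sims = centers @ x                        # (C, K) inner products
    w = np.exp(gamma * (sims - sims.max(axis=1, keepdims=True)))
    w /= w.sum(axis=1, keepdims=True)         # attention over each class's centers
    return (w * sims).sum(axis=1)             # (C,) relaxed class logits

def softtriple_loss(x, label, centers, la=20.0, gamma=10.0, delta=0.01):
    """Cross-entropy over the relaxed similarities, with a margin delta
    subtracted from the ground-truth class before scaling by la."""
    logits = softtriple_logits(x, centers, gamma)
    logits[label] -= delta                    # margin on the true class
    z = la * logits
    z -= z.max()                              # numerical stability
    return -(z[label] - np.log(np.exp(z).sum()))
```

With K = 1 the softmax weighting is trivial and the logits reduce to plain inner products with one center per class, matching the single-center equivalence noted above.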

Supervised Contrastive Learning

This work modifies the batch contrastive loss, which has recently been shown to be very effective at learning powerful representations in the self-supervised setting, and proposes a training methodology that consistently outperforms cross-entropy on supervised learning tasks across different architectures and data augmentations.
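
In the supervised setting, the positives for an anchor are all other samples sharing its label. A minimal NumPy sketch of such a batch contrastive loss (the temperature default and the function name are assumptions for illustration):

```python
import numpy as np

def supcon_loss(z, labels, tau=0.1):
    """Supervised contrastive loss sketch.
    z: (N, d) embeddings; labels: (N,) integer class labels.
    For each anchor, positives are all other same-label samples;
    the loss averages their log-probabilities under a softmax over
    all non-self similarities."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # L2-normalize embeddings
    sim = z @ z.T / tau                                # scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)                     # exclude self-contrast
    sim -= sim.max(axis=1, keepdims=True)              # numerical stability
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    pos = labels[:, None] == labels[None, :]
    np.fill_diagonal(pos, False)
    per_anchor = np.where(pos, log_prob, 0.0).sum(axis=1) / np.maximum(pos.sum(axis=1), 1)
    return -per_anchor[pos.any(axis=1)].mean()         # skip anchors with no positives
```

When labels agree with the embedding geometry (same-class points close together) the loss is near zero; scrambling the labels drives it up.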

Regularizing Neural Networks by Penalizing Confident Output Distributions

It is found that both label smoothing and the confidence penalty improve state-of-the-art models across benchmarks without modifying existing hyperparameters, suggesting the wide applicability of these regularizers.
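
The confidence penalty adds the negative entropy of the output distribution to the training objective, discouraging over-peaked predictions. A minimal NumPy sketch (the weight beta and the function name are illustrative assumptions):

```python
import numpy as np

def confidence_penalty_loss(logits, labels, beta=0.1):
    """Cross-entropy minus beta times the entropy of the softmax
    output: low-entropy (over-confident) predictions are penalized."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_p = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    p = np.exp(log_p)
    ce = -log_p[np.arange(len(labels)), labels].mean()     # standard cross-entropy
    entropy = -(p * log_p).sum(axis=1).mean()              # H(p), averaged over batch
    return ce - beta * entropy                             # reward higher entropy
```

With beta = 0 this is plain cross-entropy; label smoothing, the other regularizer mentioned above, instead mixes the one-hot targets with a uniform distribution.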

Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank

This work introduces a Sentiment Treebank with fine-grained sentiment labels for 215,154 phrases in the parse trees of 11,855 sentences, which poses new challenges for sentiment compositionality, and presents the Recursive Neural Tensor Network to address them.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

A new language representation model, BERT, is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; the pre-trained model can then be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

SentEval: An Evaluation Toolkit for Universal Sentence Representations

We introduce SentEval, a toolkit for evaluating the quality of universal sentence representations. SentEval encompasses a variety of tasks, including binary and multi-class classification, natural language inference, and sentence similarity.