Exploiting Cloze-Questions for Few-Shot Text Classification and Natural Language Inference

@inproceedings{Schick2021ExploitingCF,
  title={Exploiting Cloze-Questions for Few-Shot Text Classification and Natural Language Inference},
  author={Timo Schick and Hinrich Sch{\"u}tze},
  booktitle={EACL},
  year={2021}
}
Some NLP tasks can be solved in a fully unsupervised fashion by providing a pretrained language model with “task descriptions” in natural language (e.g., Radford et al., 2019). While this approach underperforms its supervised counterpart, we show in this work that the two ideas can be combined: We introduce Pattern-Exploiting Training (PET), a semi-supervised training procedure that reformulates input examples as cloze-style phrases to help language models understand a given task. These phrases… 
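A minimal sketch of the cloze-style reformulation idea, assuming an off-the-shelf masked language model: the input is wrapped in a pattern containing a mask token, and a verbalizer maps each label to a word whose score at the masked position stands in for the label score. The model name, pattern, and verbalizer below are illustrative assumptions, and PET's actual training and ensembling steps are omitted.

# Hypothetical illustration of cloze-style reformulation (not the paper's code).
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

MODEL_NAME = "bert-base-uncased"  # assumption: any masked LM works for this sketch
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForMaskedLM.from_pretrained(MODEL_NAME)
model.eval()

# Pattern: turn a review into a cloze question; verbalizer: label -> single token.
def pattern(text: str) -> str:
    return f"{text} It was {tokenizer.mask_token}."

VERBALIZER = {"positive": "good", "negative": "bad"}

def label_scores(text: str) -> dict:
    inputs = tokenizer(pattern(text), return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # (1, seq_len, vocab_size)
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0, 0]
    token_logits = logits[0, mask_pos]
    return {
        label: token_logits[tokenizer.convert_tokens_to_ids(word)].item()
        for label, word in VERBALIZER.items()
    }

print(label_scores("The pizza was cold and the service was slow."))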

Citations

Contrastive Learning for Prompt-Based Few-Shot Language Learners
TLDR
A supervised contrastive framework is presented that clusters inputs from the same class under different augmented "views" and repels those from different classes, yielding better generality for models trained with only limited examples.
Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning
TLDR
A new parameter-efficient fine-tuning method called (IA)³ is introduced that scales activations by learned vectors, attaining stronger performance while introducing only a relatively small number of new parameters.
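A rough, hypothetical sketch of the activation-scaling mechanism summarized above: a learned vector, initialized to ones, rescales a frozen layer's outputs element-wise, so only that vector's few parameters are trained. The layer choice and dimensions are placeholder assumptions, not the (IA)³ reference implementation.

# Hypothetical illustration of learned activation scaling (not the (IA)^3 code).
import torch
import torch.nn as nn

class ScaledActivation(nn.Module):
    def __init__(self, frozen_layer: nn.Module, hidden_dim: int):
        super().__init__()
        self.frozen_layer = frozen_layer
        for p in self.frozen_layer.parameters():
            p.requires_grad = False                         # keep pretrained weights fixed
        self.scale = nn.Parameter(torch.ones(hidden_dim))   # the only new parameters

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.frozen_layer(x) * self.scale            # element-wise rescaling

# Usage: wrap, e.g., a feed-forward sublayer so that only `scale` is updated.
ffn = nn.Linear(768, 3072)
adapted = ScaledActivation(ffn, hidden_dim=3072)
out = adapted(torch.randn(2, 16, 768))                      # -> (2, 16, 3072)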
Instance-aware Prompt Learning for Language Understanding and Generation
TLDR
This paper proposes an instance-aware prompt learning method that learns a different prompt for each instance and achieves state-of-the-art results on the SuperGLUE few-shot learning benchmark.
All NLP Tasks Are Generation Tasks: A General Pretraining Framework
TLDR
This architecture has three major benefits: it performs well on classification, unconditional generation, and conditional generation tasks with a single pretrained model; it outperforms BERT-like models on classification due to improved pretrain-finetune consistency; and it naturally handles variable-length blank filling, which is crucial for many downstream tasks.
AdaPrompt: Adaptive Model Training for Prompt-based NLP
TLDR
AdaPrompt is proposed, which adaptively retrieves external data for continual pretraining of PLMs by making use of both task and prompt characteristics, and which exploits knowledge in Natural Language Inference models to derive adaptive verbalizers.
Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models
TLDR
This work shows that finetuning LMs in the few-shot setting can considerably reduce the need for prompt engineering, and recommends finetuned LMs for few-shot learning, as they are more accurate, robust to different prompts, and can be made nearly as efficient as using frozen LMs.
Eliciting Knowledge from Pretrained Language Models for Prototypical Prompt Verbalizer
TLDR
This paper focuses on eliciting knowledge from pretrained language models and proposes a prototypical prompt verbalizer for prompt-tuning, which optimizes models by contrastive learning.
FewNLU: Benchmarking State-of-the-Art Methods for Few-Shot Natural Language Understanding
TLDR
This work introduces an evaluation framework that improves previous evaluation procedures in three key aspects, i.e., test performance, dev-test correlation, and stability, and re-evaluates several state-of-the-art few-shot methods for NLU tasks.
GLM: General Language Model Pretraining with Autoregressive Blank Infilling
TLDR
GLM improves blank-infilling pretraining by adding 2D positional encodings and allowing spans to be predicted in an arbitrary order, which yields performance gains over BERT and T5 on NLU tasks and achieves the best performance from a single pretrained model with 1.25× the parameters of BERT Large.
Knowledgeable Prompt-tuning: Incorporating Knowledge into Prompt Verbalizer for Text Classification
TLDR
This work focuses on incorporating external knowledge into the verbalizer, forming knowledgeable prompt-tuning (KPT), to improve and stabilize prompt-tuning.

References

SHOWING 1-10 OF 64 REFERENCES
A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference
TLDR
The Multi-Genre Natural Language Inference corpus is introduced, a dataset designed for use in the development and evaluation of machine learning models for sentence understanding and shows that it represents a substantially more difficult task than does the Stanford NLI corpus.
Character-level Convolutional Networks for Text Classification
TLDR
This article constructed several large-scale datasets to show that character-level convolutional networks could achieve state-of-the-art or competitive results in text classification.
How Can We Know What Language Models Know?
TLDR
This paper proposes mining-based and paraphrasing-based methods to automatically generate high-quality and diverse prompts, as well as ensemble methods to combine answers from different prompts to provide a tighter lower bound on what LMs know.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TLDR
A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
RoBERTa: A Robustly Optimized BERT Pretraining Approach
TLDR
It is found that BERT was significantly undertrained and, when trained more carefully, can match or exceed the performance of every model published after it; the best model achieves state-of-the-art results on GLUE, RACE and SQuAD.
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
TLDR
This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.
Inducing Relational Knowledge from BERT
TLDR
This work proposes a methodology for distilling relational knowledge from a pre-trained language model by fine-tuning it to predict whether a given word pair is likely to be an instance of some relation, given an instantiated template for that relation as input.
MixText: Linguistically-Informed Interpolation of Hidden Space for Semi-Supervised Text Classification
TLDR
By mixing labeled, unlabeled and augmented data, MixText significantly outperforms current pre-trained and fine-tuned models and other state-of-the-art semi-supervised learning methods on several text classification benchmarks.
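A simplified, hypothetical sketch of the hidden-space interpolation idea behind MixText: the hidden states of two examples are linearly mixed with a Beta-sampled coefficient, and their (soft) labels are mixed with the same coefficient. The encoder is omitted and all names and shapes are placeholder assumptions, not the paper's exact setup.

# Hypothetical illustration of hidden-state mixing (not the MixText code).
import torch

def mix_hidden(h_a: torch.Tensor, h_b: torch.Tensor,
               y_a: torch.Tensor, y_b: torch.Tensor, alpha: float = 0.75):
    """Interpolate hidden states and (soft) labels of two batches."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    lam = max(lam, 1.0 - lam)              # keep the mix closer to the first example
    h_mix = lam * h_a + (1.0 - lam) * h_b  # interpolate in hidden space
    y_mix = lam * y_a + (1.0 - lam) * y_b  # interpolate label distributions
    return h_mix, y_mix

# Usage with placeholder tensors standing in for intermediate encoder states:
h_a, h_b = torch.randn(8, 128, 768), torch.randn(8, 128, 768)
y_a = torch.eye(4)[torch.randint(0, 4, (8,))]
y_b = torch.eye(4)[torch.randint(0, 4, (8,))]
h_mix, y_mix = mix_hidden(h_a, h_b, y_a, y_b)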