Corpus ID: 248506228

AcTune: Uncertainty-aware Active Self-Training for Semi-Supervised Active Learning with Pretrained Language Models

@inproceedings{Yu2021AcTuneUA,
  title={AcTune: Uncertainty-aware Active Self-Training for Semi-Supervised Active Learning with Pretrained Language Models},
  author={Yue Yu and Lingkai Kong and Jieyu Zhang and Rongzhi Zhang and Chao Zhang},
  year={2021}
}
While pre-trained language model (PLM) fine-tuning has achieved strong performance on many NLP tasks, the fine-tuning stage can still be demanding in labeled data. Recent works have resorted to active fine-tuning to improve the label efficiency of PLM fine-tuning, but none of them investigates the potential of unlabeled data. We propose AcTune, a new framework that leverages unlabeled data to improve the label efficiency of active PLM fine-tuning. AcTune switches between data annotation and model… 
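The abstract only outlines the alternating loop, so the following is a minimal sketch of how one uncertainty-aware active self-training round could look, assuming entropy-based uncertainty, an sklearn-style classifier, and illustrative names (predictive_entropy, active_self_training_round, the query size and confidence threshold) that are not the authors' implementation.

  # Hypothetical sketch of an uncertainty-aware active self-training round.
  # Entropy-based uncertainty and the threshold values are illustrative
  # assumptions, not the AcTune paper's exact method.
  import numpy as np

  def predictive_entropy(probs):
      """Entropy of each row of a (n_samples, n_classes) probability matrix."""
      return -np.sum(probs * np.log(probs + 1e-12), axis=1)

  def active_self_training_round(model, labeled, unlabeled, oracle,
                                 query_size=32, confidence=0.9):
      # labeled is a (features, labels) pair; unlabeled is a feature matrix.
      # 1) Annotation step: query the oracle on the most uncertain points.
      probs = model.predict_proba(unlabeled)
      uncertainty = predictive_entropy(probs)
      query_idx = np.argsort(-uncertainty)[:query_size]
      new_labels = oracle(unlabeled[query_idx])
      labeled = (np.concatenate([labeled[0], unlabeled[query_idx]]),
                 np.concatenate([labeled[1], new_labels]))

      # 2) Self-training step: pseudo-label the most confident remaining points.
      rest = np.setdiff1d(np.arange(len(unlabeled)), query_idx)
      confident = rest[probs[rest].max(axis=1) >= confidence]
      pseudo_x, pseudo_y = unlabeled[confident], probs[confident].argmax(axis=1)

      # 3) Fine-tune on gold labels plus pseudo-labels, then shrink the pool.
      model.fit(np.concatenate([labeled[0], pseudo_x]),
                np.concatenate([labeled[1], pseudo_y]))
      unlabeled = np.delete(unlabeled, query_idx, axis=0)
      return model, labeled, unlabeled

In each round the most uncertain points go to the annotator while confident pseudo-labels augment the training set, which is the division of labor the abstract describes.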
1 Citation


Complex QA and language models hybrid architectures, Survey

This paper identifies key elements for augmenting LLMs to solve complex questions or problems, such as hybrid LLM architectures, active human reinforcement learning supervised with AI, prompting adaptation, neuro-symbolic and structured knowledge grounding, program synthesis, iterated decomposition, and others.

References

Showing 1-10 of 62 references

Learning Question Classifiers

A hierarchical classifier is learned that is guided by a layered semantic hierarchy of answer types, and eventually classifies questions into fine-grained classes.

WRENCH: A Comprehensive Benchmark for Weak Supervision

WRENCH is a benchmark platform for thorough and standardized evaluation of weak supervision (WS) approaches, consisting of 22 varied real-world datasets for classification and sequence tagging; a range of real, synthetic, and procedurally generated weak supervision sources; and a modular, extensible framework for WS evaluation, including implementations of popular WS methods.

Bayesian Active Learning with Pretrained Language Models

BALM (Bayesian Active Learning with Pretrained Language Models) provides substantial data-efficiency improvements compared to various combinations of acquisition functions, models, and fine-tuning methods proposed in recent AL literature.
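As a rough illustration of how Bayesian-style uncertainty can be obtained from a fine-tuned PLM, the sketch below uses Monte Carlo dropout with the BALD score; treating this as BALM's exact acquisition procedure is an assumption, and the Hugging Face-style model interface (model(**inputs).logits) is illustrative.

  # Hypothetical sketch: Monte Carlo dropout uncertainty for acquisition.
  # Using dropout at inference to approximate a posterior is a standard trick;
  # treating it as BALM's exact procedure is an assumption.
  import torch

  def mc_dropout_probs(model, inputs, n_samples=10):
      """Softmax probabilities averaged over stochastic forward passes."""
      model.train()  # keep dropout active at inference time
      with torch.no_grad():
          stacked = torch.stack(
              [torch.softmax(model(**inputs).logits, dim=-1) for _ in range(n_samples)]
          )
      return stacked.mean(dim=0), stacked  # mean probs and per-pass probs

  def bald_scores(mean_probs, sampled_probs):
      """Mutual information between predictions and model parameters (BALD)."""
      entropy_mean = -(mean_probs * torch.log(mean_probs + 1e-12)).sum(-1)
      mean_entropy = -(sampled_probs * torch.log(sampled_probs + 1e-12)).sum(-1).mean(0)
      return entropy_mean - mean_entropy  # higher = more informative to label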

RoBERTa: A Robustly Optimized BERT Pretraining Approach

It is found that BERT was significantly undertrained and can match or exceed the performance of every model published after it; the best model achieves state-of-the-art results on GLUE, RACE, and SQuAD.

Character-level Convolutional Networks for Text Classification

This article constructed several large-scale datasets to show that character-level convolutional networks could achieve state-of-the-art or competitive results in text classification.

Active Learning by Acquiring Contrastive Examples

This work proposes an acquisition function that selects contrastive examples, i.e., data points that are similar in the model feature space yet receive maximally different predictive likelihoods from the model, and shows that CAL achieves a better trade-off between uncertainty and diversity than other strategies.
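A minimal sketch of the contrastive acquisition idea described above, assuming scikit-learn nearest neighbours and a KL-divergence comparison; the neighbourhood size k and the choice to compare against labeled neighbours are illustrative assumptions rather than CAL's exact design.

  # Hypothetical sketch of contrastive-example acquisition: score each unlabeled
  # point by how much its predictions diverge from those of its nearest
  # neighbours in feature space.
  import numpy as np
  from sklearn.neighbors import NearestNeighbors
  from scipy.special import rel_entr

  def contrastive_scores(unlab_feats, unlab_probs, lab_feats, lab_probs, k=10):
      nn = NearestNeighbors(n_neighbors=k).fit(lab_feats)
      _, idx = nn.kneighbors(unlab_feats)          # neighbours in feature space
      scores = np.empty(len(unlab_feats))
      for i, neigh in enumerate(idx):
          # Mean KL divergence between the neighbours' predictive distributions
          # and the candidate point's distribution.
          kl = rel_entr(lab_probs[neigh], unlab_probs[i]).sum(axis=1)
          scores[i] = kl.mean()
      return scores                                 # query the highest-scoring points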

In Defense of Pseudo-Labeling: An Uncertainty-Aware Pseudo-label Selection Framework for Semi-Supervised Learning

This work proposes an uncertainty-aware pseudo-label selection (UPS) framework that improves pseudo-labeling accuracy by drastically reducing the amount of noise encountered in the training process, and it generalizes the pseudo-labeling process to allow for the creation of negative pseudo-labels.
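A minimal sketch of the selection step this describes, assuming per-class confidence and uncertainty estimates are already available (for instance the mean and spread over stochastic forward passes); the threshold values and function name are illustrative, not the UPS paper's settings.

  # Hypothetical sketch of uncertainty-aware pseudo-label selection in the
  # spirit of UPS. The thresholds and the source of the uncertainty estimate
  # are assumptions for illustration.
  import numpy as np

  def select_pseudo_labels(mean_probs, uncertainties,
                           pos_conf=0.9, neg_conf=0.05, max_unc=0.05):
      """Return boolean masks of positive and negative pseudo-labels per (sample, class)."""
      certain = uncertainties <= max_unc               # keep only low-uncertainty predictions
      positive = (mean_probs >= pos_conf) & certain    # "this class is present"
      negative = (mean_probs <= neg_conf) & certain    # "this class is absent"
      return positive, negative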

Cold-start Active Learning through Self-Supervised Language Modeling

With BERT, a simple strategy based on the masked language modeling loss is developed that minimizes labeling costs for text classification, reaching higher accuracy with fewer sampling iterations and less computation time.
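A minimal sketch of using the masked language modeling loss as a cold-start signal, assuming a Hugging Face BERT checkpoint; scoring whole examples by their average token loss and ranking them is an illustrative simplification, not the paper's exact recipe.

  # Hypothetical sketch: rank an unlabeled pool by masked-LM "surprisal" so the
  # most surprising examples are labeled first. Scoring each example by its
  # average token loss is a simplifying assumption.
  import torch
  from transformers import AutoTokenizer, AutoModelForMaskedLM

  tok = AutoTokenizer.from_pretrained("bert-base-uncased")
  mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

  def mlm_loss(text):
      """Average masked-LM loss when each token serves as its own label (no gold labels needed)."""
      enc = tok(text, return_tensors="pt", truncation=True)
      with torch.no_grad():
          out = mlm(**enc, labels=enc["input_ids"])
      return out.loss.item()

  pool = ["example sentence one", "another unlabeled sentence"]
  ranked = sorted(pool, key=mlm_loss, reverse=True)  # highest-surprisal first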

Rethinking deep active learning: Using unlabeled data at model training

Using unlabeled data during model training brings an accuracy improvement in image classification that is larger than the differences between acquisition strategies, and it is orthogonal to the choice of acquisition strategy: it uses more data, much as ensemble methods use more models.

Deep Batch Active Learning by Diverse, Uncertain Gradient Lower Bounds

This work designs a new algorithm for batch active learning with deep neural network models that samples groups of points which are disparate and high-magnitude when represented in a hallucinated gradient space, and shows that while other approaches sometimes succeed only for particular batch sizes or architectures, BADGE consistently performs as well or better, making it a versatile option for practical active learning problems.
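A minimal sketch of the gradient-embedding-plus-diversity idea, assuming penultimate-layer features and softmax probabilities are already computed and using scikit-learn's k-means++ seeding; the exact feature layer and clustering details are assumptions rather than BADGE's reference implementation.

  # Hypothetical sketch of BADGE-style selection: embed each unlabeled point by
  # the gradient its predicted ("hallucinated") label would induce on the final
  # linear layer, then pick a diverse batch via k-means++ seeding.
  import numpy as np
  from sklearn.cluster import kmeans_plusplus

  def gradient_embeddings(features, probs):
      """Outer product of (probs - onehot(argmax)) with penultimate-layer features."""
      n, c = probs.shape
      onehot = np.eye(c)[probs.argmax(axis=1)]
      return ((probs - onehot)[:, :, None] * features[:, None, :]).reshape(n, -1)

  def badge_select(features, probs, batch_size):
      emb = gradient_embeddings(features, probs)
      _, indices = kmeans_plusplus(emb, n_clusters=batch_size)  # diverse, high-magnitude picks
      return indices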
...