Corpus ID: 237513677

Self-Training with Differentiable Teacher

@article{zuo2021selftraining,
  title={Self-Training with Differentiable Teacher},
  author={Simiao Zuo and Yue Yu and Chen Liang and Haoming Jiang and Siawpeng Er and Chao Zhang and Tuo Zhao and Hongyuan Zha}
}
Self-training achieves enormous success in various semi-supervised and weakly-supervised learning tasks. The method can be interpreted as a teacher-student framework, where the teacher generates pseudo-labels and the student makes predictions. The two models are updated in alternation. However, such a straightforward alternating update rule leads to training instability, because a small change in the teacher may result in a significant change in the student. To address this issue, we… 
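The alternating teacher-student update that the abstract describes can be sketched in a few lines. This is a minimal illustration, assuming a logistic-regression student and hard confidence-thresholded pseudo-labels; the function names, the threshold, and the copy-student-to-teacher step are illustrative choices, not the paper's method (which replaces exactly this brittle alternation with a differentiable teacher).

```python
import numpy as np

def pseudo_label(teacher_w, X, threshold=0.8):
    """Teacher step: hard pseudo-labels for confidently scored unlabeled points."""
    probs = 1.0 / (1.0 + np.exp(-(X @ teacher_w)))   # logistic scores
    confidence = np.maximum(probs, 1.0 - probs)
    keep = confidence >= threshold                   # only keep confident labels
    return (probs >= 0.5).astype(float), keep

def train_student(X, y, lr=0.1, steps=200):
    """Student step: fit logistic regression by plain gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - y) / len(y)             # logistic-loss gradient
    return w

def self_train(X_lab, y_lab, X_unlab, rounds=3):
    """Alternate teacher (pseudo-labeling) and student (retraining) updates."""
    w = train_student(X_lab, y_lab)
    for _ in range(rounds):
        y_pseudo, keep = pseudo_label(w, X_unlab)
        X_all = np.vstack([X_lab, X_unlab[keep]])
        y_all = np.concatenate([y_lab, y_pseudo[keep]])
        w = train_student(X_all, y_all)              # student becomes next teacher
    return w
```

The instability the abstract points at is visible here: a small change in `w` can flip which unlabeled points pass the threshold, abruptly changing the student's training set.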
ATM: An Uncertainty-aware Active Self-training Framework for Label-efficient Text Classification
ATM is a new framework that leverages self-training to exploit unlabeled data; it is agnostic to the specific active learning (AL) algorithm and serves as a plug-in module to improve existing AL methods.
AcTune: Uncertainty-aware Active Self-Training for Semi-Supervised Active Learning with Pretrained Language Models
Experiments show that AcTune outperforms the strongest active learning and self-training baselines and improves the label efficiency of PLM fine-tuning by 56.2% on average.
Graph Random Neural Networks for Semi-Supervised Learning on Graphs
GRAND is a simple yet effective framework for semi-supervised learning on graphs that first designs a random propagation strategy to perform graph data augmentation, then leverages consistency regularization to optimize the prediction consistency of unlabeled nodes across different data augmentations.
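The two ingredients named in the summary can be sketched as follows. This is a hedged simplification: `random_propagation` stands in for GRAND's DropNode-plus-propagation augmentation (the graph-propagation part is omitted), and the quadratic consistency loss is one common choice, not necessarily the paper's exact formulation.

```python
import numpy as np

def random_propagation(X, drop_rate, rng):
    """Augmentation: randomly zero entire node feature rows (DropNode-style),
    rescaling survivors so the expected feature matrix is unchanged."""
    keep = rng.random(X.shape[0]) >= drop_rate
    return X * keep[:, None] / (1.0 - drop_rate)

def consistency_loss(preds):
    """Mean squared distance of each augmentation's predictions from their average;
    zero iff all S augmented views agree."""
    avg = np.mean(preds, axis=0)
    return float(np.mean((np.stack(preds) - avg) ** 2))
```

In training, a model would be run on several augmented views of the node features and the consistency loss added to the supervised loss on labeled nodes.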
Weakly-Supervised Neural Text Classification
This paper proposes a weakly-supervised method that addresses the lack of training data in neural text classification; it achieves strong performance without requiring excessive training data and outperforms baseline methods significantly.
RoBERTa: A Robustly Optimized BERT Pretraining Approach
It is found that BERT was significantly undertrained and can match or exceed the performance of every model published after it; the best model achieves state-of-the-art results on GLUE, RACE, and SQuAD.
BioCreative V CDR task corpus: a resource for chemical disease relation extraction
The BC5CDR corpus was successfully used for the BioCreative V challenge tasks and should serve as a valuable resource for the text-mining research community.
Semi-Supervised Classification with Graph Convolutional Networks
A scalable approach for semi-supervised learning on graph-structured data, based on an efficient variant of convolutional neural networks that operate directly on graphs, which outperforms related methods by a significant margin.
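The graph convolution this summary refers to follows a simple propagation rule, which can be sketched directly. The sketch below implements the standard symmetric-normalized propagation H' = ReLU(D^(-1/2)(A+I)D^(-1/2) H W); the function name is illustrative, and a real implementation would use sparse matrices for scalability.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph convolution: add self-loops, symmetrically normalize the
    adjacency, propagate features, apply a linear map and ReLU."""
    A_hat = A + np.eye(A.shape[0])            # self-loops keep each node's own features
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))    # symmetric degree normalization
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)
```

Stacking two such layers and training with cross-entropy on the labeled nodes gives the semi-supervised classifier the summary describes.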
Adam: A Method for Stochastic Optimization
This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
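The "adaptive estimates of lower-order moments" mentioned above are exponential moving averages of the gradient and its square, with bias correction for their zero initialization. A minimal single-step sketch (scalar or NumPy-array parameters; default hyperparameters as in the paper):

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update (t is the 1-indexed step count)."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad**2     # second-moment (uncentered variance)
    m_hat = m / (1 - beta1**t)                # bias correction for zero init
    v_hat = v / (1 - beta2**t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```

Because the update is m_hat / sqrt(v_hat), the effective step size is roughly bounded by `lr` regardless of the raw gradient scale.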
Convolutional Neural Networks for Sentence Classification
The CNN models discussed herein improve upon the state of the art on 4 of 7 tasks, including sentiment analysis and question classification; a simple modification to the architecture allows the use of both task-specific and static vectors.
Learning Word Vectors for Sentiment Analysis
This work presents a model that uses a mix of unsupervised and supervised techniques to learn word vectors capturing semantic term-document information as well as rich sentiment content, and finds it outperforms several previously introduced methods for sentiment classification.
Uncertainty-aware Self-training for Few-shot Text Classification
This work proposes an approach to improve self-training by incorporating uncertainty estimates of the underlying neural network, leveraging recent advances in Bayesian deep learning; it introduces acquisition functions that select instances from the unlabeled pool via Monte Carlo (MC) Dropout, and a learning mechanism that leverages model confidence for self-training.
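The MC Dropout idea the summary names is to keep dropout active at inference time, run several stochastic forward passes, and read the spread of the predictions as an uncertainty estimate. The sketch below is a deliberately simplified stand-in: a linear model with dropout applied to its weight rows, rather than the paper's BERT-based student; `mc_dropout_predict` and `select_uncertain` are hypothetical names for illustration.

```python
import numpy as np

def mc_dropout_predict(X, W, rng, n_samples=100, p_drop=0.5):
    """Run n_samples stochastic forward passes with dropout kept on;
    return the predictive mean and variance per example."""
    preds = []
    for _ in range(n_samples):
        mask = rng.random(W.shape[0]) >= p_drop           # drop weight rows
        logits = X @ (W * mask[:, None]) / (1 - p_drop)   # inverted-dropout scaling
        preds.append(1.0 / (1.0 + np.exp(-logits)))
    preds = np.stack(preds)
    return preds.mean(axis=0), preds.var(axis=0)          # variance = uncertainty

def select_uncertain(X, W, rng, k=2):
    """Acquisition: pick the k unlabeled points with highest predictive variance."""
    _, var = mc_dropout_predict(X, W, rng)
    return np.argsort(-var.ravel())[:k]
```

Points whose predictions are stable across dropout masks get low variance and are good candidates for confident pseudo-labels; high-variance points are better candidates for labeling.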
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.