Variational Pretraining for Semi-supervised Text Classification

@inproceedings{Gururangan2019VariationalPF,
  title={Variational Pretraining for Semi-supervised Text Classification},
  author={Suchin Gururangan and Tam Dang and Dallas Card and Noah A. Smith},
  booktitle={ACL},
  year={2019}
}

We introduce VAMPIRE, a lightweight pretraining framework for effective text classification when data and computing resources are limited. We pretrain a unigram document model as a variational autoencoder on in-domain, unlabeled data and use its internal states as features in a downstream classifier. Empirically, we show the relative strength of VAMPIRE against computationally expensive contextual embeddings and other popular semi-supervised baselines under low resource settings. We also find…
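
The abstract describes a two-step recipe: pretrain a unigram (bag-of-words) VAE on unlabeled in-domain text, then reuse its internal states as features in a downstream classifier. The sketch below illustrates that recipe in PyTorch; it is not the authors' released implementation, and the vocabulary size, layer widths, number of training steps, choice of internal states (the encoder hidden layer and the latent mean), and the randomly generated word counts are all illustrative assumptions.

```python
# Hedged sketch of the VAMPIRE recipe, not the released implementation:
# (1) pretrain a bag-of-words VAE on unlabeled in-domain documents,
# (2) freeze it and feed its internal states to a small classifier.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB_SIZE = 2000   # assumed vocabulary size
LATENT_DIM = 64     # assumed latent dimension


class BowVAE(nn.Module):
    """Unigram (bag-of-words) document model trained as a VAE."""

    def __init__(self, vocab_size=VOCAB_SIZE, hidden=256, latent=LATENT_DIM):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(vocab_size, hidden), nn.ReLU())
        self.to_mu = nn.Linear(hidden, latent)
        self.to_logvar = nn.Linear(hidden, latent)
        self.decoder = nn.Linear(latent, vocab_size)  # logits over the vocabulary

    def forward(self, bow):
        h = self.encoder(bow)                                 # internal state: hidden layer
        mu, logvar = self.to_mu(h), self.to_logvar(h)         # internal state: latent mean
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        logits = self.decoder(z)
        # Multinomial reconstruction term plus the usual Gaussian KL penalty.
        recon = -(bow * F.log_softmax(logits, dim=-1)).sum(-1)
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1)
        return (recon + kl).mean(), h, mu


# Step 1: pretrain on unlabeled, in-domain word-count vectors (random stand-in data here).
vae = BowVAE()
optimizer = torch.optim.Adam(vae.parameters(), lr=1e-3)
unlabeled_bow = torch.poisson(torch.full((512, VOCAB_SIZE), 0.05))
for _ in range(5):  # a handful of steps, just to show the loop
    loss, _, _ = vae(unlabeled_bow)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Step 2: freeze the pretrained model and use its internal states as downstream features.
with torch.no_grad():
    _, hidden_state, latent_mu = vae(unlabeled_bow[:32])
features = torch.cat([hidden_state, latent_mu], dim=-1)
classifier = nn.Linear(features.size(-1), 2)  # e.g. a binary text classifier
print(classifier(features).shape)             # torch.Size([32, 2])
```

Keeping the pretrained model frozen in step 2 is what keeps the downstream classifier cheap to train in low-resource settings; concatenating the hidden layer with the latent mean is just one plausible choice of features.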

Citations

Challenging the Semi-Supervised VAE Framework for Text Classification
TLDR: This paper questions the adequacy of the standard design of sequence SSVAEs for text classification, identifies two sources of overcomplexity they exhibit, and provides simplifications that preserve their theoretical soundness and improve the flow of information into the latent variables.
MixText: Linguistically-Informed Interpolation of Hidden Space for Semi-Supervised Text Classification
TLDR: By mixing labeled, unlabeled, and augmented data, MixText significantly outperforms current pre-trained and fine-tuned models and other state-of-the-art semi-supervised learning methods on several text classification benchmarks.
AMMUS: A Survey of Transformer-based Pretrained Models in Natural Language Processing
TLDR: This comprehensive survey explains core concepts such as pretraining, pretraining methods and tasks, embeddings, and downstream adaptation methods, presents a new taxonomy of transformer-based pretrained language models (T-PTLMs), and gives a brief overview of various intrinsic and extrinsic benchmarks.
Attending to Long-Distance Document Context for Sequence Labeling
TLDR: The model learns to attend to multiple mentions of the same word type when generating a representation for each token in context, extending prior work to learn representations that can be incorporated into modern neural models.
Interpretable Operational Risk Classification with Semi-Supervised Variational Autoencoder
TLDR: The results demonstrate that the presented semi-supervised text classification framework makes better use of unlabeled data, learns visually interpretable document representations, and outperforms other baseline methods on operational risk classification.
Entropy optimized semi-supervised decomposed vector-quantized variational autoencoder model based on transfer learning for multiclass text classification and generation
TLDR: Experimental results indicate that the proposed semi-supervised discrete latent-variable model for multi-class text classification and text generation substantially outperforms state-of-the-art models.
Variational Information Bottleneck for Effective Low-Resource Fine-Tuning
TLDR: This work proposes to use Variational Information Bottleneck (VIB) to suppress irrelevant features when fine-tuning on low-resource target tasks, and shows that the method successfully reduces overfitting.
SALNet: Semi-supervised Few-Shot Text Classification with Attention-based Lexicon Construction
TLDR: A semi-supervised bootstrap learning framework for few-shot text classification that combines a deep learning classifier with an attention-based lexicon to tackle the labeled-data sparsity problem.
Improving Variational Autoencoder for Text Modelling with Timestep-Wise Regularisation
TLDR: This paper proposes a simple, generic architecture called Timestep-Wise Regularisation VAE (TWR-VAE), which can effectively avoid posterior collapse and can be applied to any RNN-based VAE model.
Controlling the Interaction Between Generation and Inference in Semi-Supervised Variational Autoencoders Using Importance Weighting
TLDR: Using importance weighting and an analysis of the semi-supervised VAE objective, it is shown that these models use the posterior of the learned generative model to guide the inference model in learning the partially observed latent variable.

References

Showing 1-10 of 45 references
Universal Language Model Fine-tuning for Text Classification
TLDR: This work proposes Universal Language Model Fine-tuning (ULMFiT), an effective transfer learning method that can be applied to any task in NLP, and introduces techniques that are key for fine-tuning a language model.
Virtual Adversarial Training for Semi-Supervised Text Classification
TLDR: This work extends adversarial and virtual adversarial training to the text domain by applying perturbations to the word embeddings in a recurrent neural network rather than to the original input itself.
Sentence Encoders on STILTs: Supplementary Training on Intermediate Labeled-data Tasks
TLDR: Supplementary training on data-rich supervised tasks, such as natural language inference, yields additional performance improvements on the GLUE benchmark, along with reduced variance across random restarts.
Improved Variational Autoencoders for Text Modeling using Dilated Convolutions
TLDR: It is shown that with the right decoder, VAEs can outperform LSTM language models, with perplexity gains demonstrated on two datasets, representing the first positive experimental result on the use of VAEs for generative modeling of text.
Variational Autoencoder for Semi-Supervised Text Classification
TLDR: The Semi-supervised Sequential Variational Autoencoder (SSVAE) is proposed, which increases model capability by feeding the label into its decoder RNN at each time step, and reduces the computational complexity of training.
Deep Unordered Composition Rivals Syntactic Methods for Text Classification
TLDR: This work presents a simple deep neural network that competes with and, in some cases, outperforms syntactic models on sentiment analysis and factoid question answering tasks while taking only a fraction of the training time.
Semi-supervised Sequence Learning
TLDR: Two approaches to using unlabeled data to improve sequence learning with recurrent networks are presented, and it is found that long short-term memory recurrent networks pretrained with the two approaches become more stable to train and generalize better.
Improving Language Understanding by Generative Pre-Training
TLDR: The general task-agnostic model outperforms discriminatively trained models that use architectures specifically crafted for each task, significantly improving upon the state of the art in 9 out of the 12 tasks studied.
Semi-supervised Learning with Deep Generative Models
TLDR: It is shown that deep generative models with approximate Bayesian inference, exploiting recent advances in variational methods, can provide significant improvements, making generative approaches highly competitive for semi-supervised learning.
Dissecting Contextual Word Embeddings: Architecture and Representation
TLDR: There is a tradeoff between speed and accuracy, but all architectures learn high-quality contextual representations that outperform word embeddings on four challenging NLP tasks, suggesting that unsupervised biLMs, independent of architecture, learn much more about the structure of language than previously appreciated.