Unsupervised Data Augmentation with Naive Augmentation and without Unlabeled Data

David Lowell, Brian Howard, Zachary Chase Lipton, Byron C. Wallace
Unsupervised Data Augmentation (UDA) is a semi-supervised technique that applies a consistency loss to penalize differences between a model’s predictions on (a) observed (unlabeled) examples and (b) corresponding ‘noised’ examples produced via data augmentation. While UDA has gained popularity for text classification, open questions linger over which design decisions are necessary and how to extend the method to sequence labeling tasks. In this paper, we re-examine UDA and demonstrate its…
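The consistency loss described in the abstract can be sketched in a few lines. The snippet below is an illustrative numpy simplification (not the paper’s implementation): it penalizes, via KL divergence, disagreement between a model’s predicted distributions on unlabeled examples and on their augmented counterparts.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax over the class dimension.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def consistency_loss(logits_orig, logits_aug, eps=1e-12):
    """Mean KL(p_orig || p_aug) over a batch of unlabeled examples.

    Identical predictions on an example and its 'noised' version
    incur zero penalty; divergent predictions are penalized.
    """
    p = softmax(logits_orig)
    q = softmax(logits_aug)
    kl = np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)
    return kl.mean()

# Agreement costs nothing; disagreement is penalized.
same = consistency_loss(np.array([[2.0, 0.0]]), np.array([[2.0, 0.0]]))
diff = consistency_loss(np.array([[2.0, 0.0]]), np.array([[0.0, 2.0]]))
```

In training, this term would be added to the supervised loss on the labeled examples, so the model is pushed toward predictions that are stable under the chosen augmentation.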


Data Augmentation Approaches in Natural Language Processing: A Survey
EICO: Improving Few-Shot Text Classification via Explicit and Implicit Consistency Regularization
By employing both explicit and implicit consistency regularization, EICO advances the performance of prompt-based few-shot text classification and achieves competitive performance compared to existing self-training few-shot learning methods.
Unsupervised Paraphrasing Consistency Training for Low Resource Named Entity Recognition
This work explores paraphrasing as a more principled data augmentation scheme for unsupervised consistency training in NER; it converts the Conditional Random Field into a multi-label classification module and encourages consistency of entity appearance between the original and paraphrased sequences.


Unsupervised Data Augmentation for Consistency Training
A new perspective on how to effectively noise unlabeled examples is presented and it is argued that the quality of noising, specifically those produced by advanced data augmentation methods, plays a crucial role in semi-supervised learning.
SwitchOut: an Efficient Data Augmentation Algorithm for Neural Machine Translation
An extremely simple data augmentation strategy for NMT is proposed: randomly replacing words in both the source and target sentences with other random words from their corresponding vocabularies.
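The replacement scheme can be sketched as follows. This is a simplified version under an assumed i.i.d.-Bernoulli replacement rule; the paper itself samples the number of replacements from a temperature-controlled distribution.

```python
import random

def switchout(tokens, vocab, tau=0.1, rng=random):
    """Replace each token independently with probability tau.

    Simplified sketch of SwitchOut-style augmentation: replacements
    are drawn uniformly from the vocabulary. In NMT this would be
    applied to both the source and target sentences.
    """
    return [rng.choice(vocab) if rng.random() < tau else t for t in tokens]

unchanged = switchout(["the", "cat", "sat"], ["dog", "ran"], tau=0.0)
all_swapped = switchout(["the", "cat", "sat"], ["dog", "ran"], tau=1.0)
```

With `tau=0.0` the sentence is returned unchanged, and with `tau=1.0` every token is resampled from the vocabulary; intermediate values interpolate between the two.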
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Learning Noise-Invariant Representations for Robust Speech Recognition
This paper proposes invariant-representation-learning (IRL): at each training iteration, for each training example, a noisy counterpart is sampled and a penalty term is applied to coerce matched representations at each layer (above some chosen layer).
SciBERT: A Pretrained Language Model for Scientific Text
SciBERT leverages unsupervised pretraining on a large multi-domain corpus of scientific publications to improve performance on downstream scientific NLP tasks and demonstrates statistically significant improvements over BERT.
Improving Neural Machine Translation Models with Monolingual Data
This work pairs monolingual training data with automatic back-translations, treating them as additional parallel training data, and obtains substantial improvements on the WMT '15 English->German task and the low-resource IWSLT '14 Turkish->English task.
Virtual Adversarial Training: A Regularization Method for Supervised and Semi-Supervised Learning
A new regularization method based on virtual adversarial loss: a new measure of local smoothness of the conditional label distribution given input that achieves state-of-the-art performance for semi-supervised learning tasks on SVHN and CIFAR-10.
Adam: A Method for Stochastic Optimization
This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
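The update rule summarized above can be written out directly. The following numpy sketch shows one Adam step with the standard bias-corrected moment estimates (default hyperparameters assumed from the paper’s recommendations), applied to minimizing f(x) = x².

```python
import numpy as np

def adam_step(theta, grad, state, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    # One Adam update using exponential moving averages of the
    # first moment (m) and second moment (v) of the gradient.
    state["t"] += 1
    state["m"] = b1 * state["m"] + (1 - b1) * grad
    state["v"] = b2 * state["v"] + (1 - b2) * grad ** 2
    m_hat = state["m"] / (1 - b1 ** state["t"])  # bias correction
    v_hat = state["v"] / (1 - b2 ** state["t"])
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps)

# Minimize f(x) = x^2, whose gradient is 2x.
state = {"t": 0, "m": np.zeros(1), "v": np.zeros(1)}
theta = np.array([1.0])
for _ in range(300):
    theta = adam_step(theta, 2 * theta, state)
```

Because the step size is normalized by the second-moment estimate, early steps have magnitude close to `lr` regardless of gradient scale, which is what makes the method robust to poorly scaled objectives.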
Energy and Policy Considerations for Deep Learning in NLP
This paper quantifies the approximate financial and environmental costs of training a variety of recently successful neural network models for NLP and proposes actionable recommendations to reduce costs and improve equity in NLP research and practice.
Understanding Back-Translation at Scale
This work broadens the understanding of back-translation and investigates several methods for generating synthetic source sentences, finding that, in all but resource-poor settings, back-translations obtained via sampling or noised beam outputs are most effective.