Learning Data Augmentation Schedules for Natural Language Processing

@inproceedings{Chopard2021LearningDA,
  title={Learning Data Augmentation Schedules for Natural Language Processing},
  author={Daphn{\'e} Chopard and Matthias Sebastian Treder and Irena Spasi{\'c}},
  booktitle={Proceedings of the Second Workshop on Insights from Negative Results in NLP},
  year={2021}
}
Despite its proven effectiveness in other fields, data augmentation is less popular in the context of natural language processing (NLP) due to its complexity and limited results. A recent study (Longpre et al., 2020) showed, for example, that task-agnostic data augmentations fail to consistently boost the performance of pretrained transformers even in low-data regimes. In this paper, we investigate whether data-driven augmentation scheduling and the integration of a wider set of transformations can…
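As a rough illustration of what "data-driven augmentation scheduling" could look like, the sketch below keeps per-transformation sampling probabilities and updates them from validation feedback. The transformation names and the multiplicative update rule are illustrative assumptions, not the paper's actual algorithm.

```python
import random

def update_schedule(probs, val_scores, lr=0.1):
    # Shift probability mass toward transformations that coincided with
    # better validation scores (toy multiplicative update, renormalized).
    mean = sum(val_scores) / len(val_scores)
    new = [p * (1 + lr * (s - mean)) for p, s in zip(probs, val_scores)]
    total = sum(new)
    return [p / total for p in new]

transforms = ["synonym_swap", "word_dropout", "back_translation"]
probs = [1 / 3] * 3
probs = update_schedule(probs, val_scores=[0.81, 0.78, 0.84])
chosen = random.choices(transforms, weights=probs, k=1)[0]
```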

References

Showing 1-10 of 52 references.

SwitchOut: an Efficient Data Augmentation Algorithm for Neural Machine Translation

Proposes an extremely simple data augmentation strategy for NMT: randomly replacing words in both the source and target sentences with other random words from their corresponding vocabularies.
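A minimal per-token sketch of the idea follows; note that the actual SwitchOut algorithm samples the number of replacements from a temperature-controlled distribution rather than flipping an independent coin per token, and the rate `tau` and toy vocabulary here are illustrative assumptions.

```python
import random

def switchout_sketch(tokens, vocab, tau=0.1):
    # Replace each token independently with probability tau by a word
    # drawn uniformly from the vocabulary (simplified from the paper,
    # which samples the number of swaps from a tempered distribution).
    return [random.choice(vocab) if random.random() < tau else tok
            for tok in tokens]

src = "the cat sat on the mat".split()
print(switchout_sketch(src, vocab=["dog", "ran", "under", "bird", "a"]))
```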

Text Data Augmentation for Deep Learning

Summarizes the major motifs of data augmentation as strengthening local decision boundaries, brute-force training, causality and counterfactual examples, and the distinction between meaning and form.

A Survey of Data Augmentation Approaches for NLP

Introduces and motivates data augmentation for NLP, discusses major methodologically representative approaches, and highlights techniques used for popular NLP applications and tasks.

Data Augmentation by Data Noising for Open-vocabulary Slots in Spoken Language Understanding

Proposes data noising for data augmentation in spoken language understanding that reflects the characteristics of 'open-vocabulary' slots; it requires no additional memory and can be applied simultaneously with model training.

How Effective is Task-Agnostic Data Augmentation for Pretrained Transformers?

Reports a negative result: techniques that previously yielded strong improvements for non-pretrained models fail to consistently improve performance for pretrained transformers, even when training data is limited.

Improving short text classification through global augmentation methods

Studies the effect of different approaches to text augmentation to guide practitioners and researchers in choosing augmentations for classification use cases, and finds that mixup further improves the performance of all text-based augmentations and reduces overfitting on the tested deep learning model.
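For reference, mixup forms convex combinations of pairs of examples and their labels; a minimal sketch on sentence embeddings follows. Applying it at the embedding level and the `alpha` value are common choices assumed here, not details taken from the paper.

```python
import numpy as np

def mixup(x_a, x_b, y_a, y_b, alpha=0.2):
    # lambda ~ Beta(alpha, alpha); interpolate both inputs and one-hot
    # labels. For text this is typically applied to embeddings, since
    # raw token sequences cannot be interpolated directly.
    lam = np.random.beta(alpha, alpha)
    return lam * x_a + (1 - lam) * x_b, lam * y_a + (1 - lam) * y_b

x, y = mixup(np.ones(4), np.zeros(4), np.array([1.0, 0.0]), np.array([0.0, 1.0]))
```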

Data Augmentation for Low-Resource Neural Machine Translation

Proposes a novel data augmentation approach that targets low-frequency words by generating new sentence pairs containing rare words in new, synthetically created contexts, improving translation quality in simulated low-resource settings.
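A stripped-down sketch of the core move: place a rare source word (and its translation) into an existing sentence pair. The real method uses LM scoring to pick plausible positions and word alignments to locate the target-side slot; a shared position `pos` is assumed here purely for illustration.

```python
def substitute_rare(src, tgt, pos, rare_src, rare_tgt):
    # Build a synthetic pair by swapping in a rare word at position `pos`
    # on both sides (assumes a monotone one-to-one alignment; the paper
    # derives the target position from automatic word alignments).
    return (src[:pos] + [rare_src] + src[pos + 1:],
            tgt[:pos] + [rare_tgt] + tgt[pos + 1:])

new_src, new_tgt = substitute_rare(
    "the cat sat".split(), "le chat assis".split(), 1, "ocelot", "ocelot")
```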

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Introduces BERT, a new language representation model designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; the pretrained model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
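A minimal fine-tuning sketch with the Hugging Face `transformers` library illustrates the one-extra-output-layer setup; the checkpoint name, two-class head, and toy batch are illustrative assumptions.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
# Adds a single classification layer on top of the pretrained encoder.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

batch = tok(["a great movie", "a dull movie"], padding=True,
            return_tensors="pt")
loss = model(**batch, labels=torch.tensor([1, 0])).loss
loss.backward()  # an optimizer step would follow in a real training loop
```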

Deep Unordered Composition Rivals Syntactic Methods for Text Classification

Presents a simple deep unordered neural network that competes with, and in some cases outperforms, syntactic composition models on sentiment analysis and factoid question answering while taking only a fraction of the training time.
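The "deep unordered composition" here is the deep averaging network: average the word embeddings, then feed the result through a small feed-forward stack. A minimal PyTorch sketch, with toy dimensions:

```python
import torch
import torch.nn as nn

class DAN(nn.Module):
    def __init__(self, vocab_size=10_000, dim=64, num_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.ff = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                nn.Linear(dim, num_classes))

    def forward(self, token_ids):              # token_ids: (batch, seq_len)
        avg = self.emb(token_ids).mean(dim=1)  # order-insensitive average
        return self.ff(avg)

logits = DAN()(torch.randint(0, 10_000, (3, 7)))
```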

Contextual Augmentation: Data Augmentation by Words with Paradigmatic Relations

Retrofits a language model with a label-conditional architecture, allowing it to augment sentences without breaking label compatibility, and improves classifiers based on convolutional or recurrent neural networks.
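As a rough stand-in, a masked LM can propose paradigmatic replacements for a word in context; the sketch below uses BERT's MLM head for this, omitting the label-conditioning that the paper adds (the checkpoint and example sentence are assumptions).

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

inputs = tok("the movie was [MASK] and moving", return_tensors="pt")
with torch.no_grad():
    logits = mlm(**inputs).logits
mask_pos = (inputs.input_ids == tok.mask_token_id).nonzero()[0, 1]
top_ids = logits[0, mask_pos].topk(5).indices.tolist()
# Candidate in-context replacements (unconditioned on the class label).
print(tok.convert_ids_to_tokens(top_ids))
```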
...