Data augmentation in natural language processing: a novel text generation approach for long and short text classifiers

@article{Bayer2022DataAI,
  title={Data augmentation in natural language processing: a novel text generation approach for long and short text classifiers},
  author={Markus Bayer and Marc-Andr{\'e} Kaufhold and Bj{\"o}rn Buchhold and Marcel Keller and J{\"o}rg Dallmeyer and Christian Reuter},
  journal={International Journal of Machine Learning and Cybernetics},
  year={2022},
  pages={1--16}
}
In many cases of machine learning, research suggests that the development of training data might have a higher relevance than the choice and modelling of classifiers themselves. Thus, data augmentation methods have been developed to improve classifiers by artificially created training data. In NLP, there is the challenge of establishing universal rules for text transformations which provide new linguistic patterns. In this paper, we present and evaluate a text generation method suitable to… 

Knowledge-Grounded Conversational Data Augmentation with Generative Conversational Networks

The results show that for conversations without knowledge grounding, GCN can generalize from the seed data, producing novel conversations that are less relevant but more engaging, and that for knowledge-grounded conversations, it can produce more knowledge-focused, fluent, and engaging conversations.

Multi-Level Fine-Tuning, Data Augmentation, and Few-Shot Learning for Specialized Cyber Threat Intelligence

This work combines three low-data regime techniques – transfer learning, data augmentation, and few-shot learning – to train a high-quality classifier from very few labelled instances.

Data Augmentation for Biomedical Factoid Question Answering

It is shown that DA can lead to very significant performance gains, even when using large pre-trained Transformers, contributing to a broader discussion of if/when DA benefits large pre-trained models.

A Survey on Data Augmentation for Text Classification

This survey is concerned with data augmentation methods for textual classification and aims to provide a concise and comprehensive overview for researchers and practitioners.