A Comprehensive Survey of Natural Language Generation Advances from the Perspective of Digital Deception

Keenan I. Jones, Enes Altuncu, Virginia N. L. Franqueira, Yi-Chia Wang, Shujun Li
In recent years there has been substantial growth in the capabilities of systems designed to generate text that mimics the fluency and coherence of human language. This growth has prompted considerable research into the potential uses of these natural language generation (NLG) systems for a wide range of tasks. The increasing ability of powerful text generators to mimic human writing convincingly raises the potential for deception and other forms of dangerous misuse. As these…



Retrieving and Reading: A Comprehensive Survey on Open-domain Question Answering

This work reviews the latest research trends in OpenQA, with particular attention to systems that incorporate neural MRC techniques, and revisits the origin and development of OpenQA systems.

CodE Alltag 2.0 — A Pseudonymized German-Language Email Corpus

This work investigates the automatic recognition of privacy-sensitive stretches of text in UGC, provides an algorithmic solution for the protection of personal data via pseudonymization, and evaluates several de-identification procedures and systems on two hitherto non-anonymized German-language email corpora.

RtGender: A Corpus for Studying Differential Responses to Gender

A multi-genre corpus of more than 25M comments from five socially and topically diverse sources tagged for the gender of the addressee enables studying socially important questions like gender bias, and has potential uses for downstream applications such as dialogue systems, gender detection or obfuscation, and debiasing language generation.

Evaluating prose style transfer with the Bible

This work identifies a high-quality source of aligned, stylistically distinct text in different versions of the Bible, and provides a standardized split of the public-domain versions in its corpus into training, development, and testing data.

A Monolingual Tree-based Translation Model for Sentence Simplification

A Tree-based Simplification Model (TSM) is proposed, which, to the authors' knowledge, is the first statistical simplification model covering splitting, dropping, reordering, and substitution integrally.

Human Evaluation of Creative NLG Systems: An Interdisciplinary Survey on Recent Papers

These guidelines for future evaluation include clearly defining the goal of the generative system, asking questions as concrete as possible, testing the evaluation setup, using multiple different evaluation setups, reporting the entire evaluation process and potential biases clearly, and finally analyzing the evaluation results in a more profound way than merely reporting the most typical statistics.

A Survey on Data Augmentation for Text Classification

This survey is concerned with data augmentation methods for textual classification and aims to provide a concise and comprehensive overview for researchers and practitioners.

Conversational Agents in Software Engineering: Survey, Taxonomy and Challenges

A holistic taxonomy of the different dimensions involved in the conversational agents’ field is proposed, which is expected to help researchers and to lay the groundwork for future research in the field of natural language interfaces.

A Review of Human Evaluation for Style Transfer

It is found that protocols for human evaluations are often underspecified and not standardized, which hampers the reproducibility of research in this field and progress toward better human and automatic evaluation methods.

Conversational question answering: a survey

There has been a shift from single-turn to multi-turn QA, which empowers the field of Conversational AI from different perspectives, and this survey is intended to provide an epitome for the research community with the hope of laying a strong foundation for the field of CQA.