Corpus ID: 226237288

Data-to-Text Generation with Iterative Text Editing

Zdeněk Kasner and Ondřej Dušek
We present a novel approach to data-to-text generation based on iterative text editing. Our approach maximizes the completeness and semantic accuracy of the output text while leveraging the abilities of recent pre-trained models for text editing (LaserTagger) and language modeling (GPT-2) to improve the text fluency. To this end, we first transform data items to text using trivial templates, and then we iteratively improve the resulting text by a neural model trained for the sentence fusion… 
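The two-stage idea in the abstract (trivial templates first, then iterative improvement) can be sketched as follows. All names here (`lexicalize`, `fuse`, the template strings) are illustrative stand-ins, not the authors' implementation; in particular, the real system replaces the naive `fuse` below with a trained neural sentence-fusion model:

```python
# Sketch of template-based lexicalization followed by iterative editing.
# Hypothetical names and templates; not the paper's actual code.

def lexicalize(triple):
    """Turn a (subject, predicate, object) data item into a trivial
    template sentence."""
    subj, pred, obj = triple
    templates = {
        "birthPlace": "{s} was born in {o}.",
        "occupation": "{s} is a {o}.",
    }
    return templates[pred].format(s=subj, o=obj)

def fuse(text, sentence):
    """Stand-in for the neural sentence-fusion step: here we simply
    concatenate; the paper trains an editing model for this step."""
    return (text + " " + sentence).strip()

def generate(triples):
    """Iteratively fold each templated sentence into the output text."""
    text = ""
    for t in triples:
        text = fuse(text, lexicalize(t))
    return text
```

Because every data item passes through its own template, each fact is guaranteed to appear in the intermediate text; the editing model then only has to improve fluency, not recall content.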


Innovations in Neural Data-to-text Generation
This survey draws boundaries separating data-to-text generation (DTG) from the rest of the natural language generation (NLG) landscape, provides an up-to-date synthesis of the literature, and highlights the stages of technological adoption from within and outside the greater NLG umbrella.
Text Generation with Text-Editing Models
This tutorial provides a comprehensive overview of text-editing models and current state-of-the-art approaches, analyzing their pros and cons.
Neural Pipeline for Zero-Shot Data-to-Text Generation
This work proposes to generate text by transforming single-item descriptions with a sequence of modules trained on general-domain text-based operations (ordering, aggregation, and paragraph compression), using WikiFluent, a synthetic corpus built from English Wikipedia.
Controlling hallucinations at word level in data-to-text generation
This work proposes a finer-grained approach to hallucinations, arguing that they should be treated at the word level; the method is able to reduce and control hallucinations while keeping generated texts fluent and coherent.
Domain Adaptation for Natural Language Generation Thesis Proposal
This work aims to develop NLG systems that perform well in domains lacking available training data, using large pretrained neural language models to facilitate domain adaptation of NLG systems.
A sequence to sequence transformer data logic experiment
This paper presents experiments evaluating whether a sequence-to-sequence transformer can be constrained to generate the specifics of a financial report and, more generally, whether and to what extent it can faithfully reproduce a semantic logic.
Getting to Production with Few-shot Natural Language Generation Models
This paper introduces a system consisting of iterative self-training and an extensible mini-template framework that textualizes the structured input data into semi-natural text to fully take advantage of pre-trained language models to enable few-shot Natural Language Generation.


FELIX: Flexible Text Editing Through Tagging and Insertion
FELIX is a flexible text-editing approach for generation, designed to derive maximum benefit from decoding with bi-directional contexts and self-supervised pretraining; it is efficient in low-resource settings and fast at inference time, while modeling flexible input-output transformations.
Have Your Text and Use It Too! End-to-End Neural Data-to-Text Generation with Semantic Fidelity
This work presents DataTuner, a neural, end-to-end data-to-text generation system that makes minimal assumptions about the data representation and target domain, combining a fine-tuned language model with a semantic fidelity classifier.
Evaluating Semantic Accuracy of Data-to-Text Generation with Natural Language Inference
This work proposes a new metric for evaluating the semantic accuracy of D2T generation, using a model pretrained for natural language inference (NLI) to check textual entailment between the input data and the output text in both directions.
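The bidirectional entailment check described above can be illustrated schematically. The `nli` stand-in below uses substring containment purely for demonstration; the actual metric queries a pretrained NLI model:

```python
# Illustrative sketch of bidirectional entailment checking for semantic
# accuracy. `nli` is a trivial stand-in, not a real NLI model.

def nli(premise, hypothesis):
    """Placeholder entailment judgment via substring containment.
    A real implementation would call a pretrained NLI model."""
    return "entailment" if hypothesis in premise else "neutral"

def semantic_accuracy_ok(data_text, output_text):
    """Output is accurate iff the data entails the text (no
    hallucination) and the text entails the data (no omission)."""
    no_hallucination = nli(data_text, output_text) == "entailment"
    no_omission = nli(output_text, data_text) == "entailment"
    return no_hallucination and no_omission
```

Checking both directions is the key design choice: the data-to-text direction catches hallucinated content, while the text-to-data direction catches omitted facts.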
Text-to-Text Pre-Training for Data-to-Text Tasks
It is indicated that text-to-text pre-training in the form of T5 enables simple, end-to-end transformer-based models to outperform pipelined neural architectures tailored for data-to-text generation, as well as alternatives such as BERT and GPT-2.
Step-by-Step: Separating Planning from Realization in Neural Data-to-Text Generation
The results demonstrate that decoupling text planning from neural realization improves the system's reliability and adequacy while maintaining fluent output, with improvements observed both in BLEU scores and in manual evaluations.
Encode, Tag, Realize: High-Precision Text Editing
LaserTagger is proposed, a sequence tagging approach that casts text generation as a text editing task; at inference time, tagging can be more than two orders of magnitude faster than comparable seq2seq models, making it more attractive for running in a live environment.
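The tagging view of text editing can be illustrated with a toy tag-application function. The tag vocabulary here (`KEEP`, `DELETE`, `KEEP|phrase`) is a simplified assumption in the spirit of LaserTagger, not its exact operation set:

```python
# Toy illustration of casting generation as per-token tagging.
# Simplified tag set; not LaserTagger's actual tag vocabulary.

def apply_tags(tokens, tags):
    """Apply one edit tag per token: KEEP keeps the token, DELETE drops
    it, and KEEP|phrase inserts `phrase` before the kept token."""
    out = []
    for token, tag in zip(tokens, tags):
        if tag == "DELETE":
            continue
        if tag.startswith("KEEP|"):
            out.append(tag.split("|", 1)[1])
        out.append(token)
    return " ".join(out)
```

For example, fusing two sentences reduces to deleting the sentence boundary and inserting a connective, which is why tagging over a small edit vocabulary can replace full token-by-token decoding.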
Neural data-to-text generation: A comparison between pipeline and end-to-end architectures
Automatic and human evaluations together with a qualitative analysis suggest that having explicit intermediate steps in the generation process results in better texts than the ones generated by end-to-end approaches.
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.
Improving Quality and Efficiency in Plan-based Neural Data-to-text Generation
A trainable neural planning component is introduced that can generate effective plans several orders of magnitude faster than the original planner and a verification-by-reranking stage that substantially improves the faithfulness of the resulting texts is introduced.
Challenges in Data-to-Document Generation
A new, large-scale corpus of data records paired with descriptive documents is introduced, a series of extractive evaluation methods for analyzing performance are proposed, and baseline results are obtained using current neural generation methods.