Corpus ID: 222341644

Summarize, Outline, and Elaborate: Long-Text Generation via Hierarchical Supervision from Extractive Summaries

Authors: Xiaofei Sun, Chun Fan, Zijun Sun, Yuxian Meng, Fei Wu, Jiwei Li
Long-text generation remains a challenge. The difficulty of generating coherent long texts lies in the fact that existing models focus overwhelmingly on local word prediction and cannot make high-level plans about what to generate, nor capture the high-level discourse dependencies between chunks of text. Inspired by how humans write, where a list of bullet points or a catalog is outlined first and each bullet point is then expanded to form the whole article, we propose SOE, a…
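As a loose illustration of the summarize-outline-elaborate idea described above, the sketch below first builds an extractive outline (one "bullet point" per paragraph) and then expands each bullet in order, conditioning on what has already been generated. The `extractive_summary` and `elaborate` functions are trivial stand-ins, not the paper's trained neural modules; only the overall plan-then-expand control flow is the point.

```python
def extractive_summary(paragraphs, k=1):
    """Stand-in planner: take the first k sentences of each
    paragraph as that paragraph's bullet point."""
    outline = []
    for p in paragraphs:
        sentences = [s.strip() for s in p.split(".") if s.strip()]
        outline.append(". ".join(sentences[:k]) + ".")
    return outline

def elaborate(bullet, context):
    """Stand-in expander: in SOE a generator would condition on the
    bullet plus previously generated text to write a full paragraph."""
    return f"{bullet} (expanded; {len(context)} chars of prior context)"

def soe_generate(source_paragraphs):
    """Plan first (outline), then elaborate each bullet in order."""
    outline = extractive_summary(source_paragraphs)
    document, generated = [], ""
    for bullet in outline:
        para = elaborate(bullet, generated)
        generated += para
        document.append(para)
    return outline, document
```

The two-stage structure is what matters: the outline fixes the high-level discourse plan before any surface text is produced, so each elaboration step only has to stay locally coherent with its bullet and the text generated so far.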


Plot Writing From Pre-Trained Language Models
This work proposes generating story plots using off-the-shelf PLMs while maintaining the benefit of content planning to generate cohesive and contentful stories.
DYLE: Dynamic Latent Extraction for Abstractive Long-Input Summarization
DYLE jointly trains an extractor and a generator and treats the extracted text snippets as the latent variable, allowing dynamic snippet-level attention weights during decoding, and shows that the proposed dynamic weights provide interpretability of the generation process.
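The "dynamic snippet-level attention weights" in the summary above can be pictured as recomputing a softmax over the extracted snippets at every decoding step. The sketch below uses fixed per-step scores rather than a learned extractor and generator, so it only illustrates the reweighting, not DYLE itself.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def dynamic_snippet_weights(snippet_scores_per_step):
    """Recompute attention weights over extracted snippets at each
    decoding step (the 'dynamic' part). In DYLE the scores come from
    a trained extractor and are reweighted inside the generator."""
    return [softmax(step_scores) for step_scores in snippet_scores_per_step]
```

Because the weights are renormalized per step, the generator can shift its attention across snippets as decoding proceeds, and the per-step weight vectors are what give the approach its interpretability.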

Sentence-Level Content Planning and Style Specification for Neural Text Generation
This work presents an end-to-end trained two-step generation model, where a sentence-level content planner first decides on the keyphrases to cover as well as a desired language style, followed by a surface realization decoder that generates relevant and coherent text.
Progressive Generation of Long Text
This work proposes a simple but effective method of generating text in a progressive manner, inspired by generating images from low to high resolution, and significantly improves upon the fine-tuned GPT-2 in terms of domain-specific quality and sample efficiency.
Order-Planning Neural Text Generation From Structured Data
This paper proposes an order-planning text generation model to capture the relationship between different fields and use such relationship to make the generated text more fluent and smooth.
Bottom-Up Abstractive Summarization
This work explores the use of data-efficient content selectors to over-determine phrases in a source document that should be part of the summary, and shows that this approach improves the ability to compress text, while still generating fluent summaries.
Extractive Summarization as Text Matching
This paper formulates extractive summarization as a semantic text-matching problem, in which a source document and candidate summaries are matched in a semantic space.
Towards Generating Long and Coherent Text with Multi-Level Latent Variable Models
Extensive experimental results demonstrate that the proposed multi-level VAE model produces more coherent and less repetitive long text compared to baselines as well as can mitigate the posterior-collapse issue.
Long Text Generation via Adversarial Training with Leaked Information
The discriminative net is allowed to leak its own high-level extracted features to the generative net to further help the guidance, and without any supervision, LeakGAN would be able to implicitly learn sentence structures only through the interaction between Manager and Worker.
A Hierarchical Neural Autoencoder for Paragraphs and Documents
This paper introduces an LSTM model that hierarchically builds an embedding for a paragraph from embeddings for sentences and words, then decodes this embedding to reconstruct the original paragraph and evaluates the reconstructed paragraph using standard metrics to show that neural models are able to encode texts in a way that preserve syntactic, semantic, and discourse coherence.
Step-by-Step: Separating Planning from Realization in Neural Data-to-Text Generation
The results demonstrate that decoupling text planning from neural realization improves the system's reliability and adequacy while maintaining fluent output, with improvements observed both in BLEU scores and in manual evaluations.
Data-to-Text Generation with Content Selection and Planning
This work presents a neural network architecture which incorporates content selection and planning without sacrificing end-to-end training and shows that this model outperforms strong baselines improving the state-of-the-art on the recently released RotoWire dataset.