Towards Document-Level Paraphrase Generation with Sentence Rewriting and Reordering

  Zhe-nan Lin, Yitao Cai, and Xiaojun Wan. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP).
Paraphrase generation is an important task in natural language processing. Previous works focus on sentence-level paraphrase generation while ignoring document-level paraphrase generation, which is a more challenging and valuable task. In this paper, we explore the task of document-level paraphrase generation for the first time and focus on inter-sentence diversity by considering sentence rewriting and reordering. We propose CoRPG (Coherence Relationship Guided Paraphrase Generation)…


Multi-Perspective Document Revision

A novel Japanese multi-perspective document revision dataset is presented that simultaneously handles seven perspectives to improve the readability and clarity of documents.

Evade the Trap of Mediocrity: Promoting Diversity and Novelty in Text Generation via Concentrating Attention

A novel attention regularization loss is proposed to control the sharpness of the attention distribution; it is transparent to model structures, can be implemented in about 20 lines of Python code, and is proved to be mathematically equivalent to learning a Bayesian approximation of posterior attention.

Enhance Incomplete Utterance Restoration by Joint Learning Token Extraction and Text Generation

A model for incomplete utterance restoration (IUR) called JET (Joint learning token Extraction and Text generation) is introduced, which outperforms pretrained T5 and non-generative language-model methods in both rich and limited training-data settings.

A Deep Generative Framework for Paraphrase Generation

Quantitative evaluation of the proposed method on a benchmark paraphrase dataset demonstrates its efficacy and a significant performance improvement over state-of-the-art methods, while qualitative human evaluation indicates that the generated paraphrases are well formed, grammatically correct, and relevant to the input sentence.

Paraphrase Generation by Learning How to Edit from Samples

Experimental results show the superiority of the paraphrase generation method in terms of both automatic metrics, and human evaluation of relevance, grammaticality, and diversity of generated paraphrases.

Joint Copying and Restricted Generation for Paraphrase

A novel Seq2Seq model fuses a copying decoder and a restricted generative decoder, outperforming state-of-the-art approaches in terms of both informativeness and language quality.

Paraphrasing Revisited with Neural Machine Translation

This paper revisits bilingual pivoting in the context of neural machine translation and presents a paraphrasing model based purely on neural networks, which represents paraphrases in a continuous space, estimates the degree of semantic relatedness between text segments of arbitrary length, and generates candidate paraphrases for any source input.

Learning to Paraphrase: An Unsupervised Approach Using Multiple-Sequence Alignment

This work applies multiple-sequence alignment to sentences gathered from unannotated comparable corpora: it learns a set of paraphrasing patterns represented by word lattice pairs and automatically determines how to apply these patterns to rewrite new sentences.

Neural Paraphrase Generation with Stacked Residual LSTM Networks

This work is the first to explore deep learning models for paraphrase generation with a stacked residual LSTM network, adding residual connections between LSTM layers for efficient training of deep LSTMs.

Decomposable Neural Paraphrase Generation

Decomposable Neural Paraphrase Generator is presented, a Transformer-based model that can learn and generate paraphrases of a sentence at different levels of granularity in a disentangled way and an unsupervised domain adaptation method for paraphrase generation is developed.

ParaNMT-50M: Pushing the Limits of Paraphrastic Sentence Embeddings with Millions of Machine Translations

This work uses ParaNMT-50M, a dataset of more than 50 million English-English sentential paraphrase pairs, to train paraphrastic sentence embeddings that outperform all supervised systems on every SemEval semantic textual similarity competition, in addition to showing how it can be used for paraphrase generation.

Jointly Learning to Align and Summarize for Neural Cross-Lingual Summarization

This paper designs loss functions to train the framework, proposes several methods to enhance the isomorphism and cross-lingual transfer between languages, and shows that the model outperforms competitive models in most cases.

Semantic Parsing via Paraphrasing

This paper presents two simple paraphrase models, an association model and a vector space model, and trains them jointly from question-answer pairs, improving state-of-the-art accuracies on two recently released question-answering datasets.