Generating Syntactically Controlled Paraphrases without Using Annotated Parallel Pairs

@article{Huang2021GeneratingSC,
  title={Generating Syntactically Controlled Paraphrases without Using Annotated Parallel Pairs},
  author={Kuan-Hao Huang and Kai-Wei Chang},
  journal={ArXiv},
  year={2021},
  volume={abs/2101.10579}
}
Paraphrase generation plays an essential role in natural language processing (NLP) and has many downstream applications. However, training supervised paraphrase models requires many annotated paraphrase pairs, which are usually costly to obtain. On the other hand, the paraphrases generated by existing unsupervised approaches are usually syntactically similar to the source sentences and are limited in diversity. In this paper, we demonstrate that it is possible to generate syntactically various…
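
As a rough illustration of the setting, a syntactically controlled paraphraser maps a source sentence plus a target parse onto a new surface form. The sketch below is a minimal, hypothetical interface (the class and function names are assumptions for illustration, not the paper's code):

```python
from dataclasses import dataclass

@dataclass
class ControlledParaphrase:
    """One (source, target parse, paraphrase) example."""
    source: str        # sentence whose meaning must be preserved
    target_parse: str  # linearized constituency template
    paraphrase: str    # output realizing the same meaning under the template

def make_encoder_input(source: str, target_parse: str, sep: str = " <sep> ") -> str:
    """Concatenate the semantic and syntactic signals for a seq2seq encoder.

    Training without annotated pairs is possible because the model can be
    asked to reconstruct a sentence from its own (words, parse) pair; at
    inference the parse is swapped for a different template to force a
    syntactic change while keeping the content words.
    """
    return source + sep + target_parse

print(make_encoder_input("She solved the problem quickly .",
                         "(S (ADVP) (NP) (VP) (.))"))
```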


Learning to Selectively Learn for Weakly-supervised Paraphrase Generation
TLDR
This work proposes a novel approach to generating high-quality paraphrases under weak supervision: abundant weakly labeled parallel sentences are obtained via retrieval-based pseudo-paraphrase expansion, and a meta-learning framework progressively selects valuable samples for fine-tuning the pre-trained language model BART on the sentential paraphrasing task.
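
The selection step can be pictured as a ranking over noisy retrieved pairs. This is a minimal sketch, assuming an abstract quality scorer in place of the paper's learned meta-selector:

```python
from typing import Callable, List, Tuple

def select_for_finetuning(
    pairs: List[Tuple[str, str]],
    scorer: Callable[[str, str], float],
    k: int,
) -> List[Tuple[str, str]]:
    """Keep the k weakly labeled pairs the selector currently trusts most.

    `pairs` are (source, pseudo-paraphrase) tuples obtained by retrieval;
    `scorer` stands in for the learned selector in the cited paper.
    Repeating this step between fine-tuning rounds gives the progressive
    selection loop described in the summary above.
    """
    ranked = sorted(pairs, key=lambda p: scorer(p[0], p[1]), reverse=True)
    return ranked[:k]
```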
Idiomatic Expression Paraphrasing without Strong Supervision
TLDR
This paper proposes an unsupervised approach to ISP that leverages an IE’s contextual information and definition and requires no parallel sentence training set, as well as a weakly supervised approach that uses back-translation to jointly perform paraphrasing and generation of sentences with IEs, enlarging the small-scale parallel sentence training dataset.
Disentangling Semantics and Syntax in Sentence Embeddings with Pre-trained Language Models
TLDR
ParaBART is a semantic sentence embedding model that learns to disentangle semantics and syntax in sentence embeddings obtained from pre-trained language models; it can effectively remove syntactic information from semantic sentence embeddings, leading to better robustness against syntactic variation on downstream semantic tasks.
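
One way to probe this property is to check that embeddings stay close under meaning-preserving syntactic rewrites. A small sketch of such a harness (the harness is an assumption, not the paper's evaluation protocol):

```python
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def syntactic_robustness(embed, sentence: str, variants: list) -> float:
    """Average similarity between a sentence and meaning-preserving rewrites.

    A semantics-only embedding (the goal of ParaBART) should score high
    even when the variants differ heavily in word order. `embed` is any
    sentence-to-vector function.
    """
    e = embed(sentence)
    return float(np.mean([cosine(e, embed(v)) for v in variants]))
```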
Revisiting the Evaluation Metrics of Paraphrase Generation
TLDR
This paper proposes BBScore, a reference-free metric that reflects the quality of a generated paraphrase, and presents two conclusions that run counter to conventional wisdom in paraphrase generation.
Principled Paraphrase Generation with Parallel Corpora
TLDR
This paper formalizes the implicit similarity function induced by round-trip machine translation, designs an alternative similarity metric that mitigates its shortcomings by requiring the entire translation distribution to match, and implements a relaxation of this metric through the Information Bottleneck method.
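
As a sketch of what the implicit round-trip similarity looks like (this reading is an assumption, not a quotation from the paper), the round-trip score marginalizes over pivot translations z:

```latex
p_{\text{round-trip}}(y \mid x) \;=\; \sum_{z} p(y \mid z)\, p(z \mid x)
```

A single high-probability pivot can dominate this sum, so x and y can be scored as similar even when their full translation distributions disagree; requiring the distributions themselves to match removes that failure mode.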
Multi-task Learning for Paraphrase Generation With Keyword and Part-of-Speech Reconstruction
TLDR
A novel two-stage model, PGKPR, is proposed for paraphrase generation with keyword and part-of-speech reconstruction, capturing simultaneously the possible keywords of a source sentence and the relations between them to facilitate rewriting.
Towards Unsupervised Content Disentanglement in Sentence Representations via Syntactic Roles
TLDR
This work measures the interaction between latent variables and realizations of syntactic roles and shows that it is possible to obtain sentence representations in which different syntactic roles correspond to clearly identified latent variables, a first step towards unsupervised controllable content generation.
AESOP: Paraphrase Generation with Adaptive Syntactic Control
TLDR
The model, AESOP, leverages a pretrained language model and adds deliberately chosen syntactic control via a retrieval-based selection module to generate fluent paraphrases, achieving state-of-the-art performance on semantic preservation and syntactic conformation on two benchmark datasets.
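
The retrieval step can be sketched as nearest-template lookup over parses seen in training. A minimal sketch, where the pool and the distance function are illustrative assumptions:

```python
from typing import Callable, List

def retrieve_template(
    source_parse: str,
    template_pool: List[str],
    distance: Callable[[str, str], float],
) -> str:
    """Pick a syntactic control template from parses seen in training.

    `distance` is any tree- or string-level dissimilarity (for example a
    tree edit distance). Retrieving the nearest template keeps the control
    achievable for the decoder; retrieving a farther one pushes for more
    syntactic novelty.
    """
    return min(template_pool, key=lambda t: distance(source_parse, t))
```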
Novelty Controlled Paraphrase Generation with Retrieval Augmented Conditional Prompt Tuning
TLDR
The effectiveness of the proposed approaches in retaining the semantic content of the original text while inducing lexical novelty in paraphrase generation is demonstrated.
Exploiting Inductive Bias in Transformers for Unsupervised Disentanglement of Syntax and Semantics with VAEs
TLDR
A generative model for text generation that exhibits disentangled latent representations of syntax and semantics, relying solely on the inductive bias found in attention-based architectures such as Transformers.

References

Showing 1-10 of 66 references
Neural Syntactic Preordering for Controlled Paraphrase Generation
TLDR
This work uses syntactic transformations to softly “reorder” the source sentence and guide the neural paraphrasing model, retaining the quality of the baseline approaches while substantially increasing the diversity of the generated paraphrases.
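
The core operation is easy to picture as a permutation of source tokens before seq2seq decoding. A sketch (the cited work derives the permutation from syntactic transformations over the parse and applies it softly; a hard permutation is shown here for clarity):

```python
def preorder(tokens: list, permutation: list) -> list:
    """Rearrange source tokens before feeding the paraphrasing seq2seq model."""
    assert sorted(permutation) == list(range(len(tokens)))
    return [tokens[i] for i in permutation]

# e.g. fronting the adverb: ['quickly', 'she', 'solved', 'the', 'problem']
print(preorder("she solved the problem quickly".split(), [4, 0, 1, 2, 3]))
```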
Adversarial Example Generation with Syntactically Controlled Paraphrase Networks
TLDR
A combination of automated and human evaluations shows that SCPNs generate paraphrases that follow their target specifications without a decrease in paraphrase quality compared to baseline (uncontrolled) paraphrase systems.
A Deep Generative Framework for Paraphrase Generation
TLDR
Quantitative evaluation of the proposed method on a benchmark paraphrase dataset demonstrates its efficacy and its significant performance improvement over state-of-the-art methods, while qualitative human evaluation indicates that the generated paraphrases are well-formed, grammatically correct, and relevant to the input sentence.
Syntax-Guided Controlled Generation of Paraphrases
TLDR
Syntax Guided Controlled Paraphraser (SGCP), an end-to-end framework for syntactic paraphrase generation, is proposed; SGCP is found to generate syntax-conforming sentences without compromising relevance.
Paraphrasing Revisited with Neural Machine Translation
TLDR
This paper revisits bilingual pivoting in the context of neural machine translation and presents a paraphrasing model based purely on neural networks, which represents paraphrases in a continuous space, estimates the degree of semantic relatedness between text segments of arbitrary length, and generates candidate paraphrases for any source input.
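
The cited model goes beyond naive pivoting, but the underlying bilingual-pivoting baseline is easy to reproduce with off-the-shelf translation models. The sketch below uses publicly available MarianMT checkpoints from Hugging Face; the pipeline is a simplification, not the paper's system:

```python
from transformers import MarianMTModel, MarianTokenizer

def translate(texts, model_name):
    """Translate a batch of sentences with a pretrained MarianMT model."""
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
    batch = tokenizer(texts, return_tensors="pt", padding=True)
    generated = model.generate(**batch)
    return [tokenizer.decode(t, skip_special_tokens=True) for t in generated]

# English -> French -> English round trip as a paraphrasing baseline
pivots = translate(["The committee rejected the proposal."],
                   "Helsinki-NLP/opus-mt-en-fr")
print(translate(pivots, "Helsinki-NLP/opus-mt-fr-en"))
```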
Paraphrase Generation with Deep Reinforcement Learning
TLDR
Experimental results on two datasets demonstrate that the proposed models produce more accurate paraphrases and outperform state-of-the-art paraphrase generation methods in both automatic and human evaluation.
Unsupervised Paraphrasing without Translation
TLDR
This work proposes to learn paraphrasing models from a monolingual corpus alone and introduces a residual variant of the vector-quantized variational auto-encoder, which is shown to outperform unsupervised translation in all contexts.
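
The vector-quantization step at the heart of such models is small enough to sketch. A minimal version, assuming a fixed codebook (the residual variant in the cited paper additionally re-encodes the quantization error, which is omitted here):

```python
import numpy as np

def vector_quantize(latents: np.ndarray, codebook: np.ndarray):
    """Snap each latent vector to its nearest codebook entry.

    A vector-quantized bottleneck forces the paraphrasing auto-encoder to
    route meaning through discrete codes, which discourages verbatim
    copying of the input.
    """
    # pairwise squared distances between latents (N, d) and codes (K, d)
    d2 = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d2.argmin(axis=1)
    return codebook[idx], idx
```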
Joint Copying and Restricted Generation for Paraphrase
TLDR
A novel Seq2Seq model fuses a copying decoder with a restricted generative decoder and outperforms state-of-the-art approaches in terms of both informativeness and language quality.
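
The generic copy/generate blend behind such architectures can be written as a mixture of a vocabulary distribution and attention over source tokens; the pointer-generator form shown here is an analogue for illustration, not the paper's exact restricted decoder:

```python
import numpy as np

def copy_generate_mix(p_gen: float,
                      p_vocab: np.ndarray,
                      copy_attn: np.ndarray,
                      src_token_ids: np.ndarray) -> np.ndarray:
    """Blend a generative vocabulary distribution with a copy distribution.

    p_vocab: probabilities over the output vocabulary (sums to 1)
    copy_attn: attention weights over source positions (sums to 1)
    src_token_ids: vocabulary id of the token at each source position
    """
    mixed = p_gen * p_vocab
    # scatter-add copy mass onto the vocabulary ids of the source tokens
    np.add.at(mixed, src_token_ids, (1.0 - p_gen) * copy_attn)
    return mixed
```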
ParaBank: Monolingual Bitext Generation and Sentential Paraphrasing via Lexically-constrained Neural Machine Translation
TLDR
ParaBank, a large-scale English paraphrase dataset that surpasses prior work in both quantity and quality, is presented and used to train a monolingual NMT model supporting lexically constrained decoding for sentence rewriting tasks.
Unsupervised Construction of Large Paraphrase Corpora: Exploiting Massively Parallel News Sources
TLDR
An investigation of unsupervised techniques for acquiring monolingual sentence-level paraphrases from a corpus of temporally and topically clustered news articles, collected from thousands of web-based news sources, shows that the edit-distance data is cleaner and more easily aligned than the heuristic data.