• Corpus ID: 215416390

Neural CRF Sentence Alignment Model for Text Simplification

Chao Jiang, Mounica Maddela, Wuwei Lan, Y. Zhong, Wei Xu
The success of a text simplification system heavily depends on the quality and quantity of complex-simple sentence pairs in the training corpus, which are extracted by aligning sentences between parallel articles. To evaluate and improve sentence alignment quality, we create two manually annotated sentence-aligned datasets from two commonly used text simplification corpora. We also propose a novel neural CRF alignment model which not only leverages the sequential nature of sentences in parallel…
Controllable Text Simplification with Explicit Paraphrasing
This work proposes a novel hybrid approach that applies linguistically motivated rules for splitting and deletion and couples them with a neural paraphrasing model to produce varied rewriting styles, establishing a new state of the art for the task.


CATS: A Tool for Customized Alignment of Text Simplification Corpora
This paper presents a freely available, language-independent tool for sentence alignment from parallel/comparable TS resources (document-aligned resources), which additionally offers the possibility for filtering sentences depending on the level of their semantic overlap.
Sentence Simplification by Monolingual Machine Translation
Through relatively careful phrase-based paraphrasing, this model achieves simplification results similar to those of state-of-the-art systems while generating better-formed output; the work also argues that text readability metrics such as the Flesch-Kincaid grade level should be used with caution when evaluating the output of simplification systems.
Extracting Parallel Sentences from Comparable Corpora using Document Level Alignment
This work advances the state of the art in parallel sentence extraction by modeling the document level alignment, motivated by the observation that parallel sentence pairs are often found in close proximity.
EditNTS: An Neural Programmer-Interpreter Model for Sentence Simplification through Explicit Editing
This work presents the first sentence simplification model that learns explicit edit operations (ADD, DELETE, and KEEP) via a neural programmer-interpreter approach, and is judged by humans to produce overall better and simpler output sentences.
Complexity-Weighted Loss and Diverse Reranking for Sentence Simplification
This work incorporates content word complexities, as predicted with a leveled word complexity model, into the loss function during training, generates a large set of diverse candidate simplifications at test time, and reranks these to promote fluency, adequacy, and simplicity.
Integrating Transformer and Paraphrase Rules for Sentence Simplification
This work presents a novel model based on a multi-layer, multi-head attention architecture, together with two approaches for integrating Simple PPDB (A Paraphrase Database for Simplification), an external paraphrase knowledge base for simplification that covers a wide range of real-world simplification rules.
Aligning Sentences from Standard Wikipedia to Simple Wikipedia
This work improves monolingual sentence alignment for text simplification, specifically for text in standard and simple Wikipedia, by using a greedy search over the document and a word-level semantic similarity score based on Wiktionary that also accounts for structural similarity through syntactic dependencies.
Optimizing Statistical Machine Translation for Text Simplification
This work is the first to design automatic metrics that are effective for tuning and evaluating simplification systems, which will facilitate iterative development for this task.
Sentence Alignment for Monolingual Comparable Corpora
This work addresses the problem of sentence alignment for monolingual corpora by incorporating context into the search for an optimal alignment in two complementary ways: learning rules for matching paragraphs using topic structure and refining the matching through local alignment to find good sentence pairs.
Learning How to Simplify From Explicit Labeling of Complex-Simplified Text Pairs
This work devises a way to automatically identify operations in a parallel corpus and introduces a sequence-labeling approach based on these annotations, which provides insights into the types of transformations that different approaches can model.