Neural CRF Model for Sentence Alignment in Text Simplification

@inproceedings{Jiang2020NeuralCM,
  title={Neural CRF Model for Sentence Alignment in Text Simplification},
  author={Chao Jiang and Mounica Maddela and Wuwei Lan and Y. Zhong and Wei Xu},
  booktitle={ACL},
  year={2020}
}
The success of a text simplification system heavily depends on the quality and quantity of complex-simple sentence pairs in the training corpus, which are extracted by aligning sentences between parallel articles. To evaluate and improve sentence alignment quality, we create two manually annotated sentence-aligned datasets from two commonly used text simplification corpora, Newsela and Wikipedia. We propose a novel neural CRF alignment model which not only leverages the sequential nature of… 
Neural semi-Markov CRF for Monolingual Word Alignment
TLDR
A novel neural semi-Markov CRF alignment model that unifies word and phrase alignments through variable-length spans. It generalizes well to three out-of-domain datasets and proves useful in two downstream applications: automatic text simplification and sentence pair classification.
On the Helpfulness of Document Context to Sentence Simplification
TLDR
This paper is the first to investigate whether document context helps sentence simplification. It applies document context to a sequence-to-sequence model and proposes a new model that makes full use of the context information.
MUSS: Multilingual Unsupervised Sentence Simplification by Mining Paraphrases
TLDR
MUSS uses a novel approach to sentence simplification that trains strong models using sentence-level paraphrase data instead of proper simplification data, which can be mined in any language from Common Crawl using semantic sentence embeddings, thus removing the need for labeled data.
Klexikon: A German Dataset for Joint Summarization and Simplification
TLDR
A new dataset for joint Text Simplification and Summarization based on German Wikipedia and the German children’s encyclopedia “Klexikon”, consisting of almost 2,900 documents, is described, and a document-aligned version is released that particularly highlights the summarization aspect.
Investigating Text Simplification Evaluation
TLDR
This work investigates existing TS corpora, providing new insights that will motivate the improvement of existing state-of-the-art TS evaluation methods and demonstrates that by improving the distribution of TS datasets, one can build more robust TS models.
Text Revision by On-the-Fly Representation Optimization
TLDR
This paper presents an iterative in-place editing approach for text revision, which achieves competitive and even better performance than state-of-the-art supervised methods on text simplification, and gains better performance than strong unsupervised methods on text formalization.
TS-ANNO: An Annotation Tool to Build, Annotate and Evaluate Text Simplification Corpora
TLDR
TS-ANNO, an open-source web application for manual creation and for evaluation of parallel corpora for text simplification, is introduced and calculates inter-annotator agreement of alignments and annotations.
How May I Help You? Using Neural Text Simplification to Improve Downstream NLP Tasks
TLDR
This paper investigates another potential use of neural TS: assisting machines performing natural language processing (NLP) tasks by simplifying input texts at prediction time and augmenting data to provide machines with additional information during training.
MVP: Multi-task Supervised Pre-training for Natural Language Generation
TLDR
This work proposes Multi-task superVised Pre-training (MVP) for natural language generation, and collects a labeled pre-training corpus from 45 datasets over seven generation tasks to pre-train the text generation model MVP.
GRS: Combining Generation and Revision in Unsupervised Sentence Simplification
We propose GRS: an unsupervised approach to sentence simplification that combines text generation and text revision. We start with an iterative framework in which an input sentence is revised using

References

Extracting Parallel Sentences from Comparable Corpora using Document Level Alignment
TLDR
This work advances the state of the art in parallel sentence extraction by modeling the document level alignment, motivated by the observation that parallel sentence pairs are often found in close proximity.
CATS: A Tool for Customized Alignment of Text Simplification Corpora
TLDR
This paper presents a freely available, language-independent tool for sentence alignment from parallel/comparable TS resources (document-aligned resources), which additionally offers the possibility for filtering sentences depending on the level of their semantic overlap.
Sentence Simplification by Monolingual Machine Translation
TLDR
By relatively careful phrase-based paraphrasing this model achieves similar simplification results to state-of-the-art systems, while generating better formed output, and argues that text readability metrics such as the Flesch-Kincaid grade level should be used with caution when evaluating the output of simplification systems.
Aligning Sentences from Standard Wikipedia to Simple Wikipedia
TLDR
This work improves monolingual sentence alignment for text simplification, specifically for text in standard and simple Wikipedia by using a greedy search over the document and a word-level semantic similarity score based on Wiktionary that also accounts for structural similarity through syntactic dependencies.
Complexity-Weighted Loss and Diverse Reranking for Sentence Simplification
TLDR
This work incorporates content word complexities, as predicted with a leveled word complexity model, into the loss function during training and generates a large set of diverse candidate simplifications at test time, and rerank these to promote fluency, adequacy, and simplicity.
A Monolingual Tree-based Translation Model for Sentence Simplification
TLDR
A Tree-based Simplification Model (TSM) is proposed, which, to the authors' knowledge, is the first statistical simplification model covering splitting, dropping, reordering and substitution integrally.
Optimizing Statistical Machine Translation for Text Simplification
TLDR
This work is the first to design automatic metrics that are effective for tuning and evaluating simplification systems, which will facilitate iterative development for this task.
Learning How to Simplify From Explicit Labeling of Complex-Simplified Text Pairs
TLDR
A way to automatically identify operations in a parallel corpus is devised, and a sequence-labeling approach based on these annotations is introduced, which provides insights on the types of transformations that different approaches can model.
Integrating Transformer and Paraphrase Rules for Sentence Simplification
TLDR
A novel model based on a multi-layer and multi-head attention architecture is proposed, along with two innovative approaches to integrate the Simple PPDB (A Paraphrase Database for Simplification), an external paraphrase knowledge base for simplification that covers a wide range of real-world simplification rules.
Vecalign: Improved Sentence Alignment in Linear Time and Space
We introduce Vecalign, a novel bilingual sentence alignment method which is linear in time and space with respect to the number of sentences being aligned and which requires only bilingual sentence