Optimizing Statistical Machine Translation for Text Simplification

Wei Xu, Courtney Napoles, Ellie Pavlick, Quanze Chen, Chris Callison-Burch
Transactions of the Association for Computational Linguistics

Most recent sentence simplification systems use basic machine translation models to learn lexical and syntactic paraphrases from a manually simplified parallel corpus. These methods are limited by the quality and quantity of manually simplified corpora, which are expensive to build. In this paper, we conduct an in-depth adaptation of statistical machine translation to perform text simplification, taking advantage of large-scale paraphrases learned from bilingual texts and a small amount of… 

Sentence simplification with core vocabulary

The results show that data with a medium S-BLEU score between the original sentence and the simple sentence are most effective for automatic text simplification with a machine translation approach.

Unsupervised Statistical Text Simplification

This paper presents the first unsupervised text simplification system based on a phrase-based machine translation system, which leverages a careful initialization of phrase tables and language models.

Improving text simplification by corpus expansion with unsupervised learning

A simplification model that does not require a parallel corpus is constructed using an unsupervised translation model, and it is confirmed that simplification operations can be learned from large-scale pseudo data even when only a non-parallel corpus is available for simplification.

Simple and Effective Text Simplification Using Semantic and Neural Methods

This work presents a simple and efficient splitting algorithm based on an automatic semantic parser that compares favorably to the state-of-the-art in combined lexical and structural simplification.

Text Simplification without Simplified Corpora

This research proposes text simplification methods based on a lexical substitution approach and a monolingual translation approach for languages that cannot use large-scale simplified corpora, especially Japanese, and proposes novel paraphrase acquisition, meaning preservation filtering, simplicity filtering, and grammaticality ranking methods for Japanese.

Sentence Alignment Methods for Improving Text Simplification Systems

It is shown that using this dataset, even the standard phrase-based statistical machine translation models for ATS can outperform the state-of-the-art ATS systems.

Improving Neural Text Simplification Model with Simplified Corpora

This work proposes pairing each simple training sentence with a synthetic ordinary sentence via back-translation and treating this synthetic data as additional training data. An encoder-decoder model trained on both the synthetic and the original sentence pairs obtains substantial improvements on the available WikiLarge and WikiSmall data compared with the state-of-the-art methods.

Large-Scale Hierarchical Alignment for Data-driven Text Rewriting

It is shown that pseudo-parallel sentences extracted with the proposed unsupervised method not only supplement existing parallel data, but can even lead to competitive performance on their own.

Neural CRF Model for Sentence Alignment in Text Simplification

A novel neural CRF alignment model is proposed which not only leverages the sequential nature of sentences in parallel documents but also utilizes a neural sentence pair model to capture semantic similarity.



Sentence Simplification by Monolingual Machine Translation

Through relatively careful phrase-based paraphrasing, this model achieves simplification results similar to state-of-the-art systems while generating better-formed output. The paper also argues that text readability metrics such as the Flesch-Kincaid grade level should be used with caution when evaluating the output of simplification systems.
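
As a rough illustration of why a readability formula can mislead here, the sketch below computes the standard Flesch-Kincaid grade level, 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59. The syllable counter is a crude vowel-group heuristic and the two example sentences are invented for this illustration; neither comes from the paper.

```python
import re


def count_syllables(word):
    # Crude heuristic (an assumption, not a dictionary lookup):
    # count runs of consecutive vowels, with a minimum of one.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text):
    # Split on sentence-final punctuation and keep non-empty pieces.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


# Hypothetical system outputs: one fluent, one ungrammatical.
fluent = "The cat sat on the mat."
broken = "Cat mat the."

fluent_grade = flesch_kincaid_grade(fluent)
broken_grade = flesch_kincaid_grade(broken)
print(f"fluent: {fluent_grade:.2f}  broken: {broken_grade:.2f}")
```

The ungrammatical string receives the lower (nominally "easier") grade simply because it is shorter, which is the kind of effect that makes the metric unreliable as a sole measure of simplification quality.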

Learning to Simplify Sentences Using Wikipedia

A new translation model for text simplification is introduced that extends a phrase-based machine translation approach to include phrasal deletion, using a corpus of 137K aligned sentence pairs extracted by aligning English Wikipedia and Simple English Wikipedia.

A Monolingual Tree-based Translation Model for Sentence Simplification

A Tree-based Simplification Model (TSM) is proposed, which, to the authors' knowledge, is the first statistical simplification model covering splitting, dropping, reordering and substitution integrally.

Paraphrasing with Bilingual Parallel Corpora

This work defines a paraphrase probability that allows paraphrases extracted from a bilingual parallel corpus to be ranked using translation probabilities, and shows how it can be refined to take contextual information into account.
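
A minimal sketch of the pivot-based paraphrase probability described above, assuming the usual formulation p(e2 | e1) = Σ_f p(f | e1) · p(e2 | f), where f ranges over foreign phrases aligned with e1. The function name, the nested-dictionary table layout, and the toy probabilities are assumptions made for illustration; contextual refinements are not modeled.

```python
from collections import defaultdict


def paraphrase_probabilities(p_f_given_e, p_e_given_f, source_phrase):
    """Return {e2: p(e2 | e1)} by marginalizing over pivot phrases f."""
    scores = defaultdict(float)
    for foreign, p_f in p_f_given_e.get(source_phrase, {}).items():
        for candidate, p_e in p_e_given_f.get(foreign, {}).items():
            if candidate != source_phrase:  # skip the identity paraphrase
                scores[candidate] += p_f * p_e
    return dict(scores)


# Toy translation tables (invented values, not taken from any corpus).
p_f_given_e = {
    "under control": {"unter kontrolle": 0.75, "im griff": 0.25},
}
p_e_given_f = {
    "unter kontrolle": {"under control": 0.5, "in check": 0.5},
    "im griff": {"under control": 0.5, "in hand": 0.5},
}

scores = paraphrase_probabilities(p_f_given_e, p_e_given_f, "under control")
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
print(ranked)
```

Sorting by the summed score gives the ranking of candidate paraphrases; with these toy tables, "in check" outranks "in hand" because its pivot phrase is the more probable translation of the source phrase.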

Improving Text Simplification Language Modeling Using Unsimplified Text Data

This paper examines language modeling for text simplification and finds that a combined model using both simplified and normal English data achieves a 23% improvement in perplexity and a 24% improvement on the lexical simplification task over a model trained only on simple data.

One Step Closer to Automatic Evaluation of Text Simplification Systems

This study explores the possibility of replacing the costly and time-consuming human evaluation of the grammaticality and meaning preservation of the output of text simplification (TS) systems with some automatic measures and tries to classify simplified sentences into those which are acceptable; those which need minimal post-editing; and those which should be discarded.

Simple English Wikipedia: A New Text Simplification Task

A new data set is introduced that pairs English Wikipedia with Simple English Wikipedia and is orders of magnitude larger than any previously examined for sentence simplification and contains the full range of simplification operations including rewording, reordering, insertion and deletion.

Aligning Sentences from Standard Wikipedia to Simple Wikipedia

This work improves monolingual sentence alignment for text simplification, specifically for text in standard and simple Wikipedia, by using a greedy search over the document and a word-level semantic similarity score based on Wiktionary that also accounts for structural similarity through syntactic dependencies.

Hybrid Simplification using Deep Semantics and Machine Translation

A hybrid approach to sentence simplification is presented that combines deep semantics and monolingual machine translation to derive simple sentences from complex ones, yielding significantly simpler output that is both grammatical and meaning preserving.

Monolingual Distributional Similarity for Text-to-Text Generation

This work compares different distributional similarity feature-sets and shows significant improvements in grammaticality and meaning retention on the example text-to-text generation task of sentence compression, achieving state-of-the-art quality.