• Corpus ID: 4896510

Learning to Simplify Sentences Using Wikipedia

William Coster and David Kauchak
In this paper we examine the sentence simplification problem as an English-to-English translation problem, utilizing a corpus of 137K aligned sentence pairs extracted by aligning English Wikipedia and Simple English Wikipedia. This data set contains the full range of transformation operations including rewording, reordering, insertion and deletion. We introduce a new translation model for text simplification that extends a phrase-based machine translation approach to include phrasal deletion… 

Sentence Simplification by Monolingual Machine Translation

Through relatively careful phrase-based paraphrasing, this model achieves simplification results similar to those of state-of-the-art systems while generating better-formed output. The paper also argues that text readability metrics such as the Flesch-Kincaid grade level should be used with caution when evaluating the output of simplification systems.

Sentence Simplification as Tree Transduction

A syntax-based sentence simplifier that models simplification using a probabilistic synchronous tree substitution grammar (STSG) and utilizes a multi-level backoff model with additional syntactic annotations that allow for better discrimination over previous STSG formulations.

Translating from Original to Simplified Sentences using Moses: When does it Actually Work?

The findings suggest that the standard phrase-based approach to text simplification might not be appropriate for learning the strong simplifications that certain target populations need.

Optimizing Statistical Machine Translation for Text Simplification

This work is the first to design automatic metrics that are effective for tuning and evaluating simplification systems, which will facilitate iterative development for this task.

Hybrid Simplification using Deep Semantics and Machine Translation

A hybrid approach to sentence simplification that combines deep semantics and monolingual machine translation to derive simple sentences from complex ones, yielding significantly simpler output that is both grammatical and meaning-preserving.

Improving Text Simplification Language Modeling Using Unsimplified Text Data

This paper examines language modeling for text simplification and finds that a combined model using both simplified and normal English data achieves a 23% improvement in perplexity and a 24% improvement on the lexical simplification task over a model trained only on simple data.

Splitting Complex English Sentences

This paper applies parsing technology to the task of syntactic simplification of English sentences, focusing on the identification of text spans that can be removed from a complex sentence. We report…

SimpLe: Lexical Simplification using Word Sense Disambiguation

This chapter examines the process of lexical substitution and particularly the role that word sense disambiguation plays in this task, and provides empirical results which show that the method creates simplifications that significantly reduce the reading difficulty of the input text while maintaining its grammaticality and preserving its meaning.

Is it simpler? An Evaluation of an Aligned Corpus of Standard-Simple Sentences

This work aligns, at the sentence level, a corpus of web texts from all Swedish public authorities and municipalities in standard and simple Swedish, and evaluates the resulting corpus using a set of features that have proven to predict the complexity of Swedish texts.



A Monolingual Tree-based Translation Model for Sentence Simplification

A Tree-based Simplification Model (TSM) is proposed, which, to the authors' knowledge, is the first statistical simplification model covering splitting, dropping, reordering and substitution integrally.

Paraphrase Generation as Monolingual Translation: Data and Evaluation

It is demonstrated that BLEU correlates well with human judgements provided that the generated paraphrased sentence is sufficiently different from the source sentence.

Lexicalized Markov Grammars for Sentence Compression

A head-driven Markovization formulation of SCFG deletion rules is defined, which allows probabilities of constituent deletions to be lexicalized, and a robust approach for tree-to-tree alignment between arbitrary document-abstract parallel corpora is used.

Towards Robust Context-Sensitive Sentence Alignment for Monolingual Corpora

A new monolingual sentence alignment algorithm is presented, combining a sentence-based TF*IDF score, turned into a probability distribution using logistic regression, with a global alignment dynamic programming algorithm, achieving a substantial improvement in accuracy over existing systems.

Models for Sentence Compression: A Comparison across Domains, Training Requirements and Evaluation Measures

This paper provides a novel comparison between a supervised constituent-based and a weakly supervised word-based compression algorithm and examines how these models port to different domains (written vs. spoken text).

Mining Wikipedia Revision Histories for Improving Sentence Compression

This work proposes a novel lexicalized noisy channel model for sentence compression, achieving improved results in grammaticality and compression rate criteria with a slight decrease in importance.

Sentence Alignment for Monolingual Comparable Corpora

This work addresses the problem of sentence alignment for monolingual corpora by incorporating context into the search for an optimal alignment in two complementary ways: learning rules for matching paragraphs using topic structure and refining the matching through local alignment to find good sentence pairs.

Automatic induction of rules for text simplification

A Generic Sentence Trimmer with CRFs

The paper presents a novel sentence trimmer in Japanese, which combines a non-statistical yet generic tree generation model and Conditional Random Fields (CRFs), to address improving the…

For the sake of simplicity: Unsupervised extraction of lexical simplifications from Wikipedia

This work considers two main approaches: deriving simplification probabilities via an edit model that accounts for a mixture of different operations, and using metadata to focus on edits that are more likely to be simplification operations.