• Corpus ID: 219310357

Is it simpler? An Evaluation of an Aligned Corpus of Standard-Simple Sentences

Evelina Rennes
Parallel monolingual resources are imperative for data-driven sentence simplification research. We present the work of aligning, at the sentence level, a corpus of web texts in standard and simple Swedish from all Swedish public authorities and municipalities. We compare the performance of three alignment algorithms used for similar work in English (Average Alignment, Maximum Alignment, and Hungarian Alignment), and the best-performing algorithm is used to create a resource of 15,433 unique sentence…

Translating from Complex to Simplified Sentences

Results are promising, showing that while the model is usually overcautious in producing simplifications, the overall quality of the sentences is not degraded and certain types of simplification operations, mainly lexical, are appropriately captured.

Sentence Simplification by Monolingual Machine Translation

Through relatively careful phrase-based paraphrasing, this model achieves simplification results similar to those of state-of-the-art systems while generating better-formed output. The paper also argues that text readability metrics such as the Flesch-Kincaid grade level should be used with caution when evaluating the output of simplification systems.

Text Simplification from Professionally Produced Corpora

This work investigates the application of the recently created Newsela corpus, the largest collection of professionally written simplifications available, in text simplification (TS) tasks, and shows that the corpus can be used to learn sentence simplification patterns in more effective ways than corpora used in previous work.

Learning to Simplify Sentences Using Wikipedia

A new translation model for text simplification is introduced that extends a phrase-based machine translation approach to include phrasal deletion in a corpus of 137K aligned sentence pairs extracted by aligning English Wikipedia and Simple English Wikipedia.

Aligning Sentences from Standard Wikipedia to Simple Wikipedia

This work improves monolingual sentence alignment for text simplification, specifically for text in standard and simple Wikipedia by using a greedy search over the document and a word-level semantic similarity score based on Wiktionary that also accounts for structural similarity through syntactic dependencies.

Optimizing Statistical Machine Translation for Text Simplification

This work is the first to design automatic metrics that are effective for tuning and evaluating simplification systems, which will facilitate iterative development for this task.

A Monolingual Tree-based Translation Model for Sentence Simplification

A Tree-based Simplification Model (TSM) is proposed, which, to the authors' knowledge, is the first statistical simplification model covering splitting, dropping, reordering and substitution integrally.

Data-Driven Sentence Simplification: Survey and Benchmark

Research on sentence simplification (SS) is surveyed, focusing on approaches that attempt to learn how to simplify using corpora of aligned original-simplified sentence pairs in English, which is the dominant paradigm nowadays.

A Constrained Sequence-to-Sequence Neural Model for Sentence Simplification

A novel constrained neural generation model is implemented to simplify sentences given simplified words by combining both the word-level and the sentence-level simplifications, making use of their corresponding advantages.

Features Indicating Readability in Swedish Text

A study of different levels of analysis and a large number of features, and of how they affect an ML system's accuracy in readability assessment, finds that the best-performing features are language models based on part-of-speech and dependency type.