Corpus ID: 219310357

Is it simpler? An Evaluation of an Aligned Corpus of Standard-Simple Sentences

Evelina Rennes
Parallel monolingual resources are imperative for data-driven sentence simplification research. We present the work of aligning, at the sentence level, a corpus of web texts from all Swedish public authorities and municipalities, written in standard and simple Swedish. We compare the performance of three alignment algorithms used for similar work in English (Average Alignment, Maximum Alignment, and Hungarian Alignment), and the best-performing algorithm is used to create a resource of 15,433 unique sentence…

Translating from Complex to Simplified Sentences
Results are promising, showing that while the model is usually overcautious in producing simplifications, the overall quality of the sentences is not degraded and certain types of simplification operations, mainly lexical, are appropriately captured.
Sentence Simplification by Monolingual Machine Translation
Through relatively careful phrase-based paraphrasing, this model achieves simplification results similar to state-of-the-art systems while generating better-formed output; the work also argues that text readability metrics such as the Flesch-Kincaid grade level should be used with caution when evaluating the output of simplification systems.
Text Simplification from Professionally Produced Corpora
This work investigates the application of the recently created Newsela corpus, the largest collection of professionally written simplifications available, in TS tasks, and shows that the corpus can be used to learn sentence simplification patterns in more effective ways than corpora used in previous work.
Building a Monolingual Parallel Corpus for Text Simplification Using Sentence Similarity Based on Alignment between Word Embeddings
This work proposes an unsupervised method that automatically builds a monolingual parallel corpus for text simplification from sentence similarity computed over word embeddings with a many-to-one method.
Learning to Simplify Sentences Using Wikipedia
A new translation model for text simplification is introduced that extends a phrase-based machine translation approach to include phrasal deletion in a corpus of 137K aligned sentence pairs extracted by aligning English Wikipedia and Simple English Wikipedia.
Aligning Sentences from Standard Wikipedia to Simple Wikipedia
This work improves monolingual sentence alignment for text simplification, specifically for text in standard and simple Wikipedia, by using a greedy search over the document and a word-level semantic similarity score based on Wiktionary that also accounts for structural similarity through syntactic dependencies.
Simple English Wikipedia: A New Text Simplification Task
A new data set is introduced that pairs English Wikipedia with Simple English Wikipedia. It is orders of magnitude larger than any previously examined for sentence simplification and contains the full range of simplification operations, including rewording, reordering, insertion and deletion.
Optimizing Statistical Machine Translation for Text Simplification
This work is the first to design automatic metrics that are effective for tuning and evaluating simplification systems, which will facilitate iterative development for this task.
A Monolingual Tree-based Translation Model for Sentence Simplification
A Tree-based Simplification Model (TSM) is proposed, which, to the authors' knowledge, is the first statistical simplification model covering splitting, dropping, reordering and substitution integrally.
Data-Driven Sentence Simplification: Survey and Benchmark
Research on SS is surveyed, focusing on approaches that attempt to learn how to simplify using corpora of aligned original-simplified sentence pairs in English, which is the dominant paradigm nowadays.