Corpus ID: 14898735

Readability-based Sentence Ranking for Evaluating Text Simplification

@article{Vajjala2016ReadabilitybasedSR,
  title={Readability-based Sentence Ranking for Evaluating Text Simplification},
  author={Sowmya Vajjala and Walt Detmar Meurers},
  journal={ArXiv},
  year={2016},
  volume={abs/1603.06009}
}
We propose a new method for evaluating the readability of simplified sentences through pair-wise ranking. The validity of the method is established through in-corpus and cross-corpus evaluation experiments. The approach correctly identifies the ranking of simplified and unsimplified sentences in terms of their reading level with an accuracy of over 80%, significantly outperforming previous results. To gain qualitative insights into the nature of simplification at the sentence level, we studied… 


A Nontrivial Sentence Corpus for the Task of Sentence Readability Assessment in Portuguese
TLDR
A nontrivial sentence corpus in Portuguese is generated from a parallel simplification corpus in which each sentence triplet is aligned and annotated with simplification operations, making it well suited to explaining possible mistakes of future methods.
Data-Driven Sentence Simplification: Survey and Benchmark
TLDR
Research on SS is surveyed, focusing on approaches that attempt to learn how to simplify using corpora of aligned original-simplified sentence pairs in English, which is the dominant paradigm nowadays.
OneStopEnglish corpus: A new corpus for automatic readability assessment and text simplification
TLDR
The collection and compilation of the OneStopEnglish corpus of texts written at three reading levels is described, and its usefulness is demonstrated through two applications: automatic readability assessment and automatic text simplification.
BERT Embeddings for Automatic Readability Assessment
TLDR
The proposed method outperforms classical approaches in readability assessment using English and Filipino datasets and can be used as a substitute feature set for low-resource languages like Filipino with limited semantic and syntactic NLP tools to explicitly extract feature values for the task.
Knowledge-Rich BERT Embeddings for Readability Assessment
TLDR
This study proposes an alternative way of utilizing the information-rich embeddings of BERT models through a joint-learning method combined with handcrafted linguistic features for readability assessment, and shows that the proposed method outperforms classical approaches in readability assessment.
Automatic Assessment of Conceptual Text Complexity Using Knowledge Graphs
TLDR
It is shown that graph-based measures of individual text concepts, as well as the way they relate to each other in the knowledge graph, have a high discriminative power when distinguishing between two versions of the same text.
Pushing on Text Readability Assessment: A Transformer Meets Handcrafted Linguistic Features
TLDR
This work explores suitable transformers and traditional ML models, then extracts 255 handcrafted linguistic features using self-developed extraction software to create several hybrid models, achieving state-of-the-art (SOTA) accuracy on popular datasets in readability assessment.
Assessing Relative Sentence Complexity using an Incremental CCG Parser
TLDR
The authors' evaluation on Simple and Standard Wikipedia sentence pairs suggests that incremental CCG features are indeed more useful than phrase structure features, achieving a 0.44-point gain in performance.
Integrating Meaning into Quality Evaluation of Machine Translation
TLDR
The results of two experiments confirm the benefit of meaning-related features in predicting human evaluation of translation quality, in addition to traditional metrics that focus mainly on form.
Measuring Text Complexity for Italian as a Second Language Learning Purposes
TLDR
This study evaluates the effectiveness of an automatic tool trained to assess text complexity in the context of learning Italian as a second language, using three classifier models trained on linguistic features measured quantitatively and extracted from the texts.

References

Showing 1-10 of 46 references
Readability assessment for text simplification: From analysing documents to identifying sentential simplifications
TLDR
It is concluded that readability models can be useful for identifying simplification targets for human writers and for evaluating machine generated simplifications.
Assessing the relative reading level of sentence pairs for text simplification
TLDR
This paper explores readability models for identifying differences in the reading levels of simplified and unsimplified versions of sentences and shows that a relative ranking is preferable to an absolute binary one and that the accuracy of identifying relative simplification depends on the initial reading level of the unsimplified version.
One Step Closer to Automatic Evaluation of Text Simplification Systems
TLDR
This study explores the possibility of replacing the costly and time-consuming human evaluation of the grammaticality and meaning preservation of the output of text simplification (TS) systems with some automatic measures and tries to classify simplified sentences into those which are acceptable; those which need minimal post-editing; and those which should be discarded.
Assessing the Readability of Sentences: Which Corpora and Features?
The paper investigates the problem of sentence readability assessment, which is modelled as a classification task, with a specific view to text simplification. In particular, it addresses two open
Rule-based and machine learning approaches for second language sentence-level readability
TLDR
Methods and knowledge from machine learning-based readability research, from rule-based studies of Good Dictionary Examples and from second language learning syllabuses are merged to present approaches for the identification of sentences understandable by second language learners of Swedish, which can be used in automatically generated exercises based on corpora.
A Monolingual Tree-based Translation Model for Sentence Simplification
TLDR
A Tree-based Simplification Model (TSM) is proposed, which, to the authors' knowledge, is the first statistical simplification model covering splitting, dropping, reordering and substitution integrally.
Sentence-level ranking with quality estimation
TLDR
This work provides a strategy based on machine learning that performs preference ranking on alternative machine translations of the same source at the sentence level, with performance comparable to that achieved by state-of-the-art reference-aware automatic evaluation metrics such as smoothed BLEU, METEOR and Levenshtein distance.
Learning Simple Wikipedia: A Cogitation in Ascertaining Abecedarian Language
TLDR
The potential of Simple Wikipedia to assist automatic text simplification is investigated by building a statistical classification system that discriminates simple English from ordinary English; the system can be applied as a tool to help writers craft simple text.
Motivations and Methods for Text Simplification
TLDR
This paper considers two alternatives to full parsing which could be used for simplification, one of which uses a Finite State Grammar (FSG) to produce noun and verb groups while the second uses a Supertagging model to produce dependency linkages.
Ranking explanatory sentences for opinion summarization
TLDR
The proposed methods for scoring the explanatoriness of a sentence are effective, outperforming a state-of-the-art sentence ranking method for standard text summarization.