• Corpus ID: 10919200

On Improving the Accuracy of Readability Classification using Insights from Second Language Acquisition

Sowmya Vajjala, Walt Detmar Meurers
We investigate the problem of readability assessment using a range of lexical and syntactic features and study their impact on predicting the grade level of texts. As an empirical basis, we combined two web-based text sources, Weekly Reader and BBC Bitesize, targeting different age groups, to cover a broad range of school grades. On the conceptual side, we explore the use of lexical and syntactic measures originally designed to measure language development in the production of second language…

Investigating the use of readability metrics to detect differences in written productions of learners : a corpus-based study

This paper deals with the use of readability metrics as indices of learners' linguistic features in a written corpus of Spanish learners of English L2. Seventeen measures of readability are presented

Using Broad Linguistic Complexity Modeling for Cross-Lingual Readability Assessment

It is shown that the linguistic complexity analyses for the cross-language experiments identify features successfully characterizing the readability of texts for language learners across languages, as well as some language-specific characteristics of different reading levels.

Insights from Russian second language readability classification: complexity-dependent training requirements, and feature evaluation of multiple categories

Investigation of Russian second language readability assessment using a machine-learning approach with a range of lexical, morphological, syntactic, and discourse features shows that morphological features are collectively the most informative.

Application of Lexical Features Towards Improvement of Filipino Readability Identification of Children's Literature

This paper explores the use of lexical features to improve readability identification of children’s books written in Filipino and shows that combining lexical features (LEX), consisting of type-token ratio, lexical density, lexical variation, and foreign word count, with traditional features (TRAD) increased the performance of readability models by almost 5%.

Sentence-Level Readability Assessment for L2 Chinese Learning

A research framework and a large corpus of nearly 40,000 sentences with ten-level readability annotation are provided, and results suggest that linguistic features can significantly improve predictive performance, with the highest distance-1 adjacent accuracy reaching 70.78%.

A Readable Read: Automatic Assessment of Language Learning Materials based on Linguistic Complexity

This paper proposes a supervised machine learning model, based on a range of linguistic features, that can reliably classify texts according to their difficulty level, and finds that using a combination of different features resulted in a 7% improvement in classification accuracy at the sentence level, whereas at the document level, lexical features were more dominant.

Feature-Based Assessment of Text Readability

  • Lixiao Zhang, Zaiying Liu, Jun Ni
  • Linguistics, Computer Science
    2013 Seventh International Conference on Internet Computing for Engineering and Science
  • 2013
The effects of text features on L2 learners differ from their effects on native-language readers, and the emphasis is on text feature selection, since the features commonly affect the understanding of text content.

Exploring Measures of “Readability” for Spoken Language: Analyzing linguistic features of subtitles to identify age-specific TV programs

Investigating several feature subsets, it is shown that the authentic material targeting specific age groups exhibits a broad range of linguistic and psycholinguistic characteristics that are indicative of the complexity of the language used.

Rule-based and machine learning approaches for second language sentence-level readability

Methods and knowledge from machine learning-based readability research, from rule-based studies of Good Dictionary Examples and from second language learning syllabuses are merged to present approaches for the identification of sentences understandable by second language learners of Swedish, which can be used in automatically generated exercises based on corpora.

On The Applicability of Readability Models to Web Texts

Applying the readability models and the features they are based on to web search results finds that the average reading level of the retrieved web documents is relatively high, supporting the potential usefulness of readability ranking for the web.



Automatic Readability Assessment

The development of an automatic tool to assess the readability of text documents is described and the correlation between grade levels predicted by the tool, expert ratings of text difficulty, and estimated latent difficulty derived from experiments involving adult participants with mild intellectual disabilities is measured.

A Comparison of Features for Automatic Readability Assessment

It is found that features based on in-domain language models have the highest predictive power, and that entity-density and POS features, in particular nouns, are individually very useful but highly correlated.

A machine learning approach to reading level assessment

Combining Lexical and Grammatical Features to Improve Readability Measures for First and Second Language Texts

This work evaluates a system that uses interpolated predictions of reading difficulty based on both vocabulary and grammatical features, and indicates that grammatical features may play a more important role in second language readability than in first language readability.

Learning to Predict Readability using Diverse Linguistic Features

This paper considers the problem of building a system to predict the readability of natural-language documents using diverse features based on syntax and language models, which are generally indicative of readability, and shows that the learned system is more accurate than naive human judges when both are compared against the predictions of linguistically trained expert human judges.

On the Contribution of MWE-based Features to a Readability Formula for French as a Foreign Language

This study uses an MWE extractor combining a statistical approach with a linguistic filter to define 11 predictors that take into account the density and the probability of MWEs, but also their internal structure.

Automatic analysis of syntactic complexity in second language writing

The system takes a written language sample as input and produces fourteen indices of syntactic complexity of the sample based on these measures, which are designed with advanced second language proficiency research in mind and developed and evaluated using college-level second language writing data from the Written English Corpus of Chinese Learners.

READ–IT: Assessing Readability of Italian Texts with a View to Text Simplification

A new approach to readability assessment with a specific view to the task of text simplification: the intended audience includes people with low literacy skills and/or with mild cognitive impairment.

Readability Assessment for Text Simplification

A readability assessment approach to support the process of text simplification for poor literacy readers, with a number of new features and experiments with alternative ways to model this problem using machine learning methods, namely classification, regression and ranking.

A Corpus-Based Evaluation of Syntactic Complexity Measures as Indices of College-Level ESL Writers' Language Development

This article reports results of a corpus-based evaluation of 14 syntactic complexity measures as objective indices of college-level English as a second language (ESL) writers' language development. …