• Corpus ID: 10919200

On Improving the Accuracy of Readability Classification using Insights from Second Language Acquisition

@inproceedings{Vajjala2012OnIT,
  title={On Improving the Accuracy of Readability Classification using Insights from Second Language Acquisition},
  author={Sowmya Vajjala and Walt Detmar Meurers},
  booktitle={BEA@NAACL-HLT},
  year={2012}
}
We investigate the problem of readability assessment using a range of lexical and syntactic features and study their impact on predicting the grade level of texts. As empirical basis, we combined two web-based text sources, Weekly Reader and BBC Bitesize, targeting different age groups, to cover a broad range of school grades. On the conceptual side, we explore the use of lexical and syntactic measures originally designed to measure language development in the production of second language… 
Investigating the use of readability metrics to detect differences in written productions of learners : a corpus-based study
This paper deals with the use of readability metrics as indices of learners' linguistic features in a written corpus of Spanish learners of English L2. Seventeen measures of readability are presented
Using Broad Linguistic Complexity Modeling for Cross-Lingual Readability Assessment
TLDR
It is shown that the linguistic complexity analyses for the cross-language experiments identify features successfully characterizing the readability of texts for language learners across languages, as well as some language-specific characteristics of different reading levels.
Insights from Russian second language readability classification: complexity-dependent training requirements, and feature evaluation of multiple categories
TLDR
Investigation of Russian second language readability assessment using a machine-learning approach with a range of lexical, morphological, syntactic, and discourse features shows that morphological features are collectively the most informative.
Application of Lexical Features Towards Improvement of Filipino Readability Identification of Children's Literature
TLDR
This paper explores the use of lexical features to improve readability identification for children’s books written in Filipino and shows that combining lexical features (LEX), consisting of type-token ratio, lexical density, lexical variation, and foreign word count, with traditional features (TRAD) increased the performance of readability models by a margin of almost 5%.
Sentence-Level Readability Assessment for L2 Chinese Learning
TLDR
A research framework and a large corpus of nearly 40,000 sentences with ten-level readability annotation are provided, and results suggest that the linguistic features can significantly improve predictive performance, with the highest distance-1 adjacent accuracy reaching 70.78%.
A Readable Read: Automatic Assessment of Language Learning Materials based on Linguistic Complexity
TLDR
This paper proposes a supervised machine learning model, based on a range of linguistic features, that can reliably classify texts according to their difficulty level, and finds that using a combination of different features resulted in a 7% improvement in classification accuracy at the sentence level, whereas at the document level, lexical features were more dominant.
Text Readability Assessment for Second Language Learners
TLDR
A generalization method is applied to adapt models trained on larger native corpora to estimate text readability for learners, and domain adaptation and self-learning techniques are explored to make use of the native data to improve system performance on the limited L2 data.
Feature-Based Assessment of Text Readability
TLDR
The effects of text features on L2 learners differ from their effects on native-language readers; the emphasis is on text feature selection, since the features commonly affect the understanding of text content.
Rule-based and machine learning approaches for second language sentence-level readability
TLDR
Methods and knowledge from machine learning-based readability research, from rule-based studies of Good Dictionary Examples and from second language learning syllabuses are merged to present approaches for the identification of sentences understandable by second language learners of Swedish, which can be used in automatically generated exercises based on corpora.
On The Applicability of Readability Models to Web Texts
TLDR
Applying the readability models and the features they are based on to web search results finds that the average reading level of the retrieved web documents is relatively high, supporting the potential usefulness of readability ranking for the web.

References

Automatic readability assessment
TLDR
The development of an automatic tool to assess the readability of text documents is described and the correlation between grade levels predicted by the tool, expert ratings of text difficulty, and estimated latent difficulty derived from experiments involving adult participants with mild intellectual disabilities is measured.
A Comparison of Features for Automatic Readability Assessment
TLDR
It is found that features based on in-domain language models have the highest predictive power, and that entity-density and POS features, in particular nouns, are individually very useful but highly correlated.
A machine learning approach to reading level assessment
TLDR
This paper uses support vector machines to combine features from n-gram language models, parses, and traditional reading level measures to produce a better method of assessing reading level, and explores ways that multiple human annotations can be used in comparative assessments of system performance.
Combining Lexical and Grammatical Features to Improve Readability Measures for First and Second Language Texts
TLDR
This work evaluates a system that uses interpolated predictions of reading difficulty that are based on both vocabulary and grammatical features, and indicates that grammatical features may play a more important role in second language readability than in first language readability.
Learning to Predict Readability using Diverse Linguistic Features
TLDR
This paper considers the problem of building a system to predict the readability of natural-language documents using diverse features based on syntax and language models, which are generally indicative of readability, and shows that the learned system is more accurate than naive human judges when both are compared against the predictions of linguistically trained expert human judges.
A Language Modeling Approach to Predicting Reading Difficulty
TLDR
A measure based on an extension of multinomial naïve Bayes classification that combines multiple language models to estimate the most likely grade level for a given passage is derived, which is not specific to any particular subject and can be trained with relatively little labeled data.
On the Contribution of MWE-based Features to a Readability Formula for French as a Foreign Language
TLDR
This study uses a MWE extractor combining a statistical approach with a linguistic filter to define 11 predictors that take into account the density and the probability of MWEs, but also their internal structure.
Automatic analysis of syntactic complexity in second language writing
  • X. Lu
  • Computer Science
  • 2010
TLDR
The system takes a written language sample as input and produces fourteen indices of syntactic complexity of the sample based on these measures, which are designed with advanced second language proficiency research in mind and developed and evaluated using college-level second language writing data from the Written English Corpus of Chinese Learners.
READ–IT: Assessing Readability of Italian Texts with a View to Text Simplification
TLDR
A new approach to readability assessment with a specific view to the task of text simplification: the intended audience includes people with low literacy skills and/or with mild cognitive impairment.
Readability Assessment for Text Simplification
TLDR
A readability assessment approach to support the process of text simplification for poor literacy readers with a number of new features, and experiment with alternative ways to model this problem using machine learning methods, namely classification, regression and ranking.