Text Complexity Classification Based on Linguistic Information: Application to Intelligent Tutoring of ESL

  • M. Kurdi
  • Published 2020
  • Computer Science
  • J. Data Min. Digit. Humanit.
The goal of this work is to build a classifier that can identify text complexity within the context of teaching reading to English as a Second Language (ESL) learners. To present language learners with texts suited to their level of English, a set of features describing the phonological, morphological, lexical, syntactic, discursive, and psychological complexity of a given text was identified. Using a corpus of 6171 texts, which had already been classified into three…
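The paper's exact feature set and classifier are not reproduced on this page. As a hedged illustration of the general approach only, here is a minimal sketch of extracting a few surface-level complexity proxies from raw text; all function names and the specific features are my own choices, not the paper's:

```python
import re

def extract_features(text):
    """Crude surface features often used as text-complexity proxies (illustrative only)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    types = set(words)
    return {
        "avg_sentence_len": len(words) / max(len(sentences), 1),    # syntactic proxy
        "avg_word_len": sum(map(len, words)) / max(len(words), 1),  # morphological proxy
        "type_token_ratio": len(types) / max(len(words), 1),        # lexical diversity
    }

feats = extract_features("The cat sat. The cat ran. It was quick and it was very quick indeed.")
```

A vector of such features per text could then be fed to any standard classifier trained on the three complexity labels.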
Readability Evaluation for Ukrainian Medicine Corpus (UKRMED)
This research aims to demonstrate the use of the most commonly used readability formulas on written health information in Ukrainian, to compare and contrast how these formulas behave on texts of varying difficulty (simple, moderate, and complex), and to prepare recommendations for applying them to the evaluation of the readability of medical texts in Ukrainian.
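The abstract does not list which formulas were used; among the most common is the Flesch Reading Ease score, sketched below with a crude vowel-group syllable heuristic. Note the coefficients were fitted for English, so applying such a formula to Ukrainian, as the paper studies, would require adapted constants:

```python
import re

def count_syllables(word):
    # Crude heuristic: count groups of consecutive vowels; real tools use dictionaries.
    return max(len(re.findall(r"[aeiouy]+", word.lower())), 1)

def flesch_reading_ease(text):
    """Flesch Reading Ease: higher scores mean easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * len(words) / max(len(sentences), 1)
            - 84.6 * syllables / max(len(words), 1))

simple = flesch_reading_ease("The dog ran. The dog sat.")
hard = flesch_reading_ease(
    "Pharmacological interventions necessitate comprehensive individualized evaluation.")
```

As expected, the short-word, short-sentence text scores higher (easier) than the polysyllabic one.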


Lexical and Syntactic Features Selection for an Adaptive Reading Recommendation System Based on Text Complexity
A classifier that can identify text complexity in service of English as a Second Language (ESL) learners is built, and a set of features that best describe the lexical and syntactic complexity of a given text is identified.
Comparing Machine Learning Classification Approaches for Predicting Expository Text Difficulty
The accuracy of four machine learning classification approaches using natural language processing features was compared in predicting human ratings of text difficulty for two sets of texts; hierarchical classification was the most accurate for each text set considered individually.
N-gram-based text categorization
An N-gram-based approach to text categorization that is tolerant of textual errors is described; it worked very well for language classification and reasonably well for classifying articles from a number of different computer-oriented newsgroups by subject.
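The technique behind that paper ranks each category's most frequent character n-grams and compares rank orders with an "out-of-place" distance. A minimal sketch under that scheme, with toy one-sentence training texts standing in for real category profiles:

```python
from collections import Counter

def ngram_profile(text, n=3, top=300):
    """Ranked list of the most frequent character n-grams (space-padded)."""
    text = f" {text.lower()} "
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    return [g for g, _ in grams.most_common(top)]

def out_of_place(profile_a, profile_b):
    """Sum of rank displacements; missing n-grams get a maximum penalty."""
    pos_b = {g: i for i, g in enumerate(profile_b)}
    miss = len(profile_b)
    return sum(abs(i - pos_b.get(g, miss)) for i, g in enumerate(profile_a))

def classify(text, labelled_profiles):
    query = ngram_profile(text)
    return min(labelled_profiles, key=lambda lab: out_of_place(query, labelled_profiles[lab]))

profiles = {
    "en": ngram_profile("the quick brown fox jumps over the lazy dog and the cat"),
    "es": ngram_profile("el rapido zorro marron salta sobre el perro perezoso y el gato"),
}
label = classify("the dog and the fox", profiles)
```

Because n-gram frequency profiles degrade gracefully, a few corrupted characters shift ranks only slightly, which is why the method tolerates textual errors.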
On The Applicability of Readability Models to Web Texts
Applying readability models, and the features they are based on, to web search results shows that the average reading level of retrieved web documents is relatively high, supporting the potential usefulness of readability ranking for the web.
Latent Semantic Analysis for User Modeling
This work relies on LSA to represent the student model in a tutoring system; tutoring strategies were designed to automatically detect lexeme misunderstandings and to select, among the various examples of a domain, the one to which it is best to expose the student.
Computing and Evaluating Syntactic Complexity Features for Automated Scoring of Spontaneous Non-Native Speech
The goal is to find effective features, selected from a large set proposed previously together with some new features designed analogously from a syntactic complexity perspective, that correlate well with human ratings of the same spoken responses, and to build automatic scoring models based on the most promising features using machine learning methods.
The Relationship of Lexical Richness to the Quality of ESL Learners' Oral Narratives.
This study was an examination of the relationship of lexical richness to the quality of English as a second language (ESL) learners' oral narratives. A computational system was designed to automate…
Coh-Metrix: Analysis of text on cohesion and language
Standard text readability formulas scale texts on difficulty by relying on word length and sentence length, whereas Coh-Metrix is sensitive to cohesion relations, world knowledge, and language and discourse characteristics.
Automatic analysis of syntactic complexity in second language writing
  • X. Lu
  • Computer Science
  • 2010
The system takes a written language sample as input and produces fourteen indices of syntactic complexity based on measures drawn from advanced second language proficiency research; it was developed and evaluated using college-level second language writing data from the Written English Corpus of Chinese Learners.
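Lu's tool derives its fourteen indices from full syntactic parses; as a flavor of two of the underlying ratios only, here is a deliberately naive sketch that approximates clause counts with a hand-picked list of subordinators (the marker list and heuristics are mine, not the tool's):

```python
import re

# Naive clause proxy; the real system counts clauses from parse trees.
CLAUSE_MARKERS = {"that", "which", "who", "because", "although", "when", "while", "if"}

def syntactic_indices(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    # Each sentence contributes a main clause; subordinators signal extra clauses.
    clauses = len(sentences) + sum(w in CLAUSE_MARKERS for w in words)
    return {
        "mean_length_of_sentence": len(words) / max(len(sentences), 1),
        "clauses_per_sentence": clauses / max(len(sentences), 1),
    }

idx = syntactic_indices("I left because it rained. She stayed.")
```

Ratios like these rise with syntactic elaboration, which is why they track second language proficiency in writing research.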
The relationship of lexical proficiency to the quality of ESL compositions
The role of the lexical component as one factor in holistic scoring was examined; high, significant correlations were found for lexical variation, that is, the ratio of the number of different lexical items in the essay, adjusted for length.
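The abstract mentions a type-count ratio adjusted for length but does not name the adjustment; one standard choice (my assumption here) is Guiraud's root TTR, which divides by the square root of the token count because the plain ratio shrinks as texts grow:

```python
import math
import re

def lexical_variation(text):
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    types = set(tokens)
    return {
        "ttr": len(types) / max(len(tokens), 1),                   # raw type-token ratio
        "root_ttr": len(types) / math.sqrt(max(len(tokens), 1)),   # Guiraud's length-adjusted index
    }

m = lexical_variation("the cat saw the dog and the dog saw the cat")
```

On this toy sample (11 tokens, 5 types) the raw ratio is 5/11 while the root-adjusted index is 5/√11, illustrating how the adjustment dampens the length effect.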