Insights from Russian second language readability classification: complexity-dependent training requirements, and feature evaluation of multiple categories

@inproceedings{Reynolds2016InsightsFR,
  title={Insights from Russian second language readability classification: complexity-dependent training requirements, and feature evaluation of multiple categories},
  author={Robert Joshua Reynolds},
  booktitle={BEA@NAACL-HLT},
  year={2016}
}
I investigate Russian second language readability assessment using a machine-learning approach with a range of lexical, morphological, syntactic, and discourse features. Testing the model with a new collection of Russian L2 readability corpora achieves an F-score of 0.671 and an adjacent accuracy of 0.919 on a 6-level classification task. Information gain and feature subset evaluation show that morphological features are collectively the most informative. Learning curves for binary classifiers…
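The abstract reports adjacent accuracy without defining it here. A common reading, assumed in this sketch, is the share of predictions that fall within one level of the gold label. The function name `adjacent_accuracy`, the integer encoding of the six levels as 1 to 6, and the sample labels are all hypothetical and are not taken from the paper.

```python
def adjacent_accuracy(gold, predicted, tolerance=1):
    """Return the share of predictions within `tolerance` levels of the gold level.

    Assumes levels are encoded as consecutive integers (here 1..6),
    so that the absolute difference counts how many levels apart they are.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot score an empty sequence")
    hits = sum(abs(g - p) <= tolerance for g, p in zip(gold, predicted))
    return hits / len(gold)


# Hypothetical labels for six texts on a 6-level scale (illustration only).
gold = [1, 2, 3, 4, 5, 6]
predicted = [1, 3, 3, 6, 4, 6]

exact = adjacent_accuracy(gold, predicted, tolerance=0)  # exact matches only
adjacent = adjacent_accuracy(gold, predicted)            # off-by-one counts as a hit
print(f"exact accuracy: {exact:.3f}, adjacent accuracy: {adjacent:.3f}")
```

With `tolerance=0` the same function gives plain accuracy, which makes it easy to see how much of the gap between the two scores comes from off-by-one predictions.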
Investigating the importance of linguistic complexity features across different datasets related to language learning
We present the results of our investigations aiming at identifying the most informative linguistic complexity features for classifying language learning levels in three different datasets. The…
Neural Text Categorization with Transformers for Learning Portuguese as a Second Language
Despite the small size of the available data sets, the resulting models are found to outperform earlier, carefully crafted feature-based counterparts in most evaluation scenarios, thus offering a new state of the art for this task for Portuguese.
Simple or Complex? Learning to Predict Readability of Bengali Texts
This paper adapts document-level readability formulas traditionally used in the U.S. education system to the Bengali language with a proper age-to-age comparison, and presents a readability analysis tool capable of analyzing text written in Bengali to provide in-depth information on its readability and complexity.
BERT Embeddings for Automatic Readability Assessment
The proposed method outperforms classical approaches in readability assessment using English and Filipino datasets and can be used as a substitute feature set for low-resource languages like Filipino with limited semantic and syntactic NLP tools to explicitly extract feature values for the task.
Automatic proficiency level prediction for Intelligent Computer-Assisted Language Learning
This thesis proposes a framework for selecting sentences suitable as exercise items that encompasses a number of additional criteria, such as well-formedness and independence from a larger textual context, and shows that models trained partly or entirely on reading texts can effectively predict the proficiency level of learner essays.
Modeling the Readability of German Targeting Adults and Children: An empirically broad analysis and its cross-corpus validation
This comprehensive German readability model is the first for which robust cross-corpus performance has been shown, with accuracy between 89.4% and 98.9% for both data sets.
Knowledge-Rich BERT Embeddings for Readability Assessment
This study proposes an alternative way of using the information-rich embeddings of BERT models through a joint-learning method combined with handcrafted linguistic features for readability assessment, and shows that the proposed method outperforms classical approaches in readability assessment.
Under the Microscope: Interpreting Readability Assessment Models for Filipino
This work dissects machine learning-based readability assessment models for Filipino by performing global and local model interpretation to understand the contributions of varying linguistic features, and discusses the implications in the context of the Filipino language.
Automated Text Readability Assessment for Russian Second Language Learners
  • 2018
… words list coverage of a text: 0.58, 3.9e-57, 0.60, 2.3e-63; Percentage of neuter words per text: 0.55, 1.3e-49, 0.60, 5e-68; Median number of punctuation per sentence: 0.55, 1.4e-49, 0.55, 7.3-50; Percentage of…
Relevant Parameters for the Classification of Reading Books Depending on the Degree of Textual Readability in Primary and Compulsory Secondary Education (CSE) Students
This paper tries to establish the most important parameters of readability when it comes to choosing reading books for students in the second and third stages of primary and compulsory secondary…

References

On Improving the Accuracy of Readability Classification using Insights from Second Language Acquisition
It is shown that developmental measures from Second Language Acquisition research, when combined with traditional readability features such as word length and sentence length, provide a good indication of text readability across different grades.
A Comparison of Features for Automatic Readability Assessment
It is found that features based on in-domain language models have the highest predictive power, and that entity-density and POS features, in particular nouns, are individually very useful but highly correlated.
Readability Classification for German using Lexical, Syntactic, and Morphological Features
It is shown that readability classification for German based on syntactic, lexical and language model features from previous research on English is highly successful, reaching 89.7% accuracy, with the new morphological features making an important contribution.
Single-Sentence Readability Prediction in Russian
This study attempts to discover and analyze a set of possible features that can be used for single-sentence readability prediction in Russian, and tests the influence of syntactic features on the predictability of structural complexity.
A Readable Read: Automatic Assessment of Language Learning Materials based on Linguistic Complexity
This paper proposes a supervised machine learning model, based on a range of linguistic features, that can reliably classify texts according to their difficulty level, and finds that using a combination of different features resulted in a 7% improvement in classification accuracy at the sentence level, whereas at the document level, lexical features were more dominant.
On the Contribution of MWE-based Features to a Readability Formula for French as a Foreign Language
This study uses an MWE extractor combining a statistical approach with a linguistic filter to define 11 predictors that take into account the density and the probability of MWEs, but also their internal structure.
Automatic readability assessment
The development of an automatic tool to assess the readability of text documents is described and the correlation between grade levels predicted by the tool, expert ratings of text difficulty, and estimated latent difficulty derived from experiments involving adult participants with mild intellectual disabilities is measured.
Combining Lexical and Grammatical Features to Improve Readability Measures for First and Second Language Texts
This work evaluates a system that uses interpolated predictions of reading difficulty that are based on both vocabulary and grammatical features, and indicates that grammatical features may play a more important role in second language readability than in first language readability.
A machine learning approach to reading level assessment
This paper uses support vector machines to combine features from n-gram language models, parses, and traditional reading level measures to produce a better method of assessing reading level, and explores ways that multiple human annotations can be used in comparative assessments of system performance.
A New Dataset and Method for Automatically Grading ESOL Texts
We demonstrate how supervised discriminative machine learning techniques can be used to automate the assessment of 'English as a Second or Other Language' (ESOL) examination scripts. In particular,…