Universal Dependencies 2.1
The annotation scheme is based on (universal) Stanford dependencies, Google universal part-of-speech tags, and the Interset interlingua for morphosyntactic tagsets.
On Improving the Accuracy of Readability Classification using Insights from Second Language Acquisition
It is shown that developmental measures from Second Language Acquisition research, when combined with traditional readability features such as word length and sentence length, provide a good indication of text readability across different grades.
Readability Classification for German using Lexical, Syntactic, and Morphological Features
It is shown that readability classification for German based on syntactic, lexical and language model features from previous research on English is highly successful, reaching 89.7% accuracy, with the new morphological features making an important contribution.
OneStopEnglish corpus: A new corpus for automatic readability assessment and text simplification
The collection and compilation of the OneStopEnglish corpus of texts written at three reading levels is described, and its usefulness is demonstrated through two applications: automatic readability assessment and automatic text simplification.
Experiments with Universal CEFR Classification
This paper explores universal CEFR classification using domain-specific and domain-agnostic, theory-guided as well as data-driven features, and reports the results of preliminary experiments in monolingual, cross-lingual, and multilingual classification with three languages: German, Czech, and Italian.
Automatic CEFR Level Prediction for Estonian Learner Text
This paper reports on approaches for automatically predicting a learner’s language proficiency in Estonian according to the European CEFR scale, using morphological and POS tag information extracted from texts written by learners, and concludes that classification is more effective than regression in terms of exact error and the direction of error.
Automated Assessment of Non-Native Learner Essays: Investigating the Role of Linguistic Features
- Sowmya Vajjala
- Linguistics, Computer Science
- International Journal of Artificial Intelligence…
- 2 December 2016
The role of various linguistic features in automatic essay scoring is explored using two publicly available datasets of non-native English essays written in test-taking scenarios; the results show that the feature set yields good predictive models for both datasets.
Analyzing Text Complexity and Text Simplification: Connecting Linguistics, Processing and Educational Applications
- Sowmya Vajjala
- Computer Science
- 3 August 2015
Assessing the relative reading level of sentence pairs for text simplification
- Sowmya Vajjala, Walt Detmar Meurers
- Linguistics
- Conference of the European Chapter of the…
- 1 April 2014
This paper explores readability models for identifying differences in the reading levels of simplified and unsimplified versions of sentences, and shows that a relative ranking is preferable to an absolute binary one and that the accuracy of identifying relative simplification depends on the initial reading level of the unsimplified version.
Combining Shallow and Linguistically Motivated Features in Native Language Identification
- Serhiy Bykh, Sowmya Vajjala, J. Krivanek, Walt Detmar Meurers
- Computer Science, Linguistics
- BEA@NAACL-HLT
- 1 June 2013
A range of features and ensembles for the task of Native Language Identification as part of the NLI Shared Task (Tetreault et al., 2013) are explored, testing different linguistic abstractions such as part-of-speech, dependencies, and syntactic trees as features for NLI.