Comparison of Automatic and Expert Teachers’ Rating of Computerized English Listening-Speaking Test

@article{Linlin2019ComparisonOA,
  title={Comparison of Automatic and Expert Teachers’ Rating of Computerized English Listening-Speaking Test},
  author={Cao Linlin},
  journal={English Language Teaching},
  year={2019},
  volume={13},
  pages={18--30}
}
  • Cao Linlin
  • Published 2019
  • Psychology
  • English Language Teaching
Through Many-Facet Rasch analysis, this study explores the rating differences between one automatic computer rater and five expert teacher raters in scoring 119 students on a computerized English listening-speaking test. Results indicate that both the automatic rater and the teacher raters demonstrate good inter-rater reliability, though the automatic rater shows lower intra-rater reliability than the college teacher and high school teacher raters under the stringent infit limits. There’s no central tendency…
2 Citations

A multidimensional generalized many-facet Rasch model for rubric-based performance assessment
TLDR
A multidimensional IRT model is proposed for rubric-based performance assessment that is useful not only for improving the accuracy of ability measurement, but also for detailed analysis of rubric quality and rubric construct validity.

References

SHOWING 1-10 OF 44 REFERENCES
RATER GROUP BIAS IN THE SPEAKING ASSESSMENT OF FOUR L1 JAPANESE ESL STUDENTS
The purpose of this study, modeled after Kobayashi’s (1982) investigation of writing evaluations, is to determine whether factors like language background and educational training affect raters’…
Automatic assessment of speech fluency in computer aided speech grading systems
TLDR
The correlation and mean square errors between the ground-truth scores and the automatically estimated scores suggest that the nonlinear score-fitting methods (BP, SVR) are superior to the linear method and should be adopted in computer-aided automatic scoring systems to improve efficiency and accuracy.
Experimenting with a computer essay-scoring program based on ESL student writing scripts
TLDR
While computer essay-scoring programs may appear to rate inside a ‘black box’ with concomitant lack of transparency, they do have potential to act as a third rater, time-saving assessment tool, and as technology develops and rating becomes more transparent, so will their acceptability.
Examining Rater Effects in TestDaF Writing and Speaking Performance Assessments: A Many-Facet Rasch Analysis
I studied rater effects in the writing and speaking sections of the Test of German as a Foreign Language (TestDaF). Building on the many-facet Rasch measurement methodology, the focus was on rater…
A Many-Facet Rasch analysis comparing essay rater behavior on an academic English reading/writing test used for two purposes
Second language (L2) writing researchers have noted that various rater and scoring variables may affect ratings assigned by human raters (Cumming, 1990; Vaughan, 1991; Weigle, 1994, 1998,…
Investigating variability in tasks and rater judgements in a performance test of foreign language speaking
Much of the recent debate that has surrounded the development and use of 'performance', or 'communicative' language tests has focused on a supposed trade-off between two sets of desirable qualities:…
HOW DO RATERS FROM INDIA PERFORM IN SCORING THE TOEFL IBT™ SPEAKING SECTION AND WHAT KIND OF TRAINING HELPS?
This study investigated the scoring of the Test of English as a Foreign Language™ Internet-based Test (TOEFL iBT™) Speaking section by bilingual or multilingual speakers of English and 1 or more…
An investigation of the rating process in the IELTS oral interview
Holistic assessments of oral language proficiency are often made in relation to performance in conversational language proficiency interviews, one such example of which is the IELTS Oral Interview.…
Systematic effects in the rating of second-language speaking ability: test method and learner discourse
Major differences exist in two approaches to the study of second-language performance. Second-language-acquisition (SLA) research examines effects upon discourse, and is typically unconcerned with…
Using FACETS to model rater training effects
This article describes a study conducted to explore differences in rater severity and consistency among inexperienced and experienced raters both before and after rater training. Sixteen raters…