Spoken English Intelligibility Remediation with Pocketsphinx Alignment and Feature Extraction Improves Substantially Over the State of the Art

Yuan Gao, Brij Mohan Lal Srivastava, James Salsman
2018 2nd IEEE Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC)
We use automatic speech recognition to assess spoken English learner pronunciation based on the authentic intelligibility of the learners' spoken responses determined from support vector machine (SVM) classifier or deep learning neural network model predictions of transcription correctness. Using numeric features produced by PocketSphinx alignment mode and many recognition passes searching for the substitution and deletion of each expected phoneme and insertion of unexpected phonemes in…

ETLT 2021: Shared Task on Automatic Speech Recognition for Non-Native Children's Speech
The paper presents the Second ASR Challenge for Non-native Children’s Speech proposed as a Special Session at Interspeech 2021, following the successful first challenge at Interspeech 2020. The goal
Overview of the Interspeech TLT2020 Shared Task on ASR for Non-Native Children's Speech
This paper describes the corpus of non-native children’s speech that was used for the ASR challenge, analyzes the results, and discusses points that should be considered for future challenges in this domain.
Self-Supervised End-to-End ASR for Low Resource L2 Swedish
This work experiments with several monolingual and cross-lingual self-supervised acoustic models to develop an end-to-end ASR system for L2 Swedish, and indicates that these systems are competitive in performance with a traditional ASR pipeline.
Apraxia world: a speech therapy game for children with speech sound disorders
A familiar style of game successfully engages children, speech exercises function well when decoupled from game control, and children are willing to complete required speech exercises while playing a game they enjoy.


Pronunciation accuracy and intelligibility of non-native speech
It is found that only 16% of the variability in word-level intelligibility can be explained by the presence of obvious mispronunciations, and that high agreement is seen when the results are aggregated across all words from the same speaker.
New Feature Parameters for Pronunciation Evaluation in English Presentations at International Conferences
The quality of new acoustic features that are useful when used in combination with the system’s estimates of pronunciation score and intelligibility is examined.
A statistical method of evaluating the pronunciation proficiency/intelligibility of English presentations by Japanese speakers
This work statistically analyzed the actual utterances of speakers to find combinations of acoustic and linguistic features with high correlation between the scores estimated by the system and native English teachers, and developed an online real-time score estimation system for Japanese learners of English using offline techniques to evaluate the pronunciation and intelligibility scores in real time with almost the same ability as English teachers.
Pronunciation analysis for children with speech sound disorders
This paper proposes to improve accuracy by learning acoustic models from a large children's speech database, using an explicit model of typical pronunciation errors of children in the target age range, and explicit modeling of the acoustics of distorted phonemes.
Computer-assisted pronunciation training: From pronunciation scoring towards spoken language learning
Nancy F. Chen, Haizhou Li. 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2016.
This paper reviews the research approaches used in computer-assisted pronunciation training (CAPT), addresses the existing challenges, and discusses emerging trends and opportunities. To complement
Automatic analysis of pronunciations for children with speech sound disorders
A Two-Pass Framework of Mispronunciation Detection and Diagnosis for Computer-Aided Pronunciation Training
This paper presents a two-pass framework with discriminative acoustic modeling for mispronunciation detection and diagnosis (MD&D), which guarantees full coverage of all possible error patterns while maximally exploiting the phonetic information derived from the text prompt.
Phone-level pronunciation scoring and assessment for interactive language learning
Pocketsphinx: A Free, Real-Time Continuous Speech Recognition System for Hand-Held Devices
This paper presents a preliminary case study on the porting and optimization of CMU Sphinx-II, a popular open source large vocabulary continuous speech recognition (LVCSR) system, to hand-held devices, and is believed to be the first hand-held LVCSR system available under an open-source license.
Speech recognition system for Handheld devices