Adaptive frequency cepstral coefficients for word mispronunciation detection

@inproceedings{Ge2011AdaptiveFC,
  title={Adaptive frequency cepstral coefficients for word mispronunciation detection},
  author={Zhenhao Ge and Sudhendu R. Sharma and Mark J. T. Smith},
  booktitle={2011 4th International Congress on Image and Signal Processing},
  year={2011},
  volume={5},
  pages={2388--2391}
}
Systems based on automatic speech recognition (ASR) technology can provide important functionality in computer-assisted language learning applications. This is a young but growing area of research motivated by the large number of students studying foreign languages. Here we propose a Hidden Markov Model (HMM)-based method to detect mispronunciations. Exploiting the specific dialog scripting employed in language learning software, HMMs are trained for different pronunciations. New adaptive…
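
The adaptive (AFCC) features themselves are only hinted at in this truncated abstract, but the per-pronunciation HMM idea it describes can be sketched with off-the-shelf tools. The snippet below is a minimal illustration, assuming standard MFCC features (via librosa) and Gaussian HMMs (via hmmlearn) as stand-ins for the paper's adaptive features; the file lists, state count, and the correct/mispronounced labels are hypothetical, not the authors' implementation.

```python
# Minimal sketch of per-pronunciation HMMs, using standard MFCCs as a stand-in
# for the paper's adaptive frequency cepstral coefficients.
import numpy as np
import librosa
from hmmlearn import hmm

def mfcc_features(wav_path, sr=16000, n_mfcc=13):
    """Return a (n_frames, n_mfcc) feature matrix for one utterance."""
    y, sr = librosa.load(wav_path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

def train_pronunciation_hmm(wav_paths, n_states=5):
    """Fit one Gaussian HMM on all utterances of a single pronunciation class."""
    feats = [mfcc_features(p) for p in wav_paths]
    X = np.vstack(feats)
    lengths = [f.shape[0] for f in feats]
    model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=25)
    model.fit(X, lengths)
    return model

def classify(wav_path, models):
    """Pick the pronunciation class whose HMM gives the highest log-likelihood."""
    feats = mfcc_features(wav_path)
    scores = {label: m.score(feats) for label, m in models.items()}
    return max(scores, key=scores.get)

# Hypothetical usage: one HMM per known pronunciation of a word; a test utterance
# is flagged as mispronounced if the "mispronounced" model scores highest.
# models = {"correct": train_pronunciation_hmm(correct_wavs),
#           "mispronounced": train_pronunciation_hmm(error_wavs)}
# print(classify("test_utterance.wav", models))
```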

Citations

Mispronunciation detection for language learning and speech recognition adaptation

TLDR
In this thesis, a new HMM-based text-dependent mispronunciation detection system is introduced using Adaptive Frequency Cepstral Coefficients (AFCCs), and it is shown that this system outperforms the conventional HMM method based on Mel Frequency Cepstral Coefficients (MFCCs).

A Two-Pass Framework of Mispronunciation Detection and Diagnosis for Computer-Aided Pronunciation Training

TLDR
This paper presents a two-pass framework with discriminative acoustic modeling for mispronunciation detection and diagnosis (MD&D), which guarantees full coverage of all possible error patterns while maximally exploiting the phonetic information derived from the text prompt.

XDF-REPA: A Densely Labeled Dataset toward Refined Pronunciation Assessment for English Learning

TLDR
This paper addresses the issue of refined pronunciation assessment (RPA), which aims at providing more refined information to L2 learners, and presents the XDF-REPA dataset, which is freely available to the public.

Machine Learning Applied to Aspirated and Non-Aspirated Allophone Classification–An Approach Based on Audio “Fingerprinting”

The purpose of this study is to involve both Convolutional Neural Networks and a typical learning algorithm in the allophone classification process. A list of words including aspirated and…

Mispronunciation detection for CALL systems

TLDR
The paper deals with two methods used for pronunciation enhancement; the problem of controlling and synchronizing the tempo of video and audio information remains topical for Computer-Assisted Language Learning systems.

Sleep Stages Classification Using Neural Networks with Multi-channel Neural Data

TLDR
A new and robust 5-stage sleep classification algorithm is presented, based on a feedforward neural network with a flexible model structure that uses normalized Power Spectral Density (PSD) features collected from multi-channel neural data.
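
As a rough illustration of the pipeline that summary describes (normalized PSD features from multi-channel data feeding a feedforward network), here is a minimal sketch assuming Welch PSDs from scipy and a small scikit-learn MLP; the sampling rate, epoch format, frequency cutoff, and five-stage labels are assumptions, not details from the cited work.

```python
# Sketch: per-channel normalized Welch PSD features -> small feedforward classifier.
import numpy as np
from scipy.signal import welch
from sklearn.neural_network import MLPClassifier

def psd_features(epoch, fs=250, nperseg=512):
    """epoch: (n_channels, n_samples) -> concatenated, per-channel-normalized PSDs."""
    feats = []
    for ch in epoch:
        f, pxx = welch(ch, fs=fs, nperseg=nperseg)
        keep = f <= 45.0                      # keep a low-frequency band (assumed cutoff)
        p = pxx[keep]
        feats.append(p / (p.sum() + 1e-12))   # normalize each channel's PSD to sum to 1
    return np.concatenate(feats)

# Hypothetical data: X_epochs is a list of (n_channels, n_samples) arrays,
# y holds the five stage labels 0..4.
# X = np.array([psd_features(e) for e in X_epochs])
# clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500).fit(X, y)
# print(clf.predict(X[:5]))
```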

MFCC-Based Voice Recognition System for Home Automation Using Dynamic Programming

TLDR
Speech recognition is a multileveled pattern recognition task, in which acoustical signals are examined and structured into a hierarchy of subword units, words, phrases, and sentences, which can best be exploited by combining decisions probabilistically at all lower levels, and making discrete decisions only at the highest level.
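
The "dynamic programming" in that system's title is classically realized as DTW template matching over MFCC sequences; the sketch below illustrates that general idea only, not the cited system's code. The command names, template files, and distance normalization are hypothetical choices.

```python
# Sketch: assign a spoken command to the stored MFCC template with the smallest
# DTW alignment cost (classic dynamic-programming template matching).
import numpy as np
import librosa

def mfcc(wav_path, sr=16000):
    y, sr = librosa.load(wav_path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T   # (n_frames, 13)

def dtw_cost(A, B):
    """Cumulative Euclidean frame distance between two MFCC sequences."""
    n, m = len(A), len(B)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(A[i - 1] - B[j - 1])
            D[i, j] = d + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)   # length-normalized alignment cost

# Hypothetical usage with made-up command templates:
# templates = {"lights_on": mfcc("lights_on.wav"), "lights_off": mfcc("lights_off.wav")}
# query = mfcc("unknown_command.wav")
# print(min(templates, key=lambda k: dtw_cost(query, templates[k])))
```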

References

Showing 1-10 of 14 references

The SRI EduSpeak System: Recognition and Pronunciation Scoring for Language Learning

TLDR
This work reports results on the application of adaptation techniques to recognize both native and nonnative speech in a speaker-independent manner and discusses the pronunciation scoring paradigm and shows experimental results in the form of correlations between the pronunciation quality estimators included in the toolkit and grades given by human listeners.

Automatic mispronunciation detection for Mandarin

TLDR
Scaled log-posterior probability (SLPP) and weighted phone SLPP are proposed to obtain a better measure of pronunciation quality, and speaker normalization via speaker adaptive training (SAT) and speaker adaptation via selective maximum likelihood linear regression (SMLLR) are introduced to obtain a better statistical model.
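
The exact SLPP formulation is not given in this summary, but scores of this family reduce to averaging the log posterior of the intended phone over its aligned frames. The sketch below shows only that generic idea; the posterior matrix and phone alignment are assumed to come from an existing acoustic model and forced aligner (not shown), and the cited paper's scaling and weighting are not reproduced.

```python
# Sketch of a generic log-posterior pronunciation score: low values for the
# intended phone over its aligned frames suggest a likely mispronunciation.
import numpy as np

def phone_log_posterior_score(frame_posteriors, phone_id, start, end):
    """frame_posteriors: (n_frames, n_phones) posteriors; [start, end) is the phone's alignment."""
    seg = frame_posteriors[start:end, phone_id]
    return float(np.mean(np.log(seg + 1e-12)))   # time-normalized log posterior

def word_score(frame_posteriors, alignment):
    """alignment: list of (phone_id, start, end) tuples; return the mean per-phone score."""
    scores = [phone_log_posterior_score(frame_posteriors, p, s, e) for p, s, e in alignment]
    return float(np.mean(scores))
```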

Implementation of an extended recognition network for mispronunciation detection and diagnosis in computer-assisted pronunciation training

TLDR
A set of context-sensitive phonological rules is developed based on cross-language (Cantonese versus English) analysis and validated against common mispronunciations observed in the learners' interlanguage.

Automatic evaluation and training in English pronunciation

TLDR
A study of the feasibility of automatically grading the performance of Japanese students when reading English aloud showed that ratings of speech quality by experts are very reliable and that automatic grades correlate well with those expert ratings.
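
The evaluation described there boils down to correlating automatic grades with expert ratings; a toy illustration with made-up scores, using scipy's Pearson correlation, is shown below.

```python
# Toy illustration: agreement between automatic grades and expert ratings
# measured as a Pearson correlation. The score lists are placeholders.
from scipy.stats import pearsonr

machine_grades = [3.2, 4.1, 2.5, 4.8, 3.9]
expert_grades  = [3.0, 4.5, 2.2, 4.6, 4.0]
r, p = pearsonr(machine_grades, expert_grades)
print(f"Pearson r = {r:.2f} (p = {p:.3f})")
```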

Automatic derivation of phonological rules for mispronunciation detection in a computer-assisted pronunciation training system

TLDR
An approach is presented for automatic derivation of phonological rules from L2 speech that captures the canonical pronunciations of words as well as possible mispronunciations, and offers improved diagnostic accuracy.

Mispronunciation detection based on cross-language phonological comparisons

TLDR
A method is presented that uses speech recognition with linguistic constraints to detect mispronunciations made by Cantonese learners of English; acoustic models trained with native speakers' speech are used to recognize the phone sequences, given the orthographic transcriptions.

Improving mispronunciation detection using machine learning

TLDR
This paper investigates the problem of mispronunciation detection by considering the influence of speaker and syllables, and shows the effectiveness of the method by reducing the average false acceptance rate (FAR) by 42.5% using a data set generated by the model without adaptation to the observation set.

Capturing L2 segmental mispronunciations with joint-sequence models in Computer-Aided Pronunciation Training (CAPT)

TLDR
This study presents an extension to the previous efforts on automatically detecting text-dependent segmental mispronunciations by Cantonese (L1) learners of American English (L2), through modeling the L2 production, with a grapheme-to-phoneme model.

Automatic detection of mispronunciation for language instruction

TLDR
This work uses pronunciation scoring techniques to evaluate the performance of the mispronunciation model, focusing on automatic detection of mispronunciations.