Detecting Mispronunciations of L2 Learners and Providing Corrective Feedback Using Knowledge-Guided and Data-Driven Decision Trees

Wei Li, Kehuang Li, Sabato Marco Siniscalchi, Nancy F. Chen, Chin-Hui Lee
We propose a novel decision-tree-based framework to detect phonetic mispronunciations that L2 learners produce through inaccurate use of speech attributes, such as manner and place of articulation. Compared with conventional score-based CAPT (computer-assisted pronunciation training) systems, our proposed framework has three advantages: (1) each mispronunciation in a tree can be interpreted and communicated to the L2 learners by traversing the corresponding path from a leaf node to the root… 
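Advantage (1) can be illustrated with a small sketch. This page does not describe how the paper's knowledge-guided and data-driven trees are learned, so the code below covers only the feedback step: walking from a leaf back up to the root and turning each attribute test on the path into readable text. The `Node` structure, the attribute names (`manner`, `place`), and the example leaf (target /θ/ realized as /t/) are hypothetical illustrations, not the authors' implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Node:
    # Hypothetical tree node: an internal node tests one speech attribute
    # (e.g. "manner"); `answer` is the parent's branch value that leads here.
    attribute: Optional[str] = None
    answer: Optional[str] = None
    label: Optional[str] = None  # mispronunciation label, set only on leaves
    parent: Optional[Node] = field(default=None, repr=False)
    children: dict[str, Node] = field(default_factory=dict, repr=False)

    def add_child(self, answer: str, child: Node) -> Node:
        """Attach `child` under the branch `answer` and return it."""
        child.parent = self
        child.answer = answer
        self.children[answer] = child
        return child


def descend(root: Node, observed: dict[str, str]) -> Node:
    """Route a segment's detected attribute values from the root down to a leaf."""
    node = root
    while node.label is None:
        node = node.children[observed[node.attribute]]
    return node


def leaf_to_root_path(leaf: Node) -> list[tuple[str, str]]:
    """Follow parent pointers upward, collecting (attribute, answer) pairs."""
    path = []
    node = leaf
    while node.parent is not None:
        path.append((node.parent.attribute, node.answer))
        node = node.parent
    return path


def feedback(leaf: Node) -> str:
    """Render the leaf label plus the attribute conditions that explain it."""
    conditions = "; ".join(f"{attr} = {ans}" for attr, ans in leaf_to_root_path(leaf))
    return f"{leaf.label}: {conditions}"


if __name__ == "__main__":
    # Hypothetical two-level tree: manner test at the root, place test below it.
    root = Node(attribute="manner")
    stop = root.add_child("stop", Node(attribute="place"))
    stop.add_child("alveolar", Node(label="target /θ/ realized as /t/"))

    leaf = descend(root, {"manner": "stop", "place": "alveolar"})
    print(feedback(leaf))
    # prints: target /θ/ realized as /t/: place = alveolar; manner = stop
```

Keeping parent pointers makes the leaf-to-root walk a simple loop, which mirrors the path-based interpretation described in the abstract; each (attribute, answer) pair on that path is a candidate piece of corrective feedback.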

Improving Mispronunciation Detection for Non-Native Learners with Multisource Information and LSTM-Based Deep Models

In this paper, we utilize manner and place of articulation features and deep neural network models (DNNs) with long short-term memory (LSTM) to improve the detection performance of phonetic

An Alignment Method Leveraging Articulatory Features for Mispronunciation Detection and Diagnosis in L2 English

A novel alignment method is proposed that uses linguistic knowledge of the manner and place of articulation to align the phone sequences of the reference text with L2 learners' speech; it improves the F1-score over the state-of-the-art system by 4.9% relative.

Mispronunciation Diagnosis of L2 English at Articulatory Level Using Articulatory Goodness-Of-Pronunciation Features

Results indicate that the proposed method, which relies on articulatory Goodness-of-Pronunciation features and articulation-based confidence scores, yields effective articulatory-level diagnosis of English produced by Korean learners.

Correlational Neural Network Based Feature Adaptation in L2 Mispronunciation Detection

  • Wenwei Dong, Yanlu Xie
  • 2019 International Conference on Asian Language Processing (IALP)
  • 2019
On a corpus of Chinese spoken by Japanese learners, the mispronunciation detection accuracy of the CorrNet-based method improves by 3.19% over un-normalized Fbank features and by 1.74% over bottleneck features.

Normalization of GOP for Chinese Mispronunciation Detection

  • Wenwei Dong, Yanlu Xie
  • 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
  • 2019
Two ways to normalize GOP scores are proposed: computing GOP separately for Chinese Initials and Chinese Finals, and using the corresponding native pronunciation score as a template to scale the non-native one.

DNN-Based Scoring of Language Learners’ Proficiency Using Learners’ Shadowings and Native Listeners’ Responsive Shadowings

Experiments show that the correlation between the DNN-based predicted scores and the averaged human scores is higher than, or at least comparable to, the average correlation between the scores of individual human raters, indicating that the proposed automatic rating module can be introduced into language education as another rater.

Speech Processing for Language Learning: A Practical Approach to Computer-Assisted Pronunciation Teaching

It is shown that even when learners produce phonemes sufficiently correctly, they do not produce correct phrasal rhythm and intonation; therefore, joint training of sounds, rhythm, and intonation within a single learning environment is beneficial.

Automatic Scoring Minimal-Pair Pronunciation Drills by Using Recognition Likelihood Scores and Phonological Features

A more thorough comparison of phonological features (PFs) with additional likelihood features proposed in previous literature showed that PFs brought performance gains over basic likelihood features, but not over the feature set containing log-likelihood ratio (LLR) features.

Integrating Articulatory Features into Acoustic-Phonemic Model for Mispronunciation Detection and Diagnosis in L2 English Speech

Novel approaches to mispronunciation detection and diagnosis in second-language (L2) learners' speech using articulatory features are proposed based on the acoustic-phonemic model (APM), and several model architectures are investigated to better exploit the articulatory features.

Using tone-based extended recognition network to detect non-native Mandarin tone mispronunciations

A DNN-based, tone-based extended recognition network (ERN) approach is proposed for Mandarin tone recognition and tone mispronunciation detection; the framework reduces the equal error rate by 10.98% relative.

Improving non-native mispronunciation detection and enriching diagnostic feedback with DNN-based speech attribute modeling

We propose the use of speech attributes, such as voicing and aspiration, to address two key research issues in computer assisted pronunciation training (CAPT) for L2 learners, namely detecting

A preliminary study on ASR-based detection of Chinese mispronunciation by Japanese learners

The experimental results show that the approach detects the most representative pronunciation errors moderately well, achieving a false rejection rate of 8.0% and a false acceptance rate of 32.6%, with a diagnostic accuracy of 86.0%.

Mispronunciation detection and diagnosis in L2 English speech using multi-distribution Deep Neural Networks

  • Kun Li, H. Meng
  • The 9th International Symposium on Chinese Spoken Language Processing
  • 2014
An Acoustic Phonological Model (APM) is proposed using a multi-distribution DNN whose input features include acoustic features and the corresponding canonical pronunciations; it significantly outperforms forced alignment with extended recognition networks (ERNs).

Automatic derivation of phonological rules for mispronunciation detection in a computer-assisted pronunciation training system

An approach is proposed for automatically deriving phonological rules from L2 speech; the rules capture the canonical pronunciations of words as well as possible mispronunciations, and the approach improves diagnostic accuracy.

A study on robust detection of pronunciation erroneous tendency based on deep neural network

Experimental results showed that DNN-HMM pronunciation erroneous tendency (PET) modeling achieved more robust detection accuracy than the previous GMM-HMM approach, and that the three kinds of features behaved differently.

Automatic detection of phone-level mispronunciation for language learning

Two approaches were evaluated: in the first, log-posterior-probability-based scores are computed for each phone segment; in the second, a log-likelihood ratio score is computed for each phone segment using models of the incorrect and correct pronunciations.

Deriving salient learners’ mispronunciations from cross-language phonological comparisons

An automatic speech recognizer is developed by training cross-word triphone models on the TIMIT corpus, together with an "extended" pronunciation lexicon that incorporates the predicted phonetic confusions to generate additional, erroneous pronunciation variants for each word.

Landmark-based automated pronunciation error detection

The method was trained on phonemes that are difficult for Korean learners and tested on intermediate Korean learners; combining the two methods without appropriate training data did not lead to improvement.

An interactive English pronunciation dictionary for Korean learners

A speech corpus is designed and collected to address the phonological and prosodic issues of Korean EFL learners, and the system leverages the SUMMIT speech recognizer's ability to model phonological rules to automatically identify non-native phonological phenomena.