Improved accent classification combining phonetic vowels with acoustic features

  • Zhenhao Ge
  • Published 1 October 2015
  • Computer Science
  • 2015 8th International Congress on Image and Signal Processing (CISP)
Research has shown that accent classification can be improved by integrating semantic information into a purely acoustic approach. In this work, we combine phonetic knowledge, such as vowels, with enhanced acoustic features to build an improved accent classification system. The classifier is based on a Gaussian Mixture Model-Universal Background Model (GMM-UBM) with normalized Perceptual Linear Predictive (PLP) features. The features are further optimized by Principal Component Analysis (PCA) and…
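
As a rough illustration of the GMM-UBM idea named in the abstract, the sketch below adapts a shared background model to each accent and then classifies an utterance by its average log-likelihood ratio against that background model. It is not the paper's implementation: the mean-only MAP adaptation, the relevance factor of 16, the diagonal covariances, the toy two-dimensional frames (standing in for PLP features after normalization and PCA), and all function names are assumptions made for the example.

```python
import math
import random

def diag_gauss_logpdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at one frame x."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def component_logs(x, gmm):
    """Per-component log(weight * density); gmm is a list of (weight, mean, var)."""
    return [math.log(w) + diag_gauss_logpdf(x, m, v) for w, m, v in gmm]

def avg_loglik(frames, gmm):
    """Average per-frame log-likelihood, using log-sum-exp for stability."""
    total = 0.0
    for x in frames:
        logs = component_logs(x, gmm)
        top = max(logs)
        total += top + math.log(sum(math.exp(l - top) for l in logs))
    return total / len(frames)

def map_adapt_means(ubm, frames, relevance=16.0):
    """Mean-only MAP adaptation (assumed here): weights and variances stay fixed."""
    dim = len(frames[0])
    counts = [0.0] * len(ubm)
    sums = [[0.0] * dim for _ in ubm]
    for x in frames:
        logs = component_logs(x, ubm)
        top = max(logs)
        post = [math.exp(l - top) for l in logs]
        norm = sum(post)
        for k, p in enumerate(post):
            gamma = p / norm  # responsibility of component k for this frame
            counts[k] += gamma
            for d in range(dim):
                sums[k][d] += gamma * x[d]
    adapted = []
    for k, (w, m, v) in enumerate(ubm):
        if counts[k] > 0.0:
            alpha = counts[k] / (counts[k] + relevance)  # data-dependent mixing
            mean = [alpha * sums[k][d] / counts[k] + (1.0 - alpha) * m[d]
                    for d in range(dim)]
        else:
            mean = list(m)  # no data assigned: keep the UBM mean
        adapted.append((w, mean, v))
    return adapted

def classify(frames, ubm, accent_models):
    """Pick the accent whose adapted model has the highest ratio against the UBM."""
    base = avg_loglik(frames, ubm)
    scores = {name: avg_loglik(frames, gmm) - base
              for name, gmm in accent_models.items()}
    return max(scores, key=scores.get), scores

# Toy data (hypothetical): two accents as clusters in a 2-D feature space.
rng = random.Random(0)

def sample(center, n):
    return [[rng.gauss(c, 0.5) for c in center] for _ in range(n)]

ubm = [(0.5, [-1.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 0.0], [1.0, 1.0])]
accent_models = {
    "accent_A": map_adapt_means(ubm, sample([-2.0, 0.5], 200)),
    "accent_B": map_adapt_means(ubm, sample([2.0, -0.5], 200)),
}
label, scores = classify(sample([-2.0, 0.5], 50), ubm, accent_models)
print(label, scores)
```

In a real system the UBM would be trained on pooled speech from all accents and would have many more components; the fixed two-component UBM here only keeps the example short.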

Acoustic-phonetic feature based Kannada dialect identification from vowel sounds

The role of duration, energy, pitch, and three formant features is found to be evidential in Kannada dialect classification.

Middle eastern and north african english speech corpus (MENAESC): automatic identification of MENA english accents

The system's effectiveness in identifying MENA English accents using the two approaches mentioned above is affected by the proficiency of the Arabic speakers of English and the influence of their mother tongue.

Automatic dialect identification system for Kannada language using single and ensemble SVM algorithms

The proposed ADI systems with derived features perform better than the state-of-the-art i-vector feature based systems on both datasets, which indicates the existence of complementary dialect-specific evidence in spectral and prosodic features.

A Computational Approach to Foreign Accent Classification



An empirical study of automatic accent classification

This paper finds that a purely acoustic approach based on a combination of heteroscedastic linear discriminant analysis (HLDA) and maximum mutual information (MMI) training is very effective in a large scale accent classification task: 23-way classification of foreign-accented English.

A novel approach to detecting non-native speakers and their native language

This work presents several experiments which show how their system outperforms the best published results on both the Fisher database and the foreign-accented English (FAE) database for detecting non-native speakers and their native language respectively.

Improved structure-based automatic estimation of pronunciation proficiency

This paper focuses on a proficiency estimation experiment done in [1] and, using the recently developed techniques for the structures, uses a smaller unit of structural analysis, speaker-invariant substructures, and relative structural distances between a learner and a teacher.

Yet another acoustic representation of speech sounds

  • N. Minematsu
  • Physics
  • 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing
  • 2004
The proposed speech modeling can remove both multiplicative and linear transformational distortion from speech theoretically, which means that speech sounds are represented without being affected by any static distortion inevitably involved in production, encoding, transmission, decoding, and hearing processes.

Mispronunciation detection for language learning and speech recognition adaptation

In this thesis, a new HMM-based text-dependent mispronunciation system is introduced using text Adaptive Frequency Cepstral Coefficients (AFCCs), and it is shown that this system outperforms the conventional HMM method based on Mel Frequency Cepstral Coefficients (MFCCs).

PCA method for automated detection of mispronounced words

This paper presents a method for detecting mispronunciations, with the aim of improving Computer Assisted Language Learning (CALL) tools used by foreign language learners. The method is based on Principal Component Analysis (PCA), which is computationally efficient and effective when training data is limited.

Feature warping for robust speaker verification

A novel feature mapping approach is proposed that is robust to channel mismatch, additive noise and, to some extent, non-linear effects attributed to handset transducers; the warping technique is shown to improve on a number of methods such as Cepstral Mean Subtraction (CMS), modulation spectrum processing, and short-term windowed CMS and variance normalisation.

Investigation of silicon auditory models and generalization of linear discriminant analysis for improved speech recognition

This thesis proposes an alternate architecture that goes beyond the basilar-membrane model and allows auditory features to be computed in real time. It also presents a unified framework for the problem of dimension reduction and HMM parameter estimation by modeling the original features with a reduced-rank HMM.

Speaker Verification Using Adapted Gaussian Mixture Models

The major elements of MIT Lincoln Laboratory's Gaussian mixture model (GMM)-based speaker verification system used successfully in several NIST Speaker Recognition Evaluations (SREs) are described.

Face recognition using LDA-based algorithms

A new algorithm is proposed that deals with both shortcomings of traditional linear discriminant analysis methods for face recognition systems in an efficient and cost-effective manner.