Corpus ID: 221090412

A New Approach to Accent Recognition and Conversion for Mandarin Chinese

Lin Ai, Shih-Ying Jeng, Homayoon S. M. Beigi
Two new approaches are presented and explored: one for accent classification and one for accent conversion. The first topic is Chinese accent classification/recognition. The second is the use of encoder-decoder models for end-to-end Chinese accent conversion, where the classifier from the first topic supervises the training of the encoder-decoder accent converter. Experiments with different features and models are performed for accent recognition. These features include MFCCs and… 
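The abstract above names MFCCs as one of the feature sets used for accent recognition. As a minimal, self-contained sketch of how such features are computed (the paper does not give its extraction parameters; the frame sizes, filterbank count, and function name below are illustrative assumptions), the standard pipeline is pre-emphasis, windowed framing, power spectrum, mel filterbank, log, and a DCT:

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=26, n_ceps=13):
    """Minimal MFCC sketch: pre-emphasis, Hamming-windowed framing,
    power spectrum, mel filterbank, log, then DCT-II."""
    # Pre-emphasis boosts high frequencies.
    sig = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Frame the signal and apply a Hamming window.
    n_frames = 1 + (len(sig) - n_fft) // hop
    frames = np.stack([sig[i * hop:i * hop + n_fft] for i in range(n_frames)])
    frames *= np.hamming(n_fft)
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular mel filterbank spanning 0 .. sr/2.
    mel = lambda f: 2595 * np.log10(1 + f / 700)
    imel = lambda m: 700 * (10 ** (m / 2595) - 1)
    pts = imel(np.linspace(mel(0), mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for j in range(n_mels):
        l, c, r = bins[j], bins[j + 1], bins[j + 2]
        fbank[j, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[j, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    logmel = np.log(power @ fbank.T + 1e-10)
    # DCT-II decorrelates the log-mel energies; keep the first n_ceps.
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_mels))
    return logmel @ dct.T
```

The resulting per-frame coefficient vectors are what a downstream accent classifier would consume.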


Accent Identification by Combining Deep Neural Networks and Recurrent Neural Networks Trained on Long and Short Term Features
A combination of long-term and short-term training is proposed in this paper for automatic identification of foreign accents, and its performance greatly surpasses the provided baseline system.
Accent detection and speech recognition for Shanghai-accented Mandarin
A new approach that combines accent detection, accent discriminative acoustic features, acoustic adaptation and model selection for accented Chinese speech recognition is proposed and experimental results show that this approach can improve the recognition of accented speech.
Articulatory-based conversion of foreign accents with deep neural networks
Compared to a baseline method based on Gaussian mixture models, the DNN accent conversions were found to be 31% more intelligible, and were perceived more native-like in 68% of the cases.
Accent Conversion Using Artificial Neural Networks
A methodology for accent conversion that learns differences between a pair of accents and produces a series of transformation matrices that can be applied to extracted Mel Frequency Cepstral Coefficients is proposed.
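The summary above describes learning transformations that are applied to extracted MFCCs. As a simplified illustration (the cited work learns the mapping with artificial neural networks; the linear least-squares version and function names below are assumptions made only to show the idea), one can fit a single affine transform between time-aligned source- and target-accent feature frames:

```python
import numpy as np

def fit_transform(src, tgt):
    """Fit an affine map W from aligned source to target MFCC frames.
    src, tgt: (n_frames, n_coeffs) feature matrices."""
    X = np.hstack([src, np.ones((len(src), 1))])  # append a bias column
    W, *_ = np.linalg.lstsq(X, tgt, rcond=None)
    return W  # shape: (n_coeffs + 1, n_coeffs)

def apply_transform(src, W):
    """Apply the learned transform to new source-accent frames."""
    X = np.hstack([src, np.ones((len(src), 1))])
    return X @ W
```

A neural network generalizes this by making the mapping nonlinear and frame-context dependent, but the train-on-pairs, apply-to-features structure is the same.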
Foreign Accent Conversion by Synthesizing Speech from Phonetic Posteriorgrams
This work presents a framework for FAC that eliminates the need for conventional vocoders and therefore the need to use the native speaker’s excitation, and produces speech that sounds more clear, natural, and similar to the non-native speaker compared with a baseline system.
Voice conversion using deep neural networks with speaker-independent pre-training
In this study, we trained a deep autoencoder to build compact representations of short-term spectra of multiple speakers. Using this compact representation as mapping features, we then trained an…
Voice conversion from non-parallel corpora using variational auto-encoder
An SC framework based on variational auto-encoder which enables us to exploit non-parallel corpora and removes the requirement of parallel corpora or phonetic alignments to train a spectral conversion system is proposed.
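The variational auto-encoder at the core of that framework samples a latent code from an encoder-predicted Gaussian while keeping the operation differentiable. A minimal NumPy illustration of the two pieces involved, the reparameterization trick and the KL regularizer (function names are assumptions; a real system would use an autodiff framework such as PyTorch):

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    """VAE reparameterization trick: sample z = mu + sigma * eps with
    eps ~ N(0, I), so gradients can flow through (mu, log_var) in a
    real autodiff framework."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_divergence(mu, log_var):
    """KL(q(z|x) || N(0, I)) for a diagonal Gaussian posterior, the
    term that regularizes the VAE latent space."""
    return -0.5 * np.sum(1 + log_var - mu ** 2 - np.exp(log_var))
```

Because the latent space is regularized toward a shared prior, frames from different, unaligned speakers can be encoded and decoded without parallel corpora.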
X-Vectors: Robust DNN Embeddings for Speaker Recognition
This paper uses data augmentation, consisting of added noise and reverberation, as an inexpensive method to multiply the amount of training data and improve robustness of deep neural network embeddings for speaker recognition.
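The additive-noise half of that augmentation recipe is simple to sketch: scale a noise clip so that mixing it into the utterance yields a chosen signal-to-noise ratio. A minimal version, assuming raw waveform arrays (the function name and SNR convention are illustrative, not taken from the paper):

```python
import numpy as np

def add_noise_at_snr(speech, noise, snr_db):
    """Additive-noise augmentation: scale `noise` so that the mixture
    speech + scale * noise has the requested SNR in dB."""
    # Tile or crop the noise clip to match the speech length.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[:len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise
```

Reverberation, the other half of the recipe, is typically applied by convolving the waveform with a recorded room impulse response.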
Multi-target Voice Conversion without Parallel Data by Adversarially Learning Disentangled Audio Representations
An adversarial learning framework for voice conversion is proposed, with which a single model can be trained to convert the voice to many different speakers, all without parallel data, by separating the speaker characteristics from the linguistic content in speech signals.
Deep Learning for Classification of Speech Accents in Video Games
Deep learning is used to train a neural network to classify speech accents, which would give game developers the ability to analyze the accent distribution in their titles and could help screen voiceover actors applying for a role.