A New Approach to Accent Recognition and Conversion for Mandarin Chinese
@article{Ai2020ANA, title={A New Approach to Accent Recognition and Conversion for Mandarin Chinese}, author={Lin Ai and Shih-Ying Jeng and Homayoon S. M. Beigi}, journal={ArXiv}, year={2020}, volume={abs/2008.03359} }
Two new approaches, one to accent classification and one to accent conversion, are presented and explored. The first topic is Chinese accent classification/recognition. The second topic is the use of encoder-decoder models for end-to-end Chinese accent conversion, in which the classifier from the first topic is used to train the accent-converter encoder-decoder model. Experiments using different features and models are performed for accent recognition. These features include MFCCs and…
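Since MFCCs are named as one of the recognition features, a minimal numpy sketch of the standard MFCC pipeline may help: pre-emphasis, framing, power spectrum, mel filterbank, log, and DCT. This is not the authors' front end. The 25 ms / 10 ms framing at 16 kHz, 40 mel bands, and 13 coefficients are assumed, commonly used values, and `mfcc` is a hypothetical helper.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512, n_mels=40, n_ceps=13):
    """Return an (n_frames, n_ceps) MFCC matrix for a mono float signal (assumed settings)."""
    # Pre-emphasis boosts high frequencies
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Pad very short signals to one full frame, then slice overlapping frames
    if len(emphasized) < frame_len:
        emphasized = np.pad(emphasized, (0, frame_len - len(emphasized)))
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # Power spectrum of each windowed frame
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    # Triangular filters equally spaced on the mel scale
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / max(right - center, 1)
    log_mel = np.log(power @ fbank.T + 1e-10)
    # Orthonormal DCT-II decorrelates the log-mel energies; keep the first n_ceps
    n = np.arange(n_mels)
    dct = np.cos(np.pi / n_mels * (n[None, :] + 0.5) * np.arange(n_ceps)[:, None])
    dct *= np.sqrt(2.0 / n_mels)
    dct[0] /= np.sqrt(2.0)
    return log_mel @ dct.T
```

One second of 16 kHz audio yields 98 frames of 13 coefficients with these settings; an utterance-level classifier would then consume this matrix (or statistics of it).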
References
Showing 10 of 17 references.
Accent Identification by Combining Deep Neural Networks and Recurrent Neural Networks Trained on Long and Short Term Features
- Computer Science · INTERSPEECH
- 2016
This paper proposes combining long-term and short-term training for the automatic identification of foreign accents; the resulting performance greatly surpasses that of the provided baseline system.
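The summary does not say how the two streams are combined. As a hedged illustration only, assuming that "long-term" features are utterance-level statistics pooled over short-term frame features and that the two models' accent posteriors are merged by weighted score-level fusion, a minimal numpy sketch looks like this. `long_term_stats`, `fuse_posteriors`, and the 0.5 weight are hypothetical.

```python
import numpy as np


def long_term_stats(frames):
    """Collapse (n_frames, dim) short-term features into one utterance-level vector."""
    # Mean and standard deviation over time (an illustrative choice of statistics)
    return np.concatenate([frames.mean(axis=0), frames.std(axis=0)])


def fuse_posteriors(p_long, p_short, weight=0.5):
    """Weighted score-level fusion of two accent posterior vectors (hypothetical scheme)."""
    fused = weight * np.asarray(p_long, dtype=float) + (1.0 - weight) * np.asarray(p_short, dtype=float)
    return fused / fused.sum()  # renormalize so the result is a distribution


# The long-term and short-term models disagree; fusion picks one accent class.
p = fuse_posteriors([0.7, 0.2, 0.1], [0.3, 0.5, 0.2])
print(p.argmax())  # prints 0
```

In practice the fusion weight would be tuned on held-out data rather than fixed.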
Accent detection and speech recognition for Shanghai-accented Mandarin
- Physics, Computer Science · INTERSPEECH
- 2005
A new approach to accented Chinese speech recognition is proposed that combines accent detection, accent-discriminative acoustic features, acoustic adaptation, and model selection; experimental results show that it can improve the recognition of accented speech.
Articulatory-based conversion of foreign accents with deep neural networks
- Computer Science · INTERSPEECH
- 2015
Compared to a baseline method based on Gaussian mixture models, the DNN accent conversions were found to be 31% more intelligible, and were perceived more native-like in 68% of the cases.
Accent Conversion Using Artificial Neural Networks
- Computer Science
- 2017
A methodology for accent conversion is proposed that learns the differences between a pair of accents and produces a series of transformation matrices that can be applied to extracted Mel-Frequency Cepstral Coefficients (MFCCs).
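To make "a transformation matrix applied to MFCCs" concrete, here is a hedged sketch. The cited work learns its matrices with artificial neural networks and produces a series of them; this example instead swaps in an ordinary least-squares fit of a single affine map, and it assumes the source-accent and target-accent frames are already time-aligned (for example by dynamic time warping). `fit_accent_transform` and `apply_accent_transform` are hypothetical names.

```python
import numpy as np


def fit_accent_transform(src, tgt):
    """Least-squares affine map W such that tgt ≈ [src, 1] @ W.

    src, tgt: (n_frames, n_mfcc) arrays of time-aligned MFCC frames (assumed aligned).
    """
    X = np.hstack([src, np.ones((src.shape[0], 1))])  # append a bias column
    W, *_ = np.linalg.lstsq(X, tgt, rcond=None)
    return W  # shape (n_mfcc + 1, n_mfcc)


def apply_accent_transform(mfcc_frames, W):
    """Map every source-accent frame through the learned affine transform."""
    X = np.hstack([mfcc_frames, np.ones((mfcc_frames.shape[0], 1))])
    return X @ W
```

A linear map is the simplest possible stand-in; the appeal of a learned network is that it can capture non-linear, context-dependent accent differences that one matrix cannot.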
Foreign Accent Conversion by Synthesizing Speech from Phonetic Posteriorgrams
- Linguistics · INTERSPEECH
- 2019
This work presents a framework for foreign accent conversion (FAC) that eliminates the need for conventional vocoders, and therefore the need to use the native speaker's excitation; it produces speech that sounds clearer, more natural, and more similar to the non-native speaker than a baseline system.
Voice conversion using deep neural networks with speaker-independent pre-training
- Computer Science · 2014 IEEE Spoken Language Technology Workshop (SLT)
- 2014
In this study, we trained a deep autoencoder to build compact representations of short-term spectra of multiple speakers. Using this compact representation as mapping features, we then trained an…
Voice conversion from non-parallel corpora using variational auto-encoder
- Computer Science · 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)
- 2016
A spectral conversion (SC) framework based on a variational auto-encoder is proposed; it can exploit non-parallel corpora and removes the requirement for parallel corpora or phonetic alignments when training a spectral conversion system.
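The two ingredients that let a variational auto-encoder train without parallel data are the reparameterization trick and the KL term that pulls the latent code toward a standard normal prior. The numpy sketch below shows both, plus one assumed way to condition a decoder on speaker identity (concatenating a one-hot speaker code), so that conversion amounts to encoding a source frame and decoding it with the target speaker's code. The encoder/decoder networks are omitted, and all function names are hypothetical.

```python
import numpy as np


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I), keeping z differentiable in mu, log_var."""
    eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(0.5 * log_var) * eps


def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over the latent dimensions."""
    return 0.5 * np.sum(mu ** 2 + np.exp(log_var) - 1.0 - log_var, axis=-1)


def decoder_input(z, speaker_id, n_speakers):
    """Concatenate the latent code with a one-hot speaker code (assumed conditioning scheme)."""
    one_hot = np.eye(n_speakers)[speaker_id]
    return np.concatenate([z, one_hot], axis=-1)
```

The training loss per frame would be a reconstruction term plus `gaussian_kl`; only the speaker code changes at conversion time.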
X-Vectors: Robust DNN Embeddings for Speaker Recognition
- Computer Science · 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2018
This paper uses data augmentation, consisting of added noise and reverberation, as an inexpensive method to multiply the amount of training data and improve robustness of deep neural network embeddings for speaker recognition.
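The two augmentations named here, added noise and reverberation, are simple to express on raw waveforms. A minimal numpy sketch, assuming noise is mixed at a chosen signal-to-noise ratio and reverberation is simulated by convolving with a room impulse response (RIR); the specific noise corpora, RIR sets, and SNR ranges used in the paper are not reproduced, and `add_noise` / `add_reverb` are hypothetical helpers.

```python
import numpy as np


def add_noise(clean, noise, snr_db):
    """Mix noise into clean speech at the requested signal-to-noise ratio (dB)."""
    noise = np.resize(noise, clean.shape)  # tile or trim the noise to the speech length
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # Scale so that p_clean / (scale**2 * p_noise) == 10 ** (snr_db / 10)
    scale = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + scale * noise


def add_reverb(clean, rir):
    """Simulate reverberation by convolving with a room impulse response."""
    wet = np.convolve(clean, rir)[: len(clean)]  # keep the original length
    # Match the original peak level so augmented copies stay comparable (assumed choice)
    return wet * (np.max(np.abs(clean)) / (np.max(np.abs(wet)) + 1e-12))
```

Each training utterance can then be copied several times with different noise types, SNRs, and RIRs, multiplying the training set at little cost.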
Multi-target Voice Conversion without Parallel Data by Adversarially Learning Disentangled Audio Representations
- Computer Science · INTERSPEECH
- 2018
An adversarial learning framework for voice conversion is proposed, with which a single model can be trained to convert the voice to many different speakers, all without parallel data, by separating the speaker characteristics from the linguistic content in speech signals.
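The core of such adversarial disentanglement is that two objectives are set against each other: a speaker classifier tries to identify the speaker from the encoder's latent code, while the encoder/decoder is rewarded for reconstructing the input and penalized when the classifier succeeds. The numpy sketch below only computes the two losses for one step; the networks, gradient updates, and any later training stages are omitted. The L1 reconstruction term, the weight `lam`, and the function names are assumptions for illustration.

```python
import numpy as np


def cross_entropy(logits, labels):
    """Mean softmax cross-entropy for (batch, n_classes) logits and integer labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -np.mean(log_probs[np.arange(len(labels)), labels])


def adversarial_losses(x, x_recon, speaker_logits, speaker_ids, lam=0.01):
    """Return (autoencoder_loss, classifier_loss) for one training step.

    The classifier minimizes speaker cross-entropy on the latent code; the
    encoder/decoder minimizes reconstruction error MINUS lam times that same
    cross-entropy, so it is rewarded for hiding speaker identity.
    """
    recon = np.mean(np.abs(x - x_recon))  # assumed L1 reconstruction term
    clf = cross_entropy(speaker_logits, speaker_ids)
    return recon - lam * clf, clf
```

Alternating updates on these two losses push the latent code toward speaker-independent linguistic content, which a decoder conditioned on a target speaker can then re-voice.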
Deep Learning for Classification of Speech Accents in Video Games
- Computer Science · AIIDE Workshops
- 2018
Deep learning is used to train a neural network to classify speech accents, which could give game developers the ability to analyze the accent distribution in their titles and possibly help screen voice-over actors applying for a role.