Corpus ID: 235446502

Pathological voice adaptation with autoencoder-based voice conversion

M. Illa, B. Halpern, R. V. Son, L. Moro-Velázquez, O. Scharenborg
In this paper, we propose a new approach to pathological speech synthesis. Instead of using healthy speech as a source, we customise an existing pathological speech sample to a new speaker’s voice characteristics. This approach alleviates the evaluation problem one normally has when converting typical speech to pathological speech, as in our approach, the voice conversion (VC) model does not need to be optimised for speech degradation but only for the speaker change. This change in the…

An Objective Evaluation Framework for Pathological Speech Synthesis
This work utilises existing detection and analysis techniques to propose a general framework for the consistent evaluation of synthetic pathological speech, and develops and tests a dysarthric voice conversion (VC) system using CycleGAN-VC and a PSOLA-based speech rate modification technique.
Adjusting dysarthric speech signals to be more intelligible
  • F. Rudzicz
  • Computer Science
  • Comput. Speech Lang.
  • 2013
This paper presents a system that transforms the speech signals of speakers with physical speech disabilities into a more intelligible form that listeners can understand more easily, a substantial step towards full automation in speech transformation without the need for expert or clinical intervention.
The Voice Conversion Challenge 2016
This paper summarizes the design of the challenge, its results, and a future plan to share views about unsolved problems and challenges faced by current VC techniques.
Synthesis of New Words for Improved Dysarthric Speech Recognition on an Expanded Vocabulary
This paper proposes a data augmentation method using voice conversion that allows dysarthric ASR systems to accurately recognize words outside of the training set vocabulary, and demonstrates that it is possible to synthesize utterances of new words that were never recorded by speakers with dysarthria.
Data Augmentation Using Healthy Speech for Dysarthric Speech Recognition
This work explores tempo-based and speed-based modifications of healthy speech to simulate dysarthric speech, as a data augmentation technique for improving ASR performance when only healthy speech is used for training.
Average Modeling Approach to Voice Conversion with Non-Parallel Data
The proposed approach makes use of a multi-speaker average model that maps speaker-independent linguistic features to speaker-dependent acoustic features, and does not require parallel data in either average model training or adaptation.
WaveNet Vocoder with Limited Training Data for Voice Conversion
Experimental results show that the WaveNet vocoders built using the proposed method outperform the conventional STRAIGHT vocoder, and the system achieves an average naturalness MOS of 4.13 in VCC 2018, which is the highest among all submitted systems.
Dysarthric Speech Recognition with Lattice-Free MMI
  • Enno Hermann, M. Magimai.-Doss
  • Computer Science
  • ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2020
This paper focuses on the use of state-of-the-art sequence-discriminative training, in particular lattice-free maximum mutual information (LF-MMI), for improving dysarthric speech recognition.
Non-parallel Voice Conversion based on Hierarchical Latent Embedding Vector Quantized Variational Autoencoder
This paper proposes a hierarchical latent embedding structure for Vector Quantized Variational Autoencoder (VQVAE) to improve the performance of the non-parallel voice conversion (NPVC) model.
Simulating Dysarthric Speech for Training Data Augmentation in Clinical Speech Applications
This paper presents a method for simulating training data for clinical applications by transforming healthy speech to dysarthric speech using adversarial training, and shows that using the simulated speech samples to balance an existing dataset improves classification accuracy by approximately 10% after data augmentation.