Corpus ID: 235694297

An Objective Evaluation Framework for Pathological Speech Synthesis

  • Bence Mark Halpern, Julian Fritsch, Enno Hermann, Rob van Son, Odette Scharenborg, Mathew Magimai.-Doss
The development of pathological speech systems is currently hindered by the lack of a standardised objective evaluation framework. In this work, (1) we utilise existing detection and analysis techniques to propose a general framework for the consistent evaluation of synthetic pathological speech. This framework evaluates the voice quality and the intelligibility aspects of speech, which our experiments show to be complementary. (2) Using our proposed evaluation framework, we develop and…


Pathological voice adaptation with autoencoder-based voice conversion
A new approach to pathological speech synthesis is proposed, which customises an existing pathological speech sample to a new speaker’s voice characteristics, to ensure that any degradation found in naturalness is due to the conversion process and not to the model exaggerating characteristics of a speech pathology.
Towards Identity Preserving Normal to Dysarthric Voice Conversion
  • Wen-Chin Huang, Bence Mark Halpern, Lester Phillip Violeta, Odette Scharenborg, Tomoki Toda
  • Computer Science, Engineering
  • 2021
We present a voice conversion framework that converts normal speech into dysarthric speech while preserving the speaker identity. Such a framework is essential for (1) clinical decision making…


Improving Recognition of Dysarthric Speech Using Severity Based Tempo Adaptation
A mechanism is proposed to adapt the tempo of the sonorant parts of dysarthric speech to match that of normal speech, based on the severity of dysarthria.
Pathological speech processing: State-of-the-art, current challenges, and future directions
This paper lists challenges such as controlling subjectivity in pathological speech assessments and patient variability in the application of ML-SP tools to the domain and discusses feature design methods and machine learning algorithms using a combination of domain knowledge and data driven methods.
Data Augmentation Using Healthy Speech for Dysarthric Speech Recognition
Tempo-based and speed-based modifications of healthy speech are explored as data augmentation that simulates dysarthric speech, with the aim of improving ASR performance using healthy speech alone for training.
Pathological Speech Intelligibility Assessment Based on the Short-time Objective Intelligibility Measure
Experiments on databases of English and French patients suffering from Cerebral Palsy and Amyotrophic Lateral Sclerosis show that the proposed intelligibility measures can obtain a high correlation with subjective intelligibility ratings, outperforming several state-of-the-art pathological speech intelligibility measures.
Intelligibility Improvement of Dysarthric Speech using MMSE DiscoGAN
This paper proposes to use Discover GAN (DiscoGAN) along with Mean Square Error (MSE) regularization for Dysarthric-to-Normal speech conversion and observes that MMSE DiscoGAN outperforms DNN by 13.16% and 9.64% for male and female speakers, respectively.
Improving Dysarthric Speech Intelligibility Using Cycle-consistent Adversarial Training
Objective evaluation using automatic speech recognition of the generated utterances on a held-out test set shows that recognition performance improves over the original dysarthric speech after adversarial training, with the absolute WER lowered by 33.4%.
Adjusting dysarthric speech signals to be more intelligible
  • F. Rudzicz
  • Computer Science
  • Comput. Speech Lang.
  • 2013
A system is presented that transforms the speech signals of speakers with physical speech disabilities into a more intelligible form that listeners can understand more easily, a substantial step towards full automation in speech transformation without the need for expert or clinical intervention.
Phonetic Analysis of Dysarthric Speech Tempo and Applications to Robust Personalised Dysarthric Speech Recognition
An approach that non-linearly modifies speech tempo to reduce the mismatch between typical and atypical speech is explored, resulting in a nearly 7% absolute improvement over a baseline speaker-dependent system, evaluated on the UASpeech corpus.
Matching of a test signal to a reference word hypothesis forms the core of many speech processing problems, including objective speech intelligibility assessment. This paper first shows that the…
Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones
Several recently proposed algorithms for improving the voice quality of text-to-speech synthesis based on the concatenation of acoustical units are reviewed in a common framework; all are based on the pitch-synchronous overlap-add approach.