Gerit P. Sonntag

As an alternative to synthesis-by-rule, neural networks have been successfully applied to prosody generation in speech synthesis, yet it is not known precisely which input parameters are responsible for the good results. The approach presented here tries to quantify the contribution of each input parameter. This is done first by comparing the mean…
A corpus of read American English was designed as a research tool for speech synthesis and prosody research, with an emphasis on concept-to-speech research. The total duration of the corpus is two hours. It was recorded by two native speakers, who also provide the voices of the VERBMOBIL American English speech synthesis. The corpus was annotated…
We measured the comprehensibility of six German speech synthesis systems and one human voice in a dual-task experiment that simulated the complexity of a real-life task. PCM (Pulse Code Modulation) and simulated GSM (Global System for Mobile communications) coding were compared. Both the primary and the secondary task showed significant differences in response…
This paper describes an experimental method for detecting prosodic functions. We assume that the first step towards content-driven synthetic prosody generation (concept-to-speech) is invariably to determine the perceptually relevant prosodic features. The proposed method has been applied to the detection of syntactic structure, dialogue acts and given/new…
In order to evaluate the prosodic output of a speech synthesis system independently of its segmental quality, we have developed a special way to delexicalize speech stimuli, which we call PURR (Prosody Unveiling through Restricted Representation). We compared the use of PURR stimuli for the evaluation of prosodic naturalness in three different test…
An application-specific perceptual evaluation was carried out in order to compare six high-quality German text-to-speech systems. Subjects judged the systems' reading of an email message and a newspaper article according to four application-specific questions and six voice quality attributes. The results indicate significant differences between the systems.