This paper describes an open source voice creation toolkit that supports the creation of unit selection and HMM-based voices for the MARY (Modular Architecture for Research on speech Synthesis) TTS platform. The toolkit can easily be used to create voices in the languages already supported by MARY TTS, but also provides the tools and generic reusable ...
In this paper we investigate the prosody and voice quality of dominance in scenario meetings. We have found that in these scenarios the most dominant person tends to speak with a louder-than-average voice quality and the least dominant person with a softer-than-average voice quality. We also found that the most dominant role in the meetings is the project ...
This paper announces the availability of the magnetic resonance imaging (MRI) subset of the mngu0 corpus, a collection of articulatory speech data from one speaker containing different modalities. This subset comprises volumetric MRI scans of the speaker's vocal tract during sustained production of vowels and consonants, as well as dynamic mid-sagittal ...
Work is currently being carried out on a speech database constructed in order to study speech rhythm in connection with speech rate. The database, BonnTempo-Corpus, and the Praat-based analysis tools, BonnTempo-Tools, are a powerful instrument for examining various aspects of recently proposed rhythm measures (e.g. %V, ΔC, nPVI, rPVI) in relation to ...
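For readers unfamiliar with the rhythm measures named in the abstract above, the following is a minimal Python sketch of how %V, the consonantal variability measure usually written ΔC, rPVI and nPVI are commonly computed from lists of interval durations. It is not taken from BonnTempo-Tools; the function names are hypothetical, and the use of the population standard deviation for ΔC is an assumption.

```python
from statistics import mean, pstdev


def percent_v(vocalic, consonantal):
    """%V: share of total duration taken up by vocalic intervals, in percent."""
    total = sum(vocalic) + sum(consonantal)
    return 100.0 * sum(vocalic) / total


def delta_c(consonantal):
    """Delta-C: standard deviation of consonantal interval durations.

    Assumption: population standard deviation; some studies may use
    the sample standard deviation instead.
    """
    return pstdev(consonantal)


def rpvi(durations):
    """Raw PVI: mean absolute difference between successive intervals."""
    return mean(abs(a - b) for a, b in zip(durations, durations[1:]))


def npvi(durations):
    """Normalised PVI: each successive difference is divided by the mean
    of the two intervals, and the average is scaled by 100."""
    return 100.0 * mean(
        abs(a - b) / ((a + b) / 2.0)
        for a, b in zip(durations, durations[1:])
    )


if __name__ == "__main__":
    # Made-up interval durations in milliseconds, for illustration only.
    vowels = [100, 200, 100]
    consonants = [50, 150]
    print(percent_v(vowels, consonants))
    print(delta_c(consonants))
    print(rpvi(vowels))
    print(npvi(vowels))
```

Because nPVI divides each difference by the local mean duration, it is less sensitive to overall speech rate than rPVI, which is why the two are often reported side by side in rate-related rhythm studies.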
The present paper addresses the issue of flexibility in expressive unit selection speech synthesis by using different style selection techniques. We select units from a mixed-style unit selection database, using either forced style switching, no control, symbolic target cost, or acoustic target cost as a style selection criterion. We assess the effect of ...
The acoustic-phonetic properties of words spoken with three different levels of accentuation (de-accented, pre-nuclear and nuclear accented in broad-focus and nuclear accented in narrow-focus) are examined in question-answer elicited sentences and iterative imitations (on the syllable da) produced by six French and six German speakers. Normalised parameter ...
One of the challenges of speech-to-speech translation is to accurately preserve the paralinguistic information in the speaker’s message. Information about the affect and emotional intent of a speaker is often carried in more than one modality. For this reason, the possibility of multimodal interaction with the system and the conversation partner may greatly ...
Synthesis of listener vocalisations is one focus of research aimed at improving emotionally coloured conversational speech synthesis. To communicate different intentions, a synthesiser should be capable of generating a broad range of vocalisations with different kinds of acoustic properties. However, the data collection for corpus-based methods is ...