This paper describes an open source voice creation toolkit that supports the creation of unit selection and HMM-based voices for the MARY (Modular Architecture for Research on speech Synthesis) TTS platform. The toolkit can be easily employed to create voices in the languages already supported by MARY TTS, but also provides the tools and generic reusable…
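To give a feel for how a voice built with such a toolkit is used, here is a minimal sketch that requests synthesis from a running MARY TTS server over its HTTP interface. The endpoint, parameter values, and voice name are assumptions for illustration, not taken from the paper; adjust them to your installation.

```python
# Minimal sketch: synthesize text with a voice installed in a running MARY TTS
# server via its HTTP interface. The server URL, request parameters, and voice
# name below are assumed for illustration.
import requests

MARY_URL = "http://localhost:59125/process"  # assumed default MARY TTS endpoint

params = {
    "INPUT_TEXT": "Hello from a newly built voice.",
    "INPUT_TYPE": "TEXT",
    "OUTPUT_TYPE": "AUDIO",
    "AUDIO": "WAVE_FILE",
    "LOCALE": "en_US",
    "VOICE": "cmu-slt-hsmm",  # example voice name, replace with your own
}

response = requests.get(MARY_URL, params=params, timeout=30)
response.raise_for_status()

with open("output.wav", "wb") as f:
    f.write(response.content)
```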
This paper announces the availability of the magnetic resonance imaging (MRI) subset of the mngu0 corpus, a collection of articulatory speech data from one speaker containing different modalities. This subset comprises volumetric MRI scans of the speaker's vocal tract during sustained production of vowels and consonants, as well as dynamic mid-sagittal…
In this paper we investigate the prosody and voice quality of dominance in scenario meetings. We have found that in these scenarios the most dominant person tends to speak with a louder-than-average voice quality and the least dominant person with a softer-than-average voice quality. We also found that the most dominant role in the meetings is the project…
Work is currently being carried out on a speech database constructed in order to study speech rhythm in connection with speech rate. The database, BonnTempo-Corpus, and the Praat-based analysis tools, BonnTempo-Tools, are a powerful instrument for examining various aspects of recently proposed rhythm measures (e.g. %V, ΔC, nPVI, rPVI) in relation to…
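For readers unfamiliar with these measures, the sketch below implements their standard published definitions (%V and ΔC from interval-based rhythm metrics, rPVI/nPVI from the pairwise variability index literature) given lists of vocalic and consonantal interval durations. It follows the textbook formulas, not the BonnTempo-Tools code itself.

```python
# Standard rhythm measures computed from lists of vocalic and consonantal
# interval durations (in seconds). These follow the published definitions,
# not the BonnTempo-Tools implementation.
import statistics

def percent_v(vocalic, consonantal):
    """%V: proportion of total duration that is vocalic, in percent."""
    total = sum(vocalic) + sum(consonantal)
    return 100.0 * sum(vocalic) / total

def delta_c(consonantal):
    """Delta C: standard deviation of consonantal interval durations."""
    return statistics.pstdev(consonantal)

def rpvi(intervals):
    """Raw PVI: mean absolute difference between successive intervals."""
    pairs = zip(intervals, intervals[1:])
    return sum(abs(a - b) for a, b in pairs) / (len(intervals) - 1)

def npvi(intervals):
    """Normalised PVI: each pairwise difference is scaled by the pair mean."""
    pairs = zip(intervals, intervals[1:])
    return 100.0 * sum(abs(a - b) / ((a + b) / 2.0) for a, b in pairs) / (len(intervals) - 1)
```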
This paper describes a framework for synthesis of expressive speech based on MARY TTS and Emotion Markup Language (EmotionML). We describe the creation of expressive unit selection and HMM-based voices using audiobook data labelled according to voice styles. The audiobook data is labelled and split according to voice styles by principal component analysis (PCA) of…
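As a rough illustration of the style-labelling idea (a generic sketch, not the paper's pipeline), per-utterance acoustic feature vectors can be projected onto their first principal components and then grouped by style in that low-dimensional space. The feature matrix below is random stand-in data; the feature extraction itself is not shown.

```python
# Generic sketch: PCA over per-utterance acoustic features (e.g. mean F0,
# F0 range, energy, speaking rate), as a basis for grouping utterances into
# voice-style clusters. X is an (utterances x features) matrix.
import numpy as np

def pca_project(X, n_components=2):
    """Project feature vectors onto their first principal components."""
    X_centered = X - X.mean(axis=0)
    # Rows of vt are the principal directions of the centred data
    _, _, vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ vt[:n_components].T

# Example: 100 utterances described by 4 acoustic features (random stand-in data)
X = np.random.default_rng(0).normal(size=(100, 4))
scores = pca_project(X, n_components=2)
# Utterances can then be clustered by style in this 2-D score space.
```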
The acoustic-phonetic properties of words spoken with three different levels of accentuation (de-accented, pre-nuclear and nuclear accented in broad-focus and nuclear accented in narrow-focus) are examined in question-answer elicited sentences and iterative imitations (on the syllable da) produced by six French and six German speakers. Normalised parameter…
One of the challenges of speech-to-speech translation is to accurately preserve the paralinguistic information in the speaker's message. Information about a speaker's affect and emotional intent is often carried in more than one modality. For this reason, the possibility of multimodal interaction with the system and the conversation partner may greatly…
The present paper addresses the issue of flexibility in expressive unit selection speech synthesis by using different style selection techniques. We select units from a mixed-style unit selection database, using either forced style switching, no control, symbolic target cost, or acoustic target cost as a style selection criterion. We assess the effect of…
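To illustrate the general idea of a symbolic style criterion in unit selection (a sketch under assumed features and weights, not the paper's actual cost function), a style-mismatch penalty can simply be added to the usual weighted sum of target-feature mismatches; setting the style weight to zero corresponds to "no control".

```python
# Sketch: unit-selection target cost with an added symbolic style term.
# Features, weights, and the candidate data are illustrative only.
def target_cost(target, candidate, style_weight=1.0):
    feature_weights = {"phone": 10.0, "stress": 2.0}
    cost = sum(w * (target[f] != candidate[f]) for f, w in feature_weights.items())
    # Symbolic style cost: penalise units whose style label differs from the
    # requested style; style_weight = 0 reproduces the "no control" condition.
    cost += style_weight * (target["style"] != candidate["style"])
    return cost

target_spec = {"phone": "a", "stress": 1, "style": "angry"}
candidates = [
    {"phone": "a", "stress": 1, "style": "neutral"},
    {"phone": "a", "stress": 0, "style": "angry"},
]
best = min(candidates, key=lambda unit: target_cost(target_spec, unit))
```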