Ingmar Steiner

This paper describes an open source voice creation toolkit that supports the creation of unit selection and HMM-based voices for the MARY (Modular Architecture for Research on speech Synthesis) TTS platform. The toolkit can be easily employed to create voices in the languages already supported by MARY TTS, but also provides the tools and generic, reusable …
This paper announces the availability of the magnetic resonance imaging (MRI) subset of the mngu0 corpus, a collection of articulatory speech data from one speaker containing different modalities. This subset comprises volumetric MRI scans of the speaker's vocal tract during sustained production of vowels and consonants, as well as dynamic mid-sagittal …
We present two concepts for the generation of gestural scores to control an articulatory speech synthesizer. Gestural scores are the common input to the synthesizer and constitute an organized pattern of articulatory gestures. The first concept generates the gestures for an utterance using the phonetic transcriptions, phone durations, and intonation commands …
A speech database is currently being constructed to study speech rhythm in relation to speech rate. The database, BonnTempo-Corpus, and the Praat-based analysis tools, BonnTempo-Tools, are a powerful instrument for examining various aspects of recently proposed rhythm measures (e.g. %V, ∆C, nPVI, rPVI, etc.) in relation to …
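The pairwise variability indices named above can be stated compactly. As a minimal sketch (the duration values below are purely illustrative, not from the BonnTempo-Corpus), the normalized index nPVI averages the absolute difference between successive interval durations scaled by their mean, while the raw index rPVI averages the unscaled differences:

```python
def npvi(durations):
    """Normalized PVI: 100 x mean of |d_k - d_{k+1}| / ((d_k + d_{k+1}) / 2)."""
    pairs = zip(durations, durations[1:])
    return 100.0 * sum(abs(a - b) / ((a + b) / 2.0) for a, b in pairs) / (len(durations) - 1)

def rpvi(durations):
    """Raw PVI: mean absolute difference between successive interval durations."""
    pairs = zip(durations, durations[1:])
    return sum(abs(a - b) for a, b in pairs) / (len(durations) - 1)

# Hypothetical vocalic interval durations in seconds
v = [0.08, 0.12, 0.10, 0.09]
print(round(npvi(v), 2))   # → 22.9
print(round(rpvi(v), 4))   # → 0.0233
```

nPVI's local normalization makes it robust to gradual tempo changes within an utterance, which is why it is often preferred when comparing rhythm across speech rates.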
The acoustic-phonetic properties of words spoken with three different levels of accentuation (de-accented, pre-nuclear and nuclear accented in broad-focus and nuclear accented in narrow-focus) are examined in question-answer elicited sentences and iterative imitations (on the syllable da) produced by six French and six German speakers. Normalised parameter …
We present a technique for the animation of a 3D kinematic tongue model, one component of the talking head of an acoustic-visual (AV) speech synthesizer. The skeletal animation approach is adapted to make use of a deformable rig controlled by tongue motion capture data obtained with electromagnetic articulography (EMA), while the tongue surface is extracted …
In this paper we investigate the prosody and voice quality of dominance in scenario meetings. We have found that in these scenarios the most dominant person tends to speak with a louder-than-average voice quality and the least dominant person with a softer-than-average voice quality. We also found that the most dominant role in the meetings is the project …
One of the challenges of speech-to-speech translation is to accurately preserve the paralinguistic information in the speaker's message. Information about the affect and emotional intent of a speaker is often carried in more than one modality. For this reason, the possibility of multimodal interaction with the system and the conversation partner may greatly …
This paper describes a framework for synthesis of expressive speech based on MARY TTS and Emotion Markup Language (EmotionML). We describe the creation of expressive unit selection and HMM-based voices using audiobook data labelled according to voice styles. The audiobook data is labelled and split by voice style using principal component analysis (PCA) of …
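The PCA-based labelling step can be sketched as follows. This is a hedged illustration only: the feature columns (e.g. mean F0 and energy per utterance) and the median split into two style clusters are assumptions for the example, not the paper's actual pipeline.

```python
import numpy as np

def pca_first_component(features):
    """Project per-utterance feature vectors onto the first principal component."""
    X = features - features.mean(axis=0)           # centre the data
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return X @ vt[0]                               # scores on PC1

# Toy data: rows = utterances, columns = hypothetical acoustic features
feats = np.array([[120.0, 0.60],
                  [125.0, 0.62],
                  [200.0, 0.90],
                  [210.0, 0.88]])
scores = pca_first_component(feats)
# Split utterances into two voice-style clusters at the median PC1 score
styles = scores > np.median(scores)
print(styles)
```

Projecting onto PC1 collapses the correlated acoustic features into a single axis of maximal variance, along which stylistically distinct utterances (here, the low- vs. high-F0 toy rows) separate cleanly.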