In recent years there has been substantial debate about the need for increasingly spontaneous, conversational corpora of spoken interaction that are not controlled or task-directed. In parallel, the need has arisen for the recording of multi-modal corpora which are not restricted to the audio domain alone. With a corpus that would fulfill both needs, it …
This paper describes a new open-source architecture for unit-selection-based speech synthesis called BOSS (Bonn Open Synthesis System). It is built modularly, with communication between modules taking place in a fixed format. This makes the addition, deletion and substitution of modules very easy. The strict separation between data and algorithms allows …
In a previous study [1] we investigated properties of communicative feedback produced by attentive and non-attentive listeners in dialogue. Distracted listeners were found to produce less feedback communicating understanding. Here, we assess the role of prosody in differentiating between feedback functions. We find significant differences across all studied …
Verbmobil is a speaker-independent system that offers translation assistance in dialogue situations. In cooperation with other institutes we are developing the speech synthesis module within Verbmobil for German and American English. Priority is currently given to enhancing the naturalness of our PSOLA-based concatenative synthesis of German. Due to a …
Gestures and speech interact. They are linked in language production and perception, with their interaction contributing to felicitous communication. The multifaceted nature of these interactions has attracted considerable attention from the speech and gesture community. This article provides an overview of our current understanding of manual and head …
To gain knowledge about the interaction between listeners' top-down expectations concerning prosodic prominence and its acoustic correlates, two exploratory empirical studies were carried out. First, native and non-native subjects rated prominences in speech read at normal and at very fast (prosodically very different) rates. Later, these ratings …
This paper presents ongoing work on the design, deployment and evaluation of a multimodal data acquisition architecture which utilises minimally invasive motion, head, eye and gaze tracking alongside high-quality audiovisual recording of human interactions. The different data streams are centrally collected and visualised at a single point and in real time …
Although an increasing amount of research has been carried out into human-machine interaction in the last century, even today we are not able to fully understand the dynamic changes in human interaction. Only when we achieve this will we be able to go beyond a one-to-one mapping between text and speech and add social information to speech …