In this paper we present our ongoing work in building technologies for natural multimodal human-robot interaction. We present our systems for spontaneous speech recognition, multimodal dialogue processing and visual perception of a user, which includes the recognition of pointing gestures as well as the recognition of a person's head orientation. Each of …
In this work, we present our progress in multi-source far field automatic speech-to-text transcription for lecture speech. In particular, we show how the best of several far field channels can be selected based on a signal-to-noise ratio criterion, and how the signals from multiple channels can be combined at either the waveform level using blind channel …
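The channel-selection idea in this abstract can be illustrated with a small sketch. The snippet below is a hypothetical, simplified SNR estimator and channel selector, not the paper's actual criterion (which is not specified in this snippet): it estimates SNR per channel from frame energies, taking the quietest frames as a noise estimate, and picks the channel with the highest estimate.

```python
import numpy as np

def estimate_snr_db(signal, frame_len=256, noise_floor=1e-3):
    """Crude SNR estimate from frame energies.

    Illustrative assumption: the quietest 10% of frames approximate the
    noise level, the loudest 10% approximate the speech level.
    """
    n_frames = len(signal) // frame_len
    energies = np.array([
        np.mean(signal[i * frame_len:(i + 1) * frame_len] ** 2)
        for i in range(n_frames)
    ])
    noise = max(np.percentile(energies, 10), noise_floor)
    speech = np.percentile(energies, 90)
    return 10.0 * np.log10(speech / noise)

def select_best_channel(channels):
    """Return the index of the channel with the highest estimated SNR."""
    snrs = [estimate_snr_db(ch) for ch in channels]
    return int(np.argmax(snrs))
```

In a far-field setup, each microphone yields one such channel; the selector would be run per utterance (or per segment) before recognition.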
This paper describes the Interactive Systems Lab's meeting transcription system, which performs segmentation, speaker clustering, and transcription of conversational meeting speech. The system described here was evaluated in NIST's RT-04S "Meeting" speech evaluation. This paper compares the performance of our Broadcast News and the most recent …
With increasing globalization, communication across language and cultural boundaries is becoming an essential requirement of doing business, delivering education, and providing public services. Due to the considerable cost of human translation services, only a small fraction of text documents and an even smaller percentage of spoken encounters, such as …
The project Technology and Corpora for Speech to Speech Translation (TC-STAR) aims at making a breakthrough in speech-to-speech translation research, significantly reducing the gap between the performance of machines and humans at this task. Technological and scientific progress is driven by periodic, competitive evaluations within the project. For …
Emotions are very important in human-human communication but are usually ignored in human-computer interaction. Recent work focuses on recognition and generation of emotions as well as emotion-driven behavior. Our work focuses on the use of emotions in dialogue systems that can be used with speech input or in multi-modal environments. This paper …
In this paper, we present our work in building technologies for natural multimodal human-robot interaction. We present our systems for spontaneous speech recognition, multimodal dialogue processing, and visual perception of a user, which includes localization, tracking, and identification of the user, recognition of pointing gestures, as well as the …
Portable devices for the consumer market are becoming available in large quantities. Because of their design and use, human speech is often the input modality of choice, for example for car navigation systems or portable speech-to-speech translation devices. In this paper we describe our work in porting our existing desktop PC based speech recognition …
The project Technology and Corpora for Speech to Speech Translation (TC-STAR) aims at making a breakthrough in speech-to-speech translation research, significantly reducing the gap between the performance of machines and humans at this task. Technological and scientific progress is driven by periodic, competitive evaluations within the project. In this …