In this paper we present our ongoing work in building technologies for natural multimodal human-robot interaction. We present our systems for spontaneous speech recognition, multimodal dialogue processing and visual perception of a user, which includes the recognition of pointing gestures as well as the recognition of a person's head orientation. Each of…
With increasing globalization, communication across language and cultural boundaries is becoming an essential requirement of doing business, delivering education, and providing public services. Due to the considerable cost of human translation services, only a small fraction of text documents and an even smaller percentage of spoken encounters, such as…
The project Technology and Corpora for Speech to Speech Translation (TC-STAR) aims at making a breakthrough in speech-to-speech translation research, significantly reducing the gap between the performance of machines and humans at this task. Technological and scientific progress is driven by periodic, competitive evaluations within the project. For…
In this work, we present our progress in multi-source far-field automatic speech-to-text transcription for lecture speech. In particular, we show how the best of several far-field channels can be selected based on a signal-to-noise ratio criterion, and how the signals from multiple channels can be combined at either the waveform level using blind channel…
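To illustrate the kind of signal-to-noise ratio criterion such channel selection can rest on, here is a minimal sketch, assuming NumPy and a simple energy-based SNR estimate; the function names and the frame and quantile parameters are illustrative and not taken from the paper.

```python
import numpy as np

def estimate_snr_db(signal, frame_len=400, hop=160, quantile=0.2):
    """Rough SNR estimate: treat the lowest-energy frames as noise and the
    highest-energy frames as speech, then compare their average powers.
    A simplified stand-in for a proper SNR criterion."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len, hop)]
    powers = np.sort(np.array([np.mean(np.asarray(f, dtype=np.float64) ** 2) + 1e-12
                               for f in frames]))
    k = max(1, int(quantile * len(powers)))
    noise_power = np.mean(powers[:k])
    speech_power = np.mean(powers[-k:])
    return 10.0 * np.log10(speech_power / noise_power)

def select_best_channel(channels):
    """Return the index of the far-field channel with the highest estimated SNR."""
    return int(np.argmax([estimate_snr_db(ch) for ch in channels]))
```

A multi-channel front end could then decode only the selected channel, or use the per-channel SNR estimates to weight a waveform-level combination of the signals.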
In this paper, we present our work in building technologies for natural multimodal human–robot interaction. We present our systems for spontaneous speech recognition, multimodal dialogue processing, and visual perception of a user, which includes localization, tracking, and identification of the user, recognition of pointing gestures, as well as the…
The project Technology and Corpora for Speech to Speech Translation (TC-STAR) aims at making a breakthrough in speech-to-speech translation research, significantly reducing the gap between the performance of machines and humans at this task. Technological and scientific progress is driven by periodic, competitive evaluations within the project. In this…
This paper describes the Interactive Systems Lab's meeting transcription system, which performs segmentation, speaker clustering, and transcription of conversational meeting speech. The system described here was evaluated in NIST's RT-04S "Meeting" speech evaluation. This paper compares the performance of our Broadcast News and the most recent…
In this paper we present our current work on tightly coupling a speech recognizer with a dialog manager, and the results we obtained by restricting the search space of our grammar-based speech recognizer through the information given by the dialog manager. As a result of the tight coupling, the same linguistic knowledge sources can be used in both the speech recognizer…
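As a rough illustration of what restricting a grammar-based recognizer's search space by dialog state can look like, here is a minimal sketch; the grammar, state names, and classes are hypothetical placeholders, not the recognizer or dialog manager described in the abstract.

```python
# Hypothetical toy grammar shared by the dialog manager and the recognizer.
FULL_GRAMMAR = {
    "request_time":  ["what time is it", "tell me the time"],
    "request_route": ["how do I get to <place>", "directions to <place>"],
    "confirm":       ["yes", "no", "correct", "wrong"],
}

class DialogManager:
    """Tracks the dialog state and exposes the grammar rules that are
    plausible as the next user utterance."""
    def __init__(self):
        self.state = "awaiting_confirmation"

    def active_rules(self):
        if self.state == "awaiting_confirmation":
            return ["confirm"]
        return list(FULL_GRAMMAR)

class GrammarRecognizer:
    """Builds its decoding search space only from the rules the dialog
    manager currently licenses."""
    def restricted_search_space(self, rule_names):
        return [phrase for rule in rule_names for phrase in FULL_GRAMMAR[rule]]

dm = DialogManager()
rec = GrammarRecognizer()
print(rec.restricted_search_space(dm.active_rules()))
# ['yes', 'no', 'correct', 'wrong']
```

Because both components draw on the same grammar, the dialog manager's expectations and the recognizer's language constraints stay consistent, which is the point of the tight coupling.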
In prior work, we developed a speaker tracking system based on an iterated extended Kalman filter (IEKF) using time delays of arrival (TDOAs) as acoustic features. While this system functioned well, its utility was limited to scenarios in which a single speaker was to be tracked. In this work, we remove this restriction by generalizing the IEKF, first to a probabilistic…
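For readers unfamiliar with TDOA-based tracking, the following is a minimal sketch of a single extended Kalman filter correction step on TDOA observations, assuming NumPy, a position-only state, and known microphone positions; it illustrates the general technique rather than the paper's actual system.

```python
import numpy as np

C = 343.0  # speed of sound in air (m/s)

def tdoa_prediction(pos, mic_pairs):
    """Predicted TDOAs (s) for a source at `pos` and a list of microphone pairs."""
    return np.array([(np.linalg.norm(pos - mi) - np.linalg.norm(pos - mj)) / C
                     for mi, mj in mic_pairs])

def tdoa_jacobian(pos, mic_pairs):
    """Jacobian of the TDOA observation function w.r.t. the source position."""
    rows = []
    for mi, mj in mic_pairs:
        di, dj = pos - mi, pos - mj
        rows.append((di / np.linalg.norm(di) - dj / np.linalg.norm(dj)) / C)
    return np.vstack(rows)

def ekf_update(x, P, z, R, mic_pairs):
    """One extended Kalman filter correction step using observed TDOAs `z`."""
    H = tdoa_jacobian(x, mic_pairs)
    y = z - tdoa_prediction(x, mic_pairs)   # innovation
    S = H @ P @ H.T + R                     # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)          # Kalman gain
    return x + K @ y, (np.eye(len(x)) - K @ H) @ P

# Tiny synthetic example with two microphone pairs in a room.
mics = [(np.array([0.0, 0.0, 1.5]), np.array([2.0, 0.0, 1.5])),
        (np.array([0.0, 2.0, 1.5]), np.array([2.0, 2.0, 1.5]))]
x0, P0 = np.array([1.0, 1.0, 1.6]), np.eye(3)
z = tdoa_prediction(np.array([1.2, 0.8, 1.7]), mics)   # synthetic "observed" TDOAs
x1, P1 = ekf_update(x0, P0, z, 1e-8 * np.eye(len(mics)), mics)
```

An iterated variant would re-linearize around the updated estimate and repeat the correction, and the generalization the abstract mentions extends this machinery beyond the single-speaker case.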