Christian Fügen

Learn More
In this paper we present our ongoing work in building technologies for natural multimodal human-robot interaction. We present our systems for spontaneous speech recognition, multimodal dialogue processing and visual perception of a user, which includes the recognition of pointing gestures as well as the recognition of a person's head orientation. Each of(More)
In this work, we present our progress in multi-source far field automatic speech-to-text transcription for lecture speech. In particular, we show how the best of several far field channels can be selected based on a signal-to-noise ratio criterion, and how the signals from multiple channels can be combined at either the waveform level using blind channel(More)
With increasing globalization, communication across language and cultural boundaries is becoming an essential requirement of doing business, delivering education, and providing public services. Due to the considerable cost of human translation services, only a small fraction of text documents and an even smaller percentage of spoken encounters, such as(More)
In this paper, we present our work in building technologies for natural multimodal human–robot interaction. We present our systems for spontaneous speech recognition, multimodal dialogue processing, and visual perception of a user, which includes localization, tracking, and identification of the user, recognition of pointing gestures, as well as the(More)
For years speech translation has focused on the recognition and translation of discourses in limited domains, such as hotel reservations or scheduling tasks. Only recently research projects have been started to tackle the problem of open domain speech recognition and translation of complex tasks such as lectures and speeches. In this paper we present the(More)
This paper describes the Interactive Systems Lab’s Meeting transcription system, which performs segmentation, speaker clustering as well as transcriptions of conversational meeting speech. The system described here was evaluated in NIST’s RT-04S “Meeting” speech evaluation. This paper compares the performance of our Broadcast News and the most recent(More)
Portable devices for the consumer market are becoming available in large quantities. Because of their design and use, human speech often is the input modality of choice, for example for car navigation systems or portable speech-to-speech translation devices. In this paper we describe our work in porting our existing desktop PC based speech recognition(More)
In human-mediated translation scenarios a human interpreter translates between a source and a target language using either a spoken or a written representation of the source language. In this paper we improve the recognition performance on the speech of the human translator spoken in the target language by taking advantage of the source language(More)