In this paper, we present our ongoing work in building technologies for natural multimodal human-robot interaction. We present our systems for spontaneous speech recognition, multimodal dialogue processing, and visual perception of a user, which includes the recognition of pointing gestures as well as the recognition of a person's head orientation. Each of …
In this work, we present our progress in multi-source far-field automatic speech-to-text transcription for lecture speech. In particular, we show how the best of several far-field channels can be selected based on a signal-to-noise ratio criterion, and how the signals from multiple channels can be combined at either the waveform level using blind channel …
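The channel-selection idea in this abstract can be illustrated with a minimal sketch: estimate each far-field channel's SNR from frame energies (treating the lowest-energy frames as noise and the highest as speech) and keep the channel with the best estimate. The function names and the 10%-quantile heuristic below are illustrative assumptions, not the paper's actual criterion.

```python
import numpy as np

def snr_db(signal, noise_floor=1e-10):
    # Rough SNR estimate: split the waveform into frames, take the
    # lowest-energy frames as a noise estimate and the highest-energy
    # frames as a speech estimate (a common heuristic, assumed here).
    frames = np.array_split(signal, max(1, len(signal) // 400))
    energies = np.sort([np.mean(f ** 2) for f in frames])
    k = max(1, len(energies) // 10)
    noise = np.mean(energies[:k]) + noise_floor
    speech = np.mean(energies[-k:]) + noise_floor
    return 10.0 * np.log10(speech / noise)

def select_best_channel(channels):
    # Pick the index of the channel with the highest estimated SNR.
    return int(np.argmax([snr_db(ch) for ch in channels]))
```

In practice such a selector would run per utterance or per segment, so the "best" microphone can change as the speaker moves.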
This paper describes the 2006 lecture recognition system developed at the Interactive Systems Laboratories (ISL) for individual head-microphone (IHM), single distant microphone (SDM), and multiple distant microphone (MDM) conditions. It was evaluated in the RT-06S rich transcription meeting evaluation sponsored by the US National Institute of Standards and …
In this paper, we present our work in building technologies for natural multimodal human-robot interaction. We present our systems for spontaneous speech recognition, multimodal dialogue processing, and visual perception of a user, which includes localization, tracking, and identification of the user, recognition of pointing gestures, as well as the …
The project Technology and Corpora for Speech to Speech Translation (TC-STAR) aims at making a breakthrough in speech-to-speech translation research, significantly reducing the gap between the performance of machines and humans at this task. Technological and scientific progress is driven by periodic, competitive evaluations within the project. In this …
This paper describes the Interactive Systems Lab's meeting transcription system, which performs segmentation, speaker clustering, and transcription of conversational meeting speech. The system described here was evaluated in NIST's RT-04S "Meeting" speech evaluation. This paper compares the performance of our Broadcast News and the most recent …
In this paper, we present our current work on tightly coupling a speech recognizer with a dialog manager, and the results of restricting the search space of our grammar-based speech recognizer using the information provided by the dialog manager. As a result of this tight coupling, the same linguistic knowledge sources can be used in both the speech recognizer …
In prior work, we developed a speaker tracking system based on an iterated extended Kalman filter (IEKF) using time delays of arrival (TDOAs) as acoustic features. While this system functioned well, its utility was limited to scenarios in which a single speaker was to be tracked. In this work, we remove this restriction by generalizing the IEKF, first to a probabilistic …
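The TDOA features that feed such a tracker are typically obtained with a generalized cross-correlation method; the sketch below shows the widely used GCC-PHAT variant, which whitens the cross-spectrum so only phase (i.e. delay) information remains. This is a generic illustration of TDOA estimation, not the paper's specific front end; the function name and interface are assumptions.

```python
import numpy as np

def gcc_phat_tdoa(sig, ref, fs, max_tau=None):
    # GCC-PHAT: normalize the cross-power spectrum by its magnitude so
    # that only the phase (pure delay) survives, then find the peak of
    # the resulting cross-correlation to estimate the time delay.
    n = len(sig) + len(ref)
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    R = SIG * np.conj(REF)
    cc = np.fft.irfft(R / (np.abs(R) + 1e-12), n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    # Re-center the correlation so negative and positive lags are adjacent.
    cc = np.concatenate((cc[-max_shift:], cc[: max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / fs  # TDOA in seconds (positive: sig lags ref)
```

In a multi-microphone setup, one TDOA per microphone pair would be computed and stacked into the observation vector consumed by the Kalman-filter update.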
Cross-system adaptation and system combination methods, such as ROVER and confusion network combination, are known to lower the word error rate of speech recognition systems. They require the training of systems that are reasonably close in performance but at the same time produce output that differs in its errors. This provides complementary information …
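The voting stage of ROVER-style combination can be sketched as follows. Real ROVER first builds a word transition network by aligning the hypotheses; here the alignment is assumed to be already done (equal-length slot sequences with a null symbol for insertions/deletions), and each slot is decided by simple majority vote. The interface and null marker are illustrative assumptions.

```python
from collections import Counter

def rover_vote(aligned_hyps, null="@"):
    # Simplified ROVER voting: aligned_hyps is a list of equal-length
    # word sequences (one per system), where `null` marks a slot in
    # which that system hypothesized no word. Each slot is resolved by
    # majority vote; winning nulls are dropped from the final output.
    assert len({len(h) for h in aligned_hyps}) == 1, "hypotheses must be pre-aligned"
    result = []
    for slot in zip(*aligned_hyps):
        word, _ = Counter(slot).most_common(1)[0]
        if word != null:
            result.append(word)
    return result
```

Full ROVER additionally weights votes by word confidence scores, which is what lets a single confident system outvote two uncertain ones.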