Recognizing people by the way they walk, also known as gait recognition, has been studied extensively in recent years. Recent gait recognition methods focus solely on data extracted from an RGB video stream. With this work, we provide a means for multimodal gait recognition by introducing the freely available TUM Gait from Audio, Image and Depth …
This paper proposes a multi-stream speech recognition system that combines information from three complementary analysis methods in order to improve automatic speech recognition in highly noisy and reverberant environments, as featured in the 2011 PASCAL CHiME Challenge. We integrate word predictions by a bidirectional Long Short-Term Memory recurrent …
Overlapping speech is known to degrade speaker diarization performance with impacts on speaker clustering and segmentation. While previous work made important advances in detecting overlapping speech intervals and in attributing them to relevant speakers, the problem remains largely unsolved. This paper reports the first application of convolutive …
The effective handling of overlapping speech is at the limits of the current state of the art in speaker diarization. This paper presents our latest work in overlap detection. We report the combination of features derived through convolutive non-negative sparse coding and new energy, spectral and voicing-related features within a conventional HMM system.
There are two basic approaches for semantic processing in spoken language understanding: a rule-based approach and a statistical approach. In this paper we combine both of them in a novel way by using statistical and syntactical dynamic Bayesian networks (DBNs) together with Graphical Models (GMs) for spoken language understanding (SLU). GMs merge in a …
This paper presents recent advances in the application of convolutive non-negative sparse coding (CNSC) to the problem of overlap detection in the context of conference meetings and speaker diarization. CNSC is used to project a mixed speaker signal onto separate speaker bases and hence to detect intervals of competing speech. We present new energy ratio …
In this paper, we present an open-set online speaker diarization system. The system is based on Gaussian mixture models (GMMs), which are used as speaker models. The system starts with just three such models (one for each gender and one for non-speech) and creates models for individual speakers only once those speakers occur. As more and more speakers …