David Kryze

In this paper, we address the problem of speaker-based segmentation, which is the first necessary step for several indexing tasks. It consists in recognizing, from their voices, the sequence of people engaged in a conversation. In our context, we make no assumptions about prior knowledge of the speaker characteristics (no speaker model, no speech model, no ...
This paper addresses the problem of speaker-based segmentation. The aim is to segment the audio data with respect to the speakers. In our study, we assume that no prior information on the speakers is available and that people do not speak simultaneously. Our segmentation technique operates in two passes: first, the most likely speaker changes are detected ...
This paper presents a new speech feature representation, called subband analysis, that uses a wavelet decomposition of the speech signal. This parameterization derives cepstral coefficients from the output of an unbalanced tree-structured filter bank combining high-pass and low-pass filters with downsampling units. Inspired by the SUBCEP analysis of [1] and [2], ...
In this paper we address the problem of speaker adaptation in noisy environments. We estimate speaker-adapted models from noisy data by combining unsupervised speaker adaptation with noise compensation. We aim at using the resulting speaker-adapted models in environments that differ from the adaptation environment, without a significant loss in ...
In this paper we address the problem of speaker adaptation in noisy environments. We aim at estimating speaker-adapted models from noisy data by combining unsupervised speaker adaptation with model-based noise compensation. Speaker-adapted models obtained with this method should contain as little information about the environment as possible, so that they ...
We demonstrate a new platform for holographic interactive 3D experience. The new user experience includes holographic 3D visuals and audio, natural free-space 3D interaction, and augmentation of the interface of smaller devices (e.g. smartphones). The head tracking component is compact and does not intrude on the appearance of the 3D glasses. Depth-sensor-based 3D hand ...
This paper presents a generalized feature projection scheme which allows each feature dimension to be classified in a set of 1 to M classes, where M is the total number of classes. Our method is an extension of the classical full-space null-space approach where each dimension can only be classified in either M classes or 1 class. We believe that this more ...
Multimedia content cannot be retrieved effectively unless metadata describing it is generated. However, metadata generation tends to be time-consuming and expensive, since it typically involves people going through the content and manually tagging it. The paper shows how automatic speech recognition (ASR) technology can be used to carry out metadata ...