In this paper, we address the problem of speaker-based segmentation, which is the first necessary step for several indexing tasks. It consists of recognizing, from their voices, the sequence of people engaged in a conversation. In our context, we make no assumptions about prior knowledge of the speaker characteristics (no speaker model, no speech model, no…
This paper addresses the problem of speaker-based segmentation. The aim is to segment the audio data with respect to the speakers. In our study, we assume that no prior information on the speakers is available and that people do not speak simultaneously. Our segmentation technique operates in two passes: first, the most likely speaker changes are detected…
We demonstrate a new platform for holographic interactive 3D experience. The new user experience includes holographic 3D visual and audio experience, natural free-space 3D interaction, and augmenting the interface of smaller devices (e.g. smartphones). The head-tracking component is compact and does not alter the appearance of the 3D glasses. Depth-sensor-based 3D hand…
This paper presents a new speech feature representation, called subband analysis, based on a wavelet decomposition of the speech signal. This parameterization derives cepstral coefficients from the output of an unbalanced tree-structured filter bank combining high-pass and low-pass filters with downsampling units. Inspired by the SUBCEP analysis of [1] and [2],…
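The general idea of deriving cepstral coefficients from a tree-structured filter bank can be sketched as follows. This is only an illustration under stated assumptions, not the parameterization of the paper: the Haar filter pair, the tree shape (only the low-pass branch is split again), the log-energy step, the DCT-II, and the names `haar_split`, `subband_energies`, and `subband_cepstra` are all hypothetical choices made for the example.

```python
import numpy as np

def haar_split(x):
    """One filter-bank stage: low-pass and high-pass Haar filters,
    each followed by downsampling by 2 (assumed filter choice)."""
    x = x[: (len(x) // 2) * 2]          # drop a trailing sample if length is odd
    even, odd = x[0::2], x[1::2]
    low = (even + odd) / np.sqrt(2.0)
    high = (even - odd) / np.sqrt(2.0)
    return low, high

def subband_energies(frame, depth=4):
    """Unbalanced tree: at each level only the low-pass branch is split again,
    so low frequencies get finer resolution (assumed tree shape)."""
    bands = []
    low = np.asarray(frame, dtype=float)
    for _ in range(depth):
        low, high = haar_split(low)
        bands.append(np.sum(high ** 2))
    bands.append(np.sum(low ** 2))
    return np.array(bands[::-1])        # lowest-frequency band first

def subband_cepstra(frame, depth=4, n_ceps=4, eps=1e-10):
    """Log subband energies followed by a DCT-II, giving cepstrum-like features."""
    log_e = np.log(subband_energies(frame, depth) + eps)
    n = len(log_e)
    k = np.arange(n_ceps)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (m + 0.5) / n)
    return basis @ log_e

# Example on a synthetic 256-sample frame (random data, not speech).
rng = np.random.default_rng(0)
frame = rng.normal(size=256)
energies = subband_energies(frame, depth=4)
ceps = subband_cepstra(frame, depth=4, n_ceps=4)
```

Because the Haar pair is orthonormal, the subband energies of an even-length frame sum to the frame energy, which is a convenient sanity check on the decomposition.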
In this paper we address the problem of speaker adaptation in noisy environments. We estimate speaker-adapted models from noisy data by combining unsupervised speaker adaptation with noise compensation. We aim to use the resulting speaker-adapted models in environments that differ from the adaptation environment, without a significant loss in performance.…
In this paper we address the problem of speaker adaptation in noisy environments. We aim to estimate speaker-adapted models from noisy data by combining unsupervised speaker adaptation with model-based noise compensation. Speaker-adapted models obtained with this method should contain as little information about the environment as possible, so that they…
This paper presents a generalized feature projection scheme which allows each feature dimension to be classified in a set of 1 to M classes, where M is the total number of classes. Our method is an extension of the classical full-space null-space approach where each dimension can only be classified in either M classes or 1 class. We believe that this more…
Cepstral mean normalization (CMN) is the standard technique for channel robustness. Despite its good performance, its effectiveness for short sentences is disputed. CMN's underlying hypothesis, that the speech cepstral mean is constant, does not hold for short processing windows, so normalization removes some phonetic information.…
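As a minimal sketch of standard CMN (not of the method this abstract goes on to propose), the mean of each cepstral coefficient is computed over the processing window and subtracted from every frame. A fixed linear channel is additive in the cepstral domain, so the subtraction cancels it. The function name `cepstral_mean_normalize`, the 13-coefficient frames, and the synthetic data are assumptions made for the example.

```python
import numpy as np

def cepstral_mean_normalize(cepstra):
    """Subtract the per-coefficient mean over the window.
    `cepstra` has shape (n_frames, n_coefficients)."""
    cepstra = np.asarray(cepstra, dtype=float)
    return cepstra - cepstra.mean(axis=0, keepdims=True)

# Synthetic example: random "clean" cepstra plus a constant channel offset.
rng = np.random.default_rng(0)
clean = rng.normal(size=(200, 13))       # stand-in for clean speech cepstra
channel = np.linspace(-1.0, 1.0, 13)     # fixed channel, additive in the cepstrum
observed = clean + channel

normalized = cepstral_mean_normalize(observed)

# On a short window the estimated mean also absorbs part of the speech itself,
# which is the weakness the abstract points out.
short_mean = observed[:10].mean(axis=0)
long_mean = observed.mean(axis=0)
```

With a long window, `normalized` matches the normalized clean cepstra, so the channel offset is gone. With only a few frames, `short_mean` differs from `long_mean` because it reflects the phonetic content of those frames rather than the channel alone.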