Christian Wellekens

In this paper, we address the problem of speaker-based segmentation, which is the first necessary step for several indexing tasks. It aims to extract homogeneous segments containing the longest possible utterances produced by a single speaker. In our context, no assumption is made about prior knowledge of the speaker or speech signal characteristics (neither(More)
Hidden Markov models are widely used for automatic speech recognition. They inherently incorporate the sequential character of the speech signal and are statistically trained. However, the a-priori choice of the model topology limits their flexibility. Another drawback of these models is their weak discriminating power. Multilayer perceptrons are now(More)
In this paper, we address the problem of speaker-based segmentation, which is the first necessary step for several indexing tasks. It consists of recognizing, from their voices, the sequence of people engaged in a conversation. In our context, we make no assumptions about prior knowledge of the speaker characteristics (no speaker model, no speech model, no(More)
Major progress is being recorded regularly on both the technology and exploitation of automatic speech recognition (ASR) and spoken language systems. However, there are still technological barriers to flexible solutions and user satisfaction under some circumstances. This is related to several factors, such as the sensitivity to the environment (background(More)
A speaker tracking system (STS) is built by successively applying a speaker change detector and a speaker verification system. The aim of the STS is to find, in a conversation between several persons (some of them already enrolled and others totally unknown), target speakers chosen from a set of enrolled users. In a first step, speech is segmented into(More)
A technique for rapid speaker adaptation, called eigenvoices, was introduced recently. The key idea is to confine models in a very low-dimensional linear vector space. This space summarizes a priori knowledge that we have about speaker models. In many practical systems, however, there is a mismatch between the conditions in which the training data were(More)
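The key idea in this abstract, confining a speaker model to a low-dimensional linear space, can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function `constrain_to_subspace`, the toy 4-dimensional "model" vectors, and the hand-picked orthonormal basis are hypothetical, and a plain orthogonal projection stands in for whatever weight-estimation procedure the paper actually uses.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def constrain_to_subspace(model, mean, basis):
    """Project `model` onto the affine subspace mean + span(basis).

    Assumes `basis` is orthonormal (hypothetical stand-in for eigenvoices),
    so each weight is simply an inner product with the centered model.
    """
    centered = [m - mu for m, mu in zip(model, mean)]
    weights = [dot(centered, e) for e in basis]
    constrained = list(mean)
    for w, e in zip(weights, basis):
        constrained = [c + w * x for c, x in zip(constrained, e)]
    return weights, constrained


# Toy data: a 4-dimensional model vector and a 2-dimensional subspace.
mean = [1.0, 1.0, 1.0, 1.0]
basis = [[0.5, 0.5, 0.5, 0.5], [0.5, -0.5, 0.5, -0.5]]
noisy_model = [2.0, 0.0, 3.0, 1.0]

weights, adapted = constrain_to_subspace(noisy_model, mean, basis)
print(weights, adapted)
```

In this toy setting the adapted model is described by two weights instead of four free parameters, which is the sense in which a low-dimensional space lets adaptation proceed from little data.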
It is well known that the peaks in log Mel-filter bank spectrum are important cues in characterizing the speech sounds. However, low energy perturbations in the power spectrum may become numerically significant after the log compression. We show that even if the spectral peaks are kept constant, the low energy perturbations in the power spectrum can create(More)
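The effect of log compression on low-energy perturbations can be checked with simple arithmetic. The sketch below is only an illustration: the energy values and the helper `log_change` are hypothetical and are not taken from the paper.

```python
import math


def log_change(energy, perturbation):
    """Absolute change in log energy caused by adding `perturbation`."""
    return abs(math.log(energy + perturbation) - math.log(energy))


perturbation = 0.01  # the same small additive disturbance in both bins

peak_change = log_change(100.0, perturbation)   # high-energy bin (spectral peak)
valley_change = log_change(0.01, perturbation)  # low-energy bin (spectral valley)

print(f"peak:   {peak_change:.6f}")
print(f"valley: {valley_change:.6f}")
```

The same additive disturbance barely moves the log value of the high-energy bin, while it doubles the energy of the low-energy bin and so shifts its log value by about ln 2.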
This paper addresses the problem of speaker-based segmentation. The aim is to segment the audio data with respect to the speakers. In our study, we assume that no prior information on speakers is available and that people do not speak simultaneously. Our segmentation technique operates in two passes: first, the most likely speaker changes are detected(More)