In this paper, we address the problem of speaker-based segmentation, which is the first necessary step for several indexing tasks. It aims to extract homogeneous segments containing the longest possible utterances produced by a single speaker. In our context, no assumption is made about prior knowledge of the speaker or speech signal characteristics (neither…
Hidden Markov models are widely used for automatic speech recognition. They inherently incorporate the sequential character of the speech signal and are statistically trained. However, the a priori choice of the model topology limits their flexibility. Another drawback of these models is their weak discriminating power. Multilayer perceptrons are now…
The content-based indexing task considered in this paper consists of recognizing, from their voices, the speakers involved in a conversation. A new approach for speaker-based segmentation, which is the first necessary step for this indexing task, is described. Our study is done under the assumptions that no prior information on speakers is available, that the…
A technique for rapid speaker adaptation, called eigenvoices, was introduced recently. The key idea is to confine models in a very low-dimensional linear vector space. This space summarizes a priori knowledge that we have about speaker models. In many practical systems, however, there is a mismatch between the conditions in which the training data were…
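As an illustration of what "confining models in a low-dimensional linear vector space" can look like, the sketch below constrains a speaker model's parameter vector to a mean plus a weighted sum of a few basis vectors. Everything here is an assumption made for illustration: the function names, the toy 4-dimensional vectors, the two hand-picked orthonormal basis vectors, and the use of a plain least-squares projection in place of whatever estimator the paper actually uses.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def project_onto_eigenvoices(supervector, mean, eigenvoices):
    """Constrain a model vector to mean + span(eigenvoices).

    Assumes the eigenvoices are orthonormal, so the least-squares
    weights are simply dot products with the centered vector.
    """
    centered = [x - m for x, m in zip(supervector, mean)]
    weights = [dot(centered, e) for e in eigenvoices]
    adapted = list(mean)
    for w, e in zip(weights, eigenvoices):
        adapted = [a + w * c for a, c in zip(adapted, e)]
    return weights, adapted


# Hypothetical toy data: a 4-dimensional model vector and 2 eigenvoices.
mean = [0.0, 0.0, 0.0, 0.0]
eigenvoices = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, -0.5, 0.5, -0.5],
]
noisy_estimate = [2.0, 0.0, 1.0, 1.0]

weights, adapted = project_onto_eigenvoices(noisy_estimate, mean, eigenvoices)
print("weights:", weights)
print("adapted model:", adapted)
```

Only two weights have to be estimated for the new speaker instead of four free parameters, which is the reason such a constraint can help when adaptation data are scarce.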
It is well known that the peaks in the log Mel-filter bank spectrum are important cues for characterizing speech sounds. However, low-energy perturbations in the power spectrum may become numerically significant after log compression. We show that even if the spectral peaks are kept constant, the low-energy perturbations in the power spectrum can create…
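The effect of log compression on low-energy perturbations can be checked with a small numerical sketch. The energies and the perturbation size below are hypothetical values chosen only for illustration; they are not taken from the paper.

```python
import math


def log_change(energy, perturbation):
    """Change in log energy caused by adding `perturbation` to `energy`."""
    return math.log(energy + perturbation) - math.log(energy)


# Hypothetical filter-bank energies: a strong spectral peak and a weak valley.
peak_energy = 1000.0
valley_energy = 0.01
perturbation = 0.01  # the same small additive perturbation in both bands

peak_shift = log_change(peak_energy, perturbation)
valley_shift = log_change(valley_energy, perturbation)

print(f"log-domain shift at the peak:   {peak_shift:.6f}")
print(f"log-domain shift at the valley: {valley_shift:.6f}")
```

The same additive perturbation is negligible next to the peak energy but doubles the valley energy, so after the logarithm it moves the valley by about log 2 while leaving the peak almost unchanged.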
This paper addresses the problem of speaker-based segmentation. The aim is to segment the audio data with respect to the speakers. In our study, we assume that no prior information on speakers is available and that people do not speak simultaneously. Our segmentation technique operates in two passes: first, the most likely speaker changes are detected…
Major progress is being recorded regularly on both the technology and exploitation of automatic speech recognition (ASR) and spoken language systems. However, there are still technological barriers to flexible solutions and user satisfaction under some circumstances. This is related to several factors, such as the sensitivity to the environment (background…
A speaker tracking system (STS) is built by successively using a speaker change detector and a speaker verification system. The aim of the STS is to find, in a conversation between several persons (some of them already enrolled and others totally unknown), target speakers chosen from a set of enrolled users. In a first step, speech is segmented into…