In this paper, we address the problem of speaker-based segmentation, which is the first necessary step for several indexing tasks. It aims to extract homogeneous segments containing the longest possible utterances produced by a single speaker. In our context, no assumption is made about prior knowledge of the speaker or speech signal characteristics (neither …
Hidden Markov models are widely used for automatic speech recognition. They inherently incorporate the sequential character of the speech signal and are statistically trained. However, the a priori choice of the model topology limits their flexibility. Another drawback of these models is their weak discriminating power. Multilayer perceptrons are now …
A speaker tracking system (STS) is built by using successively a speaker change detector and a speaker verification system. The aim of the STS is to find, in a conversation between several persons (some of them having already enrolled and others being totally unknown), target speakers chosen in a set of enrolled users. In a first step, speech is segmented into …
It is often acknowledged that speech signals contain short-term and long-term temporal properties [15] that are difficult to capture and model using the usual fixed-scale (typically 20 ms) short-time spectral analysis used in hidden Markov models (HMMs), which is based on the assumptions of piecewise stationarity and state-conditional independence of acoustic vectors. …
The content-based indexing task considered in this paper consists in recognizing, from their voices, the speakers involved in a conversation. A new approach for speaker-based segmentation, which is the first necessary step for this indexing task, is described. Our study is done under the assumptions that no prior information on speakers is available, that the …
Major progress is being recorded regularly on both the technology and exploitation of automatic speech recognition (ASR) and spoken language systems. However, there are still technological barriers to flexible solutions and user satisfaction under some circumstances. This is related to several factors, such as the sensitivity to the environment (background …
A technique for rapid speaker adaptation, called eigenvoices, was introduced recently. The key idea is to confine models in a very low-dimensional linear vector space. This space summarizes a priori knowledge that we have about speaker models. In many practical systems, however, there is a mismatch between the conditions in which the training data were …
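
The eigenvoice idea in the abstract above, confining a speaker model to a low-dimensional linear space learned from other speakers, can be illustrated with a minimal sketch. Everything here is an assumption for illustration and is not taken from the paper: the speaker "supervectors" are synthetic, the function names `fit_eigenvoices` and `project_speaker` are hypothetical, the basis is obtained by plain PCA, and the new speaker's weights come from a simple least-squares projection rather than the maximum-likelihood estimate normally used for eigenvoice adaptation.

```python
import numpy as np


def fit_eigenvoices(supervectors, k):
    """Return the mean supervector and the top-k principal directions (as rows)."""
    mean = supervectors.mean(axis=0)
    centered = supervectors - mean
    # SVD of the centered data: the rows of vt are orthonormal principal directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k]


def project_speaker(supervector, mean, eigenvoices):
    """Constrain a speaker model to the eigenvoice subspace.

    Assumption: least-squares projection stands in for the
    maximum-likelihood weight estimate of real eigenvoice adaptation.
    """
    weights = eigenvoices @ (supervector - mean)
    adapted = mean + weights @ eigenvoices
    return weights, adapted


rng = np.random.default_rng(0)

# Synthetic training speakers lying close to a 2-dimensional subspace of R^50.
basis = rng.normal(size=(2, 50))
train = rng.normal(size=(20, 2)) @ basis + 0.01 * rng.normal(size=(20, 50))

mean, eigenvoices = fit_eigenvoices(train, k=2)

# A new speaker is described by only k = 2 weights instead of 50 free parameters.
new_speaker = rng.normal(size=2) @ basis + 0.01 * rng.normal(size=50)
weights, adapted = project_speaker(new_speaker, mean, eigenvoices)

print("weights shape:", weights.shape)
print("adapted shape:", adapted.shape)
```

Because the adapted model is described by only `k` weights, very little data from a new speaker is needed to estimate it, which is the reason the low-dimensional constraint allows rapid adaptation.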