Olivier Siohan

Transformation-based model adaptation techniques like maximum likelihood linear regression (MLLR) rely on an accurate selection of the number of transformations for a given amount of adaptation data. If too many transformations are used, the transformation parameters may be poorly estimated, overfit the adaptation data, and generalize poorly. …
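As a sketch of the underlying technique (standard MLLR, not the paper's specific selection method), each Gaussian mean vector \(\mu\) assigned to regression class \(c\) is adapted by an affine transform whose parameters are shared across the class:

```latex
\hat{\mu} = A_c \mu + b_c = W_c \xi, \qquad \xi = \begin{bmatrix} 1 \\ \mu \end{bmatrix},
\qquad W_c = [\, b_c \;\; A_c \,],
```

where each \(W_c\) is estimated by maximizing the likelihood of the adaptation data. The trade-off described in the abstract is between the number of classes \(c\) (more transforms fit finer acoustic detail) and how reliably each \(W_c\) can be estimated from the data falling in that class.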
In the past few years, transformation-based model adaptation techniques have been widely used to help reduce the acoustic mismatch between the training and testing conditions of automatic speech recognizers. The estimation of the transformation parameters is usually carried out using estimation paradigms based on classical statistics, such as maximum likelihood, …
We are interested in retrieving information from speech data like broadcast news, telephone conversations and roundtable meetings. Today, most systems use large vocabulary continuous speech recognition tools to produce word transcripts; the transcripts are indexed and query terms are retrieved from the index. However, query terms that are not part of the …
We propose a Minimum Verification Error (MVE) training scenario to design and adapt an HMM-based speaker verification system. Using the discriminative training paradigm, we show that customer and background models can be jointly estimated so that the expected number of verification errors (false accepts and false rejects) on the training corpus is …
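A common way to make the verification error count differentiable (one standard MVE/MCE-style formulation; the truncated abstract does not spell out the exact loss used) is to smooth each error with a sigmoid of the log-likelihood-ratio score \(g(O) = \log p(O \mid \lambda_{\text{cust}}) - \log p(O \mid \lambda_{\text{bg}})\):

```latex
L = \frac{1}{N_t} \sum_{O \in \text{true}} \sigma\!\big(-\gamma\,(g(O) - \theta)\big)
  + \frac{1}{N_i} \sum_{O \in \text{imp}} \sigma\!\big(\gamma\,(g(O) - \theta)\big),
\qquad \sigma(x) = \frac{1}{1 + e^{-x}},
```

where \(\theta\) is the decision threshold and \(\gamma\) controls the smoothness. The first term approximates the false-reject rate on true-speaker trials, the second the false-accept rate on impostor trials; gradient descent on \(L\) then jointly updates the customer and background model parameters.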
Classical audio retrieval techniques consist in transcribing audio documents using a large vocabulary speech recognition system and indexing the resulting transcripts. However, queries that are not part of the recognizer’s vocabulary or have a large probability of getting misrecognized can significantly impair the performance of the retrieval system. …
Building multiple automatic speech recognition (ASR) systems and combining their outputs using voting techniques such as ROVER is an effective technique for lowering the overall word error rate. A successful system combination approach requires the construction of multiple systems with complementary errors, or the combination will not outperform any of the …
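ROVER's final voting stage can be sketched as word-level majority voting over hypotheses that have already been aligned into a common word transition network. The alignment itself (done by iterative dynamic programming in real ROVER) is assumed to be given here, and `rover_vote` and the `@` null token are illustrative names, not the tool's API:

```python
from collections import Counter

def rover_vote(aligned_hyps, null="@"):
    """Word-level majority voting over pre-aligned ASR hypotheses.

    aligned_hyps: list of equal-length token lists from several systems,
    where `null` marks an insertion/deletion slot left by the alignment.
    Returns the sequence of majority words, dropping null slots.
    """
    output = []
    for slot in zip(*aligned_hyps):
        # Pick the most frequent token proposed for this slot.
        word, _ = Counter(slot).most_common(1)[0]
        if word != null:
            output.append(word)
    return output

# Three toy systems disagree on one word and one system dropped a word:
print(rover_vote([["the", "cat", "@"],
                  ["the", "bat", "sat"],
                  ["the", "cat", "sat"]]))
# → ['the', 'cat', 'sat']
```

The combination only helps when the systems' errors are complementary, as the abstract notes: if every system makes the same mistake, the majority vote simply reproduces it.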
TC-STAR is a European Union-funded speech-to-speech translation project to transcribe, translate and synthesize European Parliamentary Plenary Speeches (EPPS). This paper describes IBM’s English and Spanish speech recognition systems submitted to the TC-STAR 2006 Evaluation. The technical advances in this submission include two different algorithms for …
An auditory feature extraction algorithm for robust speech recognition in adverse acoustic environments is proposed. Based on an analysis of the human auditory system, the feature extraction algorithm consists of several modules: FFT, an outer-middle-ear transfer function, frequency conversion from the linear to the Bark scale, auditory filtering, a nonlinearity, and …
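The linear-to-Bark frequency conversion in the pipeline above can be sketched with the well-known Zwicker–Terhardt approximation; the snippet does not state which conversion formula the paper actually uses, so this particular one is an assumption:

```python
import math

def hz_to_bark(f_hz):
    """Map frequency in Hz to the Bark critical-band scale.

    Uses the Zwicker & Terhardt analytic approximation (an assumption;
    the paper's exact conversion is not given in the abstract):
        z = 13 * atan(0.00076 f) + 3.5 * atan((f / 7500)^2)
    """
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

# 1 kHz lands near 8.5 Bark, matching the classical critical-band tables.
print(round(hz_to_bark(1000.0), 2))
```

In an auditory front end this mapping decides the center frequencies and bandwidths of the subsequent auditory filters, so that filters are narrow at low frequencies and broad at high frequencies, as in human hearing.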