Sabine Deligne

The multigram model assumes that language can be described as the output of a memoryless source that emits variable-length sequences of words. The estimation of the model parameters can be formulated as a Maximum Likelihood estimation problem from incomplete data. We show that estimates of the model parameters can be computed through an iterative …
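As a rough sketch of what such an iterative estimation can look like, the Python below implements one plausible EM re-estimation for a multigram model over symbol sequences: a forward-backward pass over cut points yields expected chunk counts, which are renormalised at each iteration. The maximum chunk length, the uniform initialisation, and the corpus-of-sequences interface are illustrative assumptions, not the paper's exact formulation.

```python
from collections import defaultdict

def multigram_em(corpus, max_len=3, n_iter=10):
    """Hypothetical EM re-estimation for a multigram model: each sequence is
    treated as a concatenation of independent variable-length chunks emitted
    by a memoryless source, and chunk probabilities are re-estimated from
    expected counts over all possible segmentations."""
    # Initialise with (normalised) counts of every chunk up to max_len.
    probs = defaultdict(float)
    for seq in corpus:
        for i in range(len(seq)):
            for l in range(1, min(max_len, len(seq) - i) + 1):
                probs[tuple(seq[i:i + l])] += 1.0
    z = sum(probs.values())
    probs = defaultdict(float, {k: v / z for k, v in probs.items()})

    for _ in range(n_iter):
        counts = defaultdict(float)
        for seq in corpus:
            T = len(seq)
            # alpha[t]: likelihood of all segmentations of seq[:t].
            alpha = [0.0] * (T + 1)
            alpha[0] = 1.0
            for t in range(1, T + 1):
                for l in range(1, min(max_len, t) + 1):
                    alpha[t] += alpha[t - l] * probs[tuple(seq[t - l:t])]
            # beta[t]: likelihood of all segmentations of seq[t:].
            beta = [0.0] * (T + 1)
            beta[T] = 1.0
            for t in range(T - 1, -1, -1):
                for l in range(1, min(max_len, T - t) + 1):
                    beta[t] += probs[tuple(seq[t:t + l])] * beta[t + l]
            if alpha[T] == 0.0:
                continue
            # E-step: posterior expected count of each chunk occurrence.
            for t in range(T):
                for l in range(1, min(max_len, T - t) + 1):
                    c = tuple(seq[t:t + l])
                    counts[c] += alpha[t] * probs[c] * beta[t + l] / alpha[T]
        # M-step: renormalise expected counts into chunk probabilities.
        z = sum(counts.values())
        probs = defaultdict(float, {k: v / z for k, v in counts.items()})
    return probs
```

For realistic corpora the sums would be carried in the log domain to avoid underflow; the structure of the E- and M-steps is unchanged.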
Accurate speech activity detection is a challenging problem in the car environment, where high background noise and high-amplitude transient sounds are common. We investigate a number of features that are designed for capturing the harmonic structure of speech. We evaluate separately three important characteristics of these features: 1) discriminative power …
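For illustration only, one simple way to capture harmonic structure is the peak of a frame's normalised autocorrelation within a plausible pitch-lag range: voiced speech scores near 1 while noise and transients stay low. The sampling rate, lag bounds, and normalisation below are assumed defaults, not the feature set evaluated in the paper.

```python
import numpy as np

def harmonicity(frame, fs=16000, fmin=60.0, fmax=400.0):
    """Toy periodicity feature: peak of the normalised autocorrelation
    inside the plausible pitch-period range [fs/fmax, fs/fmin] samples."""
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    if ac[0] <= 0.0:
        return 0.0          # silent frame: no energy, no harmonicity
    ac = ac / ac[0]         # normalise so that lag 0 equals 1
    lo = int(fs / fmax)     # shortest admissible pitch period, in samples
    hi = min(int(fs / fmin), len(ac) - 1)
    if hi <= lo:
        return 0.0          # frame too short to cover the lag range
    return float(np.max(ac[lo:hi]))
```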
The efficiency of pattern recognition algorithms depends heavily on a proper definition of the patterns assumed to structure the data. The multigram model provides a statistical tool to retrieve sequential variable-length regularities within streams of data. In this paper, we present a general formulation of the model, applicable to single or multiple …
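As a concrete illustration of retrieving variable-length regularities, the sketch below decodes the single most likely segmentation of a sequence under a trained multigram model by dynamic programming over cut points (a Viterbi-style pass). The `probs` dictionary interface matches the EM sketch above and is an assumption for illustration.

```python
import math

def best_segmentation(seq, probs, max_len=3):
    """Most likely segmentation of `seq` into variable-length chunks under
    chunk probabilities `probs`, via dynamic programming over cut points."""
    T = len(seq)
    best = [-math.inf] * (T + 1)
    best[0] = 0.0
    back = [0] * (T + 1)    # length of the last chunk of the best split
    for t in range(1, T + 1):
        for l in range(1, min(max_len, t) + 1):
            p = probs.get(tuple(seq[t - l:t]), 0.0)
            if p > 0.0 and best[t - l] + math.log(p) > best[t]:
                best[t] = best[t - l] + math.log(p)
                back[t] = l
    if best[T] == -math.inf:
        return []           # no segmentation covers the whole sequence
    chunks, t = [], T
    while t > 0:
        chunks.append(tuple(seq[t - back[t]:t]))
        t -= back[t]
    return list(reversed(chunks))
```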
In the field of speech recognition, the patterns assumed to structure the speech material (phonemes, triphones, words...) are defined a priori according to a linguistic criterion, whereas the recognition criterion is based on an acoustic similarity measure. This can result in a lack of consistency in the recognition units. In this paper, we explore the …
In this paper, we address the problem of blind separation of convolutive mixtures of spatially and temporally independent sources modeled with mixtures of Gaussians. We present an EM algorithm to compute Maximum Likelihood estimates of both the separating filters and the source density parameters, whereas in the state-of-the-art separating filters are …
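The sketch below keeps only the core idea in a deliberately simplified setting: an instantaneous (not convolutive) square mixture, separated by maximum likelihood with each source density modelled as a zero-mean mixture of Gaussians. It alternates an EM update of the per-source mixture parameters with a natural-gradient step on the unmixing matrix; the paper's actual algorithm estimates separating filters for convolutive mixtures, which this toy does not attempt.

```python
import numpy as np

def _mog_resp(s, pi_i, var_i):
    """Responsibilities of K zero-mean Gaussians for the samples in s."""
    lik = pi_i[:, None] * np.exp(-0.5 * s**2 / var_i[:, None]) \
          / np.sqrt(2.0 * np.pi * var_i[:, None])          # shape (K, T)
    return lik / np.maximum(lik.sum(axis=0, keepdims=True), 1e-12)

def separate_instantaneous(X, n_iter=200, lr=0.1, K=2, seed=0):
    """Toy ML separation of an instantaneous square mixture with
    mixture-of-Gaussians source priors: EM on the source densities,
    natural gradient on the unmixing matrix W."""
    rng = np.random.default_rng(seed)
    n, T = X.shape
    W = np.eye(n)
    pi = np.full((n, K), 1.0 / K)               # mixture weights per source
    var = rng.uniform(0.5, 2.0, size=(n, K))    # component variances
    for _ in range(n_iter):
        S = W @ X                               # current source estimates
        psi = np.empty_like(S)
        for i in range(n):
            # EM update of source i's Gaussian mixture (means fixed at 0).
            resp = _mog_resp(S[i], pi[i], var[i])
            Nk = resp.sum(axis=1)
            pi[i] = Nk / T
            var[i] = np.maximum(
                (resp * S[i] ** 2).sum(axis=1) / np.maximum(Nk, 1e-12), 1e-6)
            # Score function psi = -d log p(s) / ds for the updated mixture.
            resp = _mog_resp(S[i], pi[i], var[i])
            psi[i] = (resp * (S[i] / var[i][:, None])).sum(axis=0)
        # Natural-gradient ML step on the unmixing matrix.
        W += lr * (np.eye(n) - (psi @ S.T) / T) @ W
    return W
```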
Conventional methods for training statistical models for automatic speech recognition, such as acoustic and language models, have focused on criteria such as maximum likelihood and sentence or word error rate (WER). However, unlike dictation systems, the goal for spoken dialogue systems is to understand the meaning of what a person says, not to get every …
Visual speech information present in the speaker's mouth region has long been viewed as a source for improving the robustness and naturalness of human-computer interfaces (HCI). Such information can be particularly crucial in realistic HCI environments, where the acoustic channel is corrupted and, as a result, the performance of traditional automatic speech …
This paper describes a robust, accurate, efficient, low-resource, medium-vocabulary, grammar-based speech recognition system using Hidden Markov Models for mobile applications. Among the issues and techniques we explore are improving robustness and efficiency of the front-end, using multiple microphones for removing extraneous signals from speech via a new …