An effective way to increase the noise robustness of automatic speech recognition is to label noisy speech features as either reliable or unreliable (missing), and to replace (impute) the missing ones by clean speech estimates. Conventional imputation techniques employ parametric models and impute the missing features on a frame-by-frame basis. At low…
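The label-then-impute idea described above can be sketched for a single frame as follows. This is only an illustration under stated assumptions, not the method of the paper: the function name `impute_frame`, the dB log-energy inputs, the threshold on the noisy-minus-noise level as a rough reliability criterion, and the use of a fixed clean-speech mean as the estimate are all hypothetical choices. The only constraint borrowed from common missing-data practice is that an imputed clean value should not exceed the noisy observation.

```python
def impute_frame(noisy, noise_est, clean_mean, threshold_db=3.0):
    """Label each channel of one frame as reliable or missing, then impute.

    All inputs are per-channel log-energies in dB (hypothetical layout):
      noisy      -- observed noisy speech features
      noise_est  -- an estimate of the noise level per channel
      clean_mean -- a stand-in clean speech estimate (here: a fixed mean)
    Returns (mask, imputed), where mask[i] is True for reliable channels.
    """
    mask = []
    imputed = []
    for y, n, m in zip(noisy, noise_est, clean_mean):
        # Assumption: a channel is reliable when the observation exceeds
        # the noise estimate by more than threshold_db.
        reliable = (y - n) > threshold_db
        mask.append(reliable)
        # Reliable channels are kept; missing ones take the clean estimate,
        # bounded above by the noisy observation.
        imputed.append(y if reliable else min(m, y))
    return mask, imputed


if __name__ == "__main__":
    mask, imputed = impute_frame(
        noisy=[30.0, 12.0, 25.0],
        noise_est=[10.0, 11.0, 24.9],
        clean_mean=[28.0, 20.0, 5.0],
    )
    print(mask)
    print(imputed)
```

Because every frame is handled independently here, the sketch corresponds to the frame-by-frame style of imputation that the abstract calls conventional.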
We describe an algorithm to automatically estimate the voice onset time (VOT) of plosives. The VOT is the time delay between the burst onset and the start of periodicity when it is followed by a voiced sound. Since the VOT is affected by factors like place of articulation and voicing, it can be used for inference of these factors. The algorithm uses the…
Missing data techniques (MDT) have been shown to be an effective method for curing the performance degradation of HMM-based speech recognition systems operating on noisy signals. However, a major drawback of the approach is that MDT requires that the acoustic model be expressed as a mixture of diagonal Gaussians in the log-spectral domain, whereas a higher…
We present a technique to automatically discover the (word-sized) phone patterns that are present in speech utterances. These patterns are learnt from a set of phone lattices generated from the utterances. Just like children acquiring language, our system does not have prior information on what the meaningful patterns are. By applying the non-negative…
Missing data theory has been applied to the problem of speech recognition in adverse environments. The resulting systems require acoustic models that are expressed in the spectral rather than in the cepstral domain, which leads to loss of accuracy. Cepstral Missing Data Techniques (CMDT) surmount this disadvantage, but require significantly more…
We present a novel, exemplar-based method for audio event detection based on non-negative matrix factorisation. Building on recent work in noise robust automatic speech recognition, we model events as a linear combination of dictionary atoms, and mixtures as a linear combination of overlapping events. The weights of activated atoms in an observation serve…
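To make the "linear combination of dictionary atoms" idea concrete, the following minimal sketch estimates non-negative atom weights for one observation with a fixed dictionary. It uses the standard multiplicative update for the Kullback-Leibler divergence, which is a common choice in non-negative matrix factorisation; whether the paper uses this exact cost is an assumption. The function name `nmf_activations`, the toy dictionary `W`, and the observation `v` are hypothetical, and the sketch stops at the weights because the abstract is cut off before saying how they are used.

```python
def nmf_activations(W, v, n_iter=200, eps=1e-12):
    """Estimate non-negative weights h such that v is approximately W h.

    W -- dictionary as a list of rows (n_features x n_atoms), non-negative
    v -- one observation (length n_features), non-negative
    Uses the KL-divergence multiplicative update with W held fixed:
        h <- h * (W^T (v / (W h))) / (W^T 1)
    """
    n_features = len(W)
    n_atoms = len(W[0])
    h = [1.0] * n_atoms  # strictly positive start keeps updates well defined
    col_sums = [sum(W[f][k] for f in range(n_features)) for k in range(n_atoms)]
    for _ in range(n_iter):
        approx = [sum(W[f][k] * h[k] for k in range(n_atoms))
                  for f in range(n_features)]
        ratio = [v[f] / (approx[f] + eps) for f in range(n_features)]
        h = [h[k] * sum(W[f][k] * ratio[f] for f in range(n_features))
             / (col_sums[k] + eps)
             for k in range(n_atoms)]
    return h


if __name__ == "__main__":
    # Toy dictionary with three atoms (columns) over four features (rows).
    W = [[1.0, 0.0, 0.2],
         [0.5, 0.1, 0.0],
         [0.0, 1.0, 0.3],
         [0.0, 0.2, 1.0]]
    # Observation built as 2.0 * atom 0 + 0.5 * atom 2 (atom 1 inactive).
    v = [2.1, 1.0, 0.15, 0.5]
    print(nmf_activations(W, v))
```

In the toy example the estimated weights should concentrate on atoms 0 and 2, with the weight of atom 1 shrinking towards zero as the iterations proceed.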
In exemplar-based speech enhancement systems, lower dimensional features are preferred over the full-scale DFT features for their reduced computational complexity and the ability to better generalize for the unseen cases. But in order to obtain the Wiener-like filter for noisy DFT enhancement, the speech and noise estimates obtained in the feature space…
In this paper, we discuss a computational model of language acquisition that focuses on the detection of words and is able to detect and build word-like representations on the basis of multimodal input data. Experiments carried out on three European languages (Finnish, Swedish, and Dutch) show that internal word representations can be learned without…
Motivated by the success of i-vectors in the field of speaker recognition, this paper proposes a new approach for age estimation from telephone speech patterns based on i-vectors. In this method, each utterance is modeled by its corresponding i-vector. Then, Support Vector Regression (SVR) is applied to estimate the age of speakers. The proposed method is…