Climent Nadeu

In this paper, we present the results of the Acoustic Event Detection (AED) and Classification (AEC) evaluations carried out in February 2006 by the three participant partners from the CHIL project. The primary evaluation task was AED of the testing portions of the isolated sound databases and seminar recordings produced in CHIL. Additionally, a secondary…
Acoustic event detection (AED) aims at determining the identity of sounds and their temporal position in the signals that are captured by one or several microphones. The AED problem has been recently proposed for meeting-room or class-room environments, where a specific set of meaningful sounds has been defined, and several evaluations have been carried out…
Every speech recognition system requires a signal representation that parametrically models the temporal evolution of the speech spectral envelope. Current parameterizations involve, either explicitly or implicitly, a set of energies from frequency bands which are often distributed in a mel scale. The computation of those energies is performed in diverse…
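As an illustration of the mel-scale band energies this abstract refers to, the sketch below builds a triangular mel filterbank using the common 2595·log10(1 + f/700) mel mapping. It is a generic, minimal version for illustration, not the specific computation compared in the paper; the parameter choices (number of filters, FFT size) are assumptions.

```python
import numpy as np

def hz_to_mel(f):
    # Common mel-scale mapping (an assumption; several variants exist).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters whose centers are equally spaced on the mel scale.
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mel_points = np.linspace(low, high, n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    bins = np.floor((n_fft + 1) * hz_points / sample_rate).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):           # rising edge
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):           # falling edge
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

# Band energies are then the dot product of the filterbank with the
# frame's power spectrum: energies = fb @ power_spectrum
```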
Cepstral coefficients are widely used in speech recognition. In this paper, we claim that they are not the best way of representing the spectral envelope, at least for some usual speech recognition systems. In fact, the cepstrum has several disadvantages: poor physical meaning, the need for a transformation, and low capacity of adaptation to some recognition systems…
The aim of this correspondence is to present a robust representation of speech that is based on an AR modeling of the causal part of the autocorrelation sequence. Its performance in noisy speech recognition is compared with several related techniques, showing that it achieves better results for severe noise conditions. EDICS Categories SA 1.6.8, SA 1.6.1, …
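AR modeling from an autocorrelation sequence is typically solved with the Levinson-Durbin recursion. The sketch below is a generic illustration of that recursion, not the correspondence's exact method (which works on the causal part of the autocorrelation rather than the usual autocorrelation):

```python
import numpy as np

def levinson_durbin(r, order):
    # Solve the Yule-Walker equations for AR coefficients a[1..order]
    # given autocorrelation values r[0..order].
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]                       # prediction error power
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err               # reflection coefficient
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)
    return a, err
```

For an ideal AR(1) autocorrelation r = [1, 0.5, 0.25], the recursion recovers the coefficient 0.5 (as a[1] = -0.5 in prediction-error form) and a zero second-order term.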
Acoustic events produced in controlled environments may carry information useful for perceptually aware interfaces. In this paper we focus on the problem of classifying 16 types of meeting-room acoustic events. First of all, we have defined the events and gathered a sound database. Then, several classifiers based on support vector machines (SVM) are…
This work aims at gaining an insight into the mean and variance normalization technique (MVN), which is commonly used to increase the robustness of speech recognition features. Several versions of MVN are empirically investigated, and the factors affecting their performance are considered. The reported experimental work with real-world speech data (Speecon)…
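The basic form of MVN is simple to state: each feature dimension is normalized to zero mean and unit variance over the utterance. A minimal NumPy sketch follows; it is one illustrative version, not any of the specific variants compared in the paper.

```python
import numpy as np

def mvn(features):
    # features: (n_frames, n_dims) matrix of speech features.
    # Normalize each dimension to zero mean and unit variance
    # over the whole utterance.
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    # Guard against constant dimensions to avoid division by zero.
    return (features - mean) / np.maximum(std, 1e-8)
```

In practice the statistics may instead be estimated over a sliding window or updated online, which is part of what such comparisons investigate.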
In automatic speech recognition, the signal is usually represented by a set of time sequences of spectral parameters (TSSPs) that model the temporal evolution of the spectral envelope frame-to-frame. Those sequences are then filtered, either to make them more robust to environmental conditions or to compute differential parameters (dynamic features)…
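The differential parameters (dynamic features) mentioned here are conventionally obtained by filtering each parameter sequence with a regression (delta) filter over neighboring frames. The sketch below shows the standard delta formula; the window size and edge-padding strategy are assumptions for illustration.

```python
import numpy as np

def delta(features, window=2):
    # features: (n_frames, n_dims). Compute dynamic features with the
    # standard regression formula:
    #   d[t] = sum_n n * (x[t+n] - x[t-n]) / (2 * sum_n n^2)
    # Edge frames are handled by repeating the first/last frame.
    n_frames = len(features)
    pad = np.pad(features, ((window, window), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, window + 1))
    out = np.zeros_like(features, dtype=float)
    for n in range(1, window + 1):
        out += n * (pad[window + n:window + n + n_frames]
                    - pad[window - n:window - n + n_frames])
    return out / denom
```

A linearly increasing parameter track yields a constant delta of its slope away from the edges, and a constant track yields zero, which is the expected behavior of a regression filter.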