Learn More
Every speech recognition system requires a signal representation that parametrically models the temporal evolution of the speech spectral envelope. Current parameterizations involve, either explicitly or implicitly, a set of energies from frequency bands which are often distributed in a mel scale. The computation of those energies is performed in diverse(More)
Cepstral coefficients are widely used in speech recognition. In this paper, we claim that they are not the best way of representing the spectral envelope, at least for some usual speech recognition systems. In fact, cepstrum has several disadvantages: poor physical meaning, need of transformation, and low capacity of adaptation to some recognition systems.(More)
In this paper, we present the results of the Acoustic Event Detection (AED) and Classification (AEC) evaluations carried out in February 2006 by the three participant partners from the CHIL project. The primary evaluation task was AED of the testing portions of the isolated sound databases and seminar recordings produced in CHIL. Additionally, a secondary(More)
small. Although increased training data may be helpful, in the speaker dependent Mandarin syllable recognition problem, a limited database will probably still be a normal situation for some period of time in the future. VII. CONCLUSION A new approach is proposed in this correspondence to obtain more elaborate initial models covering characteristics of(More)
Acoustic event detection (AED) aims at determining the identity of sounds and their temporal position in the signals that are captured by one or several microphones. The AED problem has been recently proposed for meeting-room or classroom environments, where a specific set of meaningful sounds has been defined, and several evaluations have been carried out(More)
The performance of ASR systems in a room environment with distant microphones is strongly affected by reverberation. As the degree of signal distortion varies among acoustic channels (i.e. microphones), the recognition accuracy can benefit from a proper channel selection. In this paper, we experimentally show that there exists a large margin for WER(More)
In this paper we study various decorrelation methods for the features used in speech recognition and we compare the performance of each one by running several tests with a speech database. First of all we study the Principal Components Analysis PCA. PCA extracts the dimensions along which the data vary the most, and thus it allows us to reduce the dimension(More)