• Corpus ID: 18233254

On the decorrelation of filter-bank energies in speech recognition

@inproceedings{Nadeu1995OnTD,
  title={On the decorrelation of filter-bank energies in speech recognition},
  author={Climent Nadeu and Javier Hernando and M{\'o}nica Gorricho},
  booktitle={EUROSPEECH},
  year={1995}
}
Cepstral coefficients are widely used in speech recognition. In this paper, we claim that they are not the best way of representing the spectral envelope, at least for some common speech recognition systems. In fact, the cepstrum has several disadvantages: poor physical meaning, the need for a transformation, and limited capacity to adapt to some recognition systems. We propose a new representation that significantly outperforms both mel-cepstrum and LPC-cepstrum techniques in both…

On the use of filter-bank energies as features for robust speech recognition
  • K. Paliwal
  • Computer Science
    ISSPA '99. Proceedings of the Fifth International Symposium on Signal Processing and its Applications (IEEE Cat. No.99EX359)
  • 1999
Though mel frequency cepstral coefficients (MFCCs) have been very successful in speech recognition, they have the following two problems: (1) they do not have any physical interpretation, and (2)
FILTER-BANK ENERGIES FOR ROBUST SPEECH RECOGNITION
TLDR
The FBEs are physically meaningful quantities, are amenable to human auditory processing such as masking, and perform at least as well as (and sometimes better than) the MFCCs for robust speech recognition.
Frequency and Wavelet Filtering for Robust Speech Recognition
TLDR
Frequency filtering is put in another perspective, that of the wavelet transform, to explain the discrepancies and to achieve significant improvements in recognition in the highly mismatched case.
Decorrelated and liftered filter-bank energies for robust speech recognition
TLDR
The FBEs are physically meaningful quantities, are amenable to human auditory processing such as masking, and perform at least as well as (and sometimes better than) the MFCCs for robust speech recognition.
A fuzzy approach for the equalization of cepstral variances
  • W.-W. Hung, Hsiao-Chuan Wang
  • Engineering
    2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100)
  • 2000
TLDR
Experimental results for recognition of continuous Mandarin telephone speech showed that the proposed fuzzy filter bank analysis (FFBA) offers a significant improvement in the discrimination capability of cepstral features while maintaining a low computational cost.
Speaker verification on the polycost database using frequency filtered spectral energies
TLDR
The hybridization of linear prediction and filter-bank spectral analysis, using either the cepstral transformation or the alternative frequency filtering, is explored for speaker verification and has yielded good text-dependent speaker verification results on the new speaker-oriented telephone-line POLYCOST database.
Filtering of Filter‐Bank Energies for Robust Speech Recognition
TLDR
A filtering method in the log‐spectral domain corresponding to the cepstral liftering effect is derived, and it is shown that in noisy speech recognition the proposed method reduces the error rate by 52.7% compared with the conventional feature.
Speech recognition using filter-bank features
TLDR
The author presents features derived from filter bank outputs whose performance is comparable to that of MFCCs for connected digit recognition using a hidden Markov model (HMM) based speech recognition system.
Speaker recognition using frequency filtered spectral energies
TLDR
The combination of hybrid spectral analysis and frequency filtering, which had been shown to outperform the conventional techniques in clean and noisy word recognition, has yielded good text-dependent speaker verification results on the new speaker-oriented telephone-line POLYCOST database.
RECOGNITION USING FREQUENCY FILTERED SPECTRAL ENERGIES
The spectral parameters that result from filtering the frequency sequence of log mel-scaled filter-bank energies with a simple first or second order FIR filter have proved to be an efficient speech
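The frequency filtering described in the entry above can be illustrated with a minimal sketch. It assumes the second-order FIR filter H(z) = z - z^{-1} applied along the band axis with zero padding at both ends; the filter coefficients, the padding, and the function name `frequency_filter` are illustrative assumptions and are not given in the text on this page.

```python
import numpy as np


def frequency_filter(log_fbe):
    """Filter log filter-bank energies along the frequency (band) axis.

    Hypothetical helper. Assumes the second-order FIR filter
    H(z) = z - z^{-1}, so each output is out[k] = x[k+1] - x[k-1],
    with bands outside the filter bank treated as zero (an assumption).

    log_fbe: array of shape (n_frames, n_bands) of log mel-scaled energies.
    Returns an array of the same shape.
    """
    log_fbe = np.atleast_2d(np.asarray(log_fbe, dtype=float))
    # Add one zero-valued band on each side of every frame.
    padded = np.pad(log_fbe, ((0, 0), (1, 1)))
    # padded[:, 2:] holds x[k+1]; padded[:, :-2] holds x[k-1].
    return padded[:, 2:] - padded[:, :-2]


# Toy example: one frame with five made-up log energies.
frame = np.array([[1.0, 2.0, 4.0, 7.0, 11.0]])
filtered = frequency_filter(frame)
```

For the toy frame, the filtered values are 2, 3, 5, 7 and -7: interior outputs are differences between neighbouring bands, while the two end values are affected by the assumed zero padding.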

References

On the use of bandpass liftering in speech recognition
TLDR
This paper has found that a bandpass "liftering" process reduces the variability of the statistical components of LPC-based spectral measurements and hence it is desirable to use such a liftering process in a speech recognizer.
Signal modeling techniques in speech recognition
TLDR
A tutorial on signal processing in state-of-the-art speech recognition systems is presented, reviewing those techniques most commonly used, and three important trends that have developed in the last five years in speech recognition are examined.
Spectral slope based distortion measures for all-pole models of speech
  • B. Hanson, H. Wakita
  • Physics
    ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing
  • 1986
TLDR
Initial testing of spectral slope measures derived from the spectra of all-pole models of speech for speaker-dependent isolated word recognition indicates that they are quite robust, giving considerable performance improvement over the standard cepstral distance measure in several noisy speech situations.
A weighted cepstral distance measure for speech recognition
  • Y. Tohkura
  • Computer Science
    IEEE Trans. Acoust. Speech Signal Process.
  • 1987
TLDR
The experimental results show that the weighted cepstral distance measure works substantially better than both the Euclidean cepstral distance and the log likelihood ratio distance measures across two different databases, namely a 10-digit vocabulary and a 129-word airline vocabulary.
A database for speaker-independent digit recognition
TLDR
A large speech database has been collected for use in designing and evaluating algorithms for speaker-independent recognition of connected digit sequences, and formal human listening tests on this database certified the labelling of the digit sequences.
Prediction of perceived phonetic distance from critical-band spectra: A first step
Judgements of phonetic distance between pairs of static synthetic vowels and fricatives have been collected in which the stimulus ensemble included formant frequency changes and a number of acoustic
ESPRIT project: Speech Technology Assessment in Multilingual Applications (SAM-A). Document SAM-A/6002
  • 1993