Model based distribution equalization applied on spectro-temporal speech features


The speech recognition performance of machines degrades significantly compared with that of humans in difficult environments [1], e.g. when the noise exhibits nonstationary characteristics. Standard speech features such as Mel Frequency Cepstral Coefficients (MFCCs) or RelAtive SpectrAl (RASTA) features [2] perform well in clean conditions but deteriorate strongly in the presence of noise. In contrast, spectro-temporal features have achieved promising results in such situations [3, 4]. Unlike standard features, they are able to detect, for instance, steady formant transitions in the spectro-temporal representation. Most of them use Gabor filters [5], whereas we developed features inspired by a hierarchical system for visual object recognition [6]. We refer to them as Hierarchical Spectro-Temporal (HIST) features; their extraction scheme is depicted in Fig. 1 [4].

Cite this paper

@inproceedings{Ngouoko2013ModelBD, title={Model based distribution equalization applied on spectro-temporal speech features}, author={Samuel K. Ngouoko and Martin Heckmann and Britta Wrede}, year={2013} }