Improving the Front-End Noise Preprocessor of MELPe


In this paper we focus on improving the noise preprocessor (NPP) of the low-rate speech coder MELPe using information from the non-acoustic General Electromagnetic Motion Sensor (GEMS). A generalized linear model approach is proposed to improve voice activity estimation, both at the frame level in the time domain and at the bin level in the frequency domain, using GEMS and context features. HMM-based speech recognition techniques are also investigated to drive the estimators. The improved voice activity parameter estimators are shown to have significantly less error than the estimates from the MELPe NPP. The improved frame-level voice activity estimator achieves a 66% reduction in error, and the improved bin-level voice activity estimates show an error reduction of more than 50%. With an optimal spectral amplitude estimation algorithm in place of the MM-LSA algorithm used in the MELPe NPP, together with the improved voice activity parameters, the processed noisy speech has much less residual noise and higher intelligibility in informal listening tests.
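To make the idea of a generalized linear model with context features concrete, the following is a minimal sketch of a frame-level voice activity estimator. It is not the paper's estimator: the abstract does not state the link function, the feature set, or the training procedure, so logistic regression (a GLM with a logit link) is assumed here as one common choice. The two per-frame features (a noisy acoustic energy and a cleaner GEMS energy), the context width, and the synthetic data are all hypothetical and serve only as illustration.

```python
import math
import random


def stack_context(features, width):
    """Concatenate each frame's features with those of its neighbours
    within +/- width frames; indices are clamped at the sequence edges."""
    n = len(features)
    stacked = []
    for t in range(n):
        row = []
        for k in range(-width, width + 1):
            idx = min(max(t + k, 0), n - 1)
            row.extend(features[idx])
        stacked.append(row)
    return stacked


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.3, epochs=500):
    """Fit logistic regression by batch gradient descent on the mean log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, target in zip(X, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - target
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(X, w, b):
    """Per-frame probability of voice activity."""
    return [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) for x in X]


def demo(seed=0, n_frames=400, run_length=20, width=2):
    """Train on synthetic frames and return the training accuracy.

    Assumed data model (illustrative only): speech and non-speech alternate
    in runs; the acoustic feature is heavily corrupted by noise, while the
    hypothetical GEMS feature is much cleaner.
    """
    rng = random.Random(seed)
    labels = [(t // run_length) % 2 for t in range(n_frames)]
    frames = [[lab + rng.gauss(0.0, 1.0),   # hypothetical acoustic energy
               lab + rng.gauss(0.0, 0.3)]   # hypothetical GEMS energy
              for lab in labels]
    X = stack_context(frames, width)
    w, b = fit_logistic(X, labels)
    decisions = [1 if p > 0.5 else 0 for p in predict_proba(X, w, b)]
    correct = sum(int(d == lab) for d, lab in zip(decisions, labels))
    return correct / n_frames
```

Calling `demo()` returns the fraction of frames classified correctly on the synthetic training data. The same structure would apply at the bin level by treating each frequency bin of each frame as one sample, with context taken from neighbouring bins and frames; that extension is likewise an assumption, not a description of the paper's method.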
