PS-ZCPA Based Feature Extraction with Auditory Masking, Modulation Enhancement and Noise Reduction for Robust ASR

Abstract

A pitch-synchronous (PS) auditory feature extraction method based on ZCPA (Zero-Crossings Peak-Amplitudes) was proposed previously and showed greater robustness than conventional ZCPA and MFCC based features. In this paper, first, a non-linear adaptive threshold adjustment procedure is introduced into the PS-ZCPA method to obtain optimal results in noisy conditions with different signal-to-noise ratios (SNRs). Next, auditory masking, a well-known auditory perception phenomenon, and modulation enhancement, which reflects the strong relationship between modulation spectra and the intelligibility of speech, are embedded into the PS-ZCPA method. Finally, a Wiener filter based noise reduction procedure is integrated into the method to make it more noise-robust, and its performance is evaluated against ETSI ES202 (WI008), a standard front-end for distributed speech recognition. All experiments were carried out on the Aurora-2J database. The experimental results show that embedding auditory masking improves the performance of the PS-ZCPA method, and that modulation enhancement improves it slightly. The PS-ZCPA method with Wiener filter based noise reduction also performed better than ETSI ES202 (WI008).

Key words: pitch synchronous analysis, ZCPA, auditory masking, modulation enhancement, Wiener filtering
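
For readers unfamiliar with ZCPA, the following is a minimal sketch of the conventional idea for a single band-passed frame: intervals between successive upward zero crossings give a frequency estimate, and the peak amplitude inside each interval, after log compression, is added to the matching frequency bin. It does not implement the pitch-synchronous variant, the adaptive threshold, masking, modulation enhancement, or Wiener filtering described in the abstract, and it is not the authors' implementation. The function name `zcpa_histogram`, its parameters, the bin edges, and the `log(1 + peak)` compression are assumptions chosen for illustration.

```python
import numpy as np

def zcpa_histogram(band_signal, sample_rate, bin_edges_hz):
    """Accumulate a ZCPA-style frequency histogram for one band-passed frame.

    Hypothetical helper for illustration only (not from the paper).
    """
    x = np.asarray(band_signal, dtype=float)
    # Sample indices where the signal has just crossed zero going upward.
    up = np.flatnonzero((x[:-1] < 0) & (x[1:] >= 0)) + 1
    hist = np.zeros(len(bin_edges_hz) - 1)
    for start, end in zip(up[:-1], up[1:]):
        # The inverse of the crossing interval is the frequency estimate.
        freq = sample_rate / (end - start)
        # Peak amplitude between the two upward crossings.
        peak = x[start:end].max()
        # Locate the histogram bin that contains this frequency.
        k = np.searchsorted(bin_edges_hz, freq, side="right") - 1
        if 0 <= k < len(hist):
            # Assumed compression: log(1 + peak).
            hist[k] += np.log1p(peak)
    return hist
```

As a usage example, a 500 Hz sine sampled at 8 kHz with bin edges `[0, 250, 750, 2000]` should place all of its weight in the middle bin. In a full front-end, histograms like this would be computed per sub-band and summed across a filter bank.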

DOI: 10.1093/ietisy/e89-d.3.1015

Cite this paper

@article{Ghulam2006PSZCPABF,
  title={PS-ZCPA Based Feature Extraction with Auditory Masking, Modulation Enhancement and Noise Reduction for Robust ASR},
  author={Muhammad Ghulam and Takashi Fukuda and Kouichi Katsurada and Junsei Horikawa and Tsuneo Nitta},
  journal={IEICE Transactions},
  year={2006},
  volume={89-D},
  pages={1015-1023}
}