Reconstruction of missing features for robust speech recognition

Abstract

Speech recognition systems perform poorly in the presence of corrupting noise. Missing-feature methods attempt to compensate for the noise by removing noise-corrupted components of spectrographic representations of noisy speech and performing recognition with the remaining reliable components. Conventional classifier-compensation methods modify the recognition system to work with the incomplete representations so obtained. This constrains them to perform recognition using spectrographic features, which are known to be less effective than cepstra. In this paper, we present two missing-feature algorithms that reconstruct complete spectrograms from incomplete noisy ones. Cepstral vectors can then be derived from the reconstructed spectrograms for recognition. The first algorithm uses MAP procedures to estimate corrupt components from their correlations with reliable components. The second algorithm clusters spectral vectors of clean speech; corrupt components of noisy speech are then estimated from the distribution of the cluster with which the analysis frame is identified. Experiments show that, although conventional classifier-compensation methods are superior when recognition is performed with spectrographic features, cepstra derived from the reconstructed spectrograms result in better recognition performance overall. The proposed methods are also computationally less expensive and do not require modification of the recognizer. © 2004 Elsevier B.V. All rights reserved.
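To make the two reconstruction ideas concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes that clean log-spectral frames follow a Gaussian distribution (one Gaussian overall, or one per cluster), that a reliability mask is already available, and that only correlations within a single frame are used. Under a Gaussian prior, the MAP estimate of the missing components given the reliable ones is the conditional mean. The optional clipping step assumes that additive noise only raises spectral energy, so the noisy observation is an upper bound on the clean value; clipping the conditional mean is a simple approximation, not an exact bounded MAP solution. The function names `map_reconstruct` and `cluster_reconstruct`, their parameters, and the equal-prior cluster selection are all hypothetical choices made for illustration.

```python
import numpy as np


def map_reconstruct(frame, reliable, mean, cov, bound=True):
    """Fill the unreliable components of one log-spectral frame.

    Assumption: the clean frame is Gaussian with the given mean and
    covariance, so the MAP estimate of the missing part given the
    reliable part is the Gaussian conditional mean.
    """
    frame = np.asarray(frame, dtype=float)
    reliable = np.asarray(reliable, dtype=bool)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    missing = ~reliable
    out = frame.copy()
    if not missing.any():
        return out  # nothing to reconstruct
    if not reliable.any():
        est = mean[missing]  # no evidence: fall back to the prior mean
    else:
        c_rr = cov[np.ix_(reliable, reliable)]
        c_mr = cov[np.ix_(missing, reliable)]
        # Conditional mean: mu_m + C_mr C_rr^{-1} (x_r - mu_r)
        est = mean[missing] + c_mr @ np.linalg.solve(
            c_rr, frame[reliable] - mean[reliable]
        )
    if bound:
        # Assumption: noise only adds energy, so the noisy value is an
        # upper bound on the clean value.
        est = np.minimum(est, frame[missing])
    out[missing] = est
    return out


def cluster_reconstruct(frame, reliable, means, covs, bound=True):
    """Pick the cluster that best explains the reliable components,
    then reconstruct from that cluster's Gaussian.

    Assumptions: equal cluster priors, and at least one reliable component.
    """
    frame = np.asarray(frame, dtype=float)
    reliable = np.asarray(reliable, dtype=bool)
    best, best_ll = 0, -np.inf
    for k, (m, c) in enumerate(zip(means, covs)):
        m = np.asarray(m, dtype=float)
        c = np.asarray(c, dtype=float)
        d = frame[reliable] - m[reliable]
        c_rr = c[np.ix_(reliable, reliable)]
        _, logdet = np.linalg.slogdet(c_rr)
        # Gaussian log-likelihood of the reliable part, up to a constant
        ll = -0.5 * (logdet + d @ np.linalg.solve(c_rr, d))
        if ll > best_ll:
            best, best_ll = k, ll
    return map_reconstruct(frame, reliable, means[best], covs[best], bound)
```

In this sketch the means and covariances would be estimated from clean training speech. Once every frame is filled in, cepstral features can be computed from the completed spectrogram in the usual way, which is why the recognizer itself needs no modification.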

DOI: 10.1016/j.specom.2004.03.007

Cite this paper

@article{Raj2004ReconstructionOM, title={Reconstruction of missing features for robust speech recognition}, author={Bhiksha Raj and Michael L. Seltzer and Richard M. Stern}, journal={Speech Communication}, year={2004}, volume={43}, pages={275-296} }