Hao-Teng Fan

  • Citations Per Year
Learn More
The conventional NMF-based speech enhancement algorithm analyzes the magnitude spectrograms of both clean speech and noise in the training data via NMF and estimates a set of spectral basis vectors. These basis vectors are used to span a space to approximate the magnitude spectrogram of the noise-corrupted testing utterances. Finally, the components(More)
This letter proposes a novel scheme that applies feature statistics normalization techniques for robust speech recognition. In the proposed approach, the processed temporal-domain feature sequence is first decomposed into nonuniform subbands using the discrete wavelet transform (DWT), and then each subband stream is individually processed by well-known(More)
In this paper, we explore the various properties of cepstral time coefficients (CTC) in speech recognition, and then propose several methods to refine the CTC construction process. It is found that CTC are the filtered version of mel-frequency cepstral coefficients (MFCC), and the used filters are from the discrete cosine transform (DCT) matrix. We modify(More)
In this article, we present an effective compensation scheme to improve noise robustness for the spectra of speech signals. In this compensation scheme, called magnitude spectrum enhancement (MSE), a voice activity detection (VAD) process is performed on the frame sequence of the utterance. The magnitude spectra of non-speech frames are then reduced while(More)
In this paper, we present a novel approach to enhancing the speech features in the modulation spectrum for better recognition performance in noise-corrupted environments. In the presented approach, termed modulation spectrum power-law expansion (MSPLE), the speech feature temporal stream is first pre-processed by some statistics compensation technique, such(More)
This paper proposes a novel scheme that enhance the modulation spectrum of speech features in noise speech recognition via non-negative matrix factorization (NMF). In the presented approach, we apply NMF to obtain a set of non-negative basis spectra vectors which derived from the clean speech to represent the important components for speech recognition. The(More)
The environmental mismatch caused by additiv e noise and/or channel distortion often degrades th e perform ance of a s peech reco gnition sys tem seriously . V arious ro bustness techniques have been proposed to reduce this mismatch, and one category of them aim s to normalize the statistics of speech fea tures in both training and testing conditions. In(More)
近幾十年來,無數的學者先進對於此雜訊干擾問題提出了豐富眾多的演算法,略分成兩 大類別:強健性語音特徵參數表示法(robust speech feature representation)與語音模型調適 法(speech model adaptation),第一類別之方法主要目的在抽取不易受到外在環境干擾下 而失真的語音特徵參數,或從原始語音特徵中儘量削減雜訊造成的效應,比較知名的方 法有:倒頻譜平均值與變異數正規化法 (cepstral mean and variance normalization, CMVN)[1]、倒頻譜統計圖正規化法(cepstral histogram normalization, CHN)[2]、倒頻譜平 均值與變異數正規化結合自動回歸動態平均濾波器法(cepstral mean(More)