The conventional NMF-based speech enhancement algorithm applies NMF to the magnitude spectrograms of both the clean speech and the noise in the training data to estimate a set of spectral basis vectors. These basis vectors span a space in which the magnitude spectrogram of each noise-corrupted test utterance is approximated. Finally, the components…
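A minimal sketch of the conventional pipeline described above, assuming Euclidean NMF with Lee-Seung multiplicative updates, separate speech and noise basis sets learned from training spectrograms, and, because the abstract is cut off before the final step, a Wiener-like soft mask to keep the speech components. The names `nmf`, `fit_activations`, and `enhance`, the ranks, and the random stand-in spectrograms are all hypothetical.

```python
import numpy as np

EPS = 1e-10  # guards against division by zero in the multiplicative updates


def nmf(V, rank, n_iter=200, seed=0):
    """Euclidean NMF V ~ W @ H via Lee-Seung multiplicative updates."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + EPS
    H = rng.random((rank, V.shape[1])) + EPS
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
        W *= (V @ H.T) / (W @ H @ H.T + EPS)
    return W, H


def fit_activations(V, W, n_iter=200, seed=0):
    """Estimate nonnegative activations H with V ~ W @ H, keeping bases W fixed."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + EPS
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
    return H


def enhance(noisy_mag, W_speech, W_noise, n_iter=200):
    """Approximate a noisy magnitude spectrogram in the span of the stacked
    speech and noise bases, then keep the speech part."""
    W = np.hstack([W_speech, W_noise])
    H = fit_activations(noisy_mag, W, n_iter)
    k = W_speech.shape[1]
    speech = W_speech @ H[:k]
    noise = W_noise @ H[k:]
    # Assumed combination step: a Wiener-like soft mask with values in [0, 1].
    return speech / (speech + noise + EPS) * noisy_mag


rng = np.random.default_rng(1)
clean = rng.random((64, 100))  # stand-in for a clean-speech magnitude spectrogram
noise = rng.random((64, 100))  # stand-in for a noise magnitude spectrogram
W_s, _ = nmf(clean, rank=8)
W_n, _ = nmf(noise, rank=4)
enhanced = enhance(clean + noise, W_s, W_n)
print(enhanced.shape)  # (64, 100)
```

The bases are learned once per source and then frozen, so at test time only the activations are estimated; this is what lets the noisy spectrogram be split into a speech part and a noise part.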
This letter proposes a novel scheme that applies feature statistics normalization techniques to robust speech recognition. In the proposed approach, the processed temporal-domain feature sequence is first decomposed into nonuniform subbands using the discrete wavelet transform (DWT), and each subband stream is then individually processed by well-known…
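A minimal sketch of subband-wise normalization, assuming an orthonormal Haar DWT along the time axis (its dyadic splits give the nonuniform subbands) and mean and variance normalization (MVN) as the per-subband statistics normalization; the abstract does not say which wavelet or which "well-known" technique is used. The names `haar_dwt`, `haar_idwt`, `mvn`, and `subband_mvn` are hypothetical.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def haar_dwt(x):
    """One level of the orthonormal Haar DWT along axis 0 (even length)."""
    return (x[0::2] + x[1::2]) / SQRT2, (x[0::2] - x[1::2]) / SQRT2


def haar_idwt(approx, detail):
    """Invert one Haar level: interleave the reconstructed even/odd samples."""
    x = np.empty((2 * approx.shape[0],) + approx.shape[1:])
    x[0::2] = (approx + detail) / SQRT2
    x[1::2] = (approx - detail) / SQRT2
    return x


def mvn(x, eps=1e-8):
    """Mean and variance normalization along time (axis 0)."""
    return (x - x.mean(axis=0)) / (x.std(axis=0) + eps)


def subband_mvn(features, levels=2):
    """Decompose each feature trajectory into Haar subbands, normalize each
    subband stream separately, and reconstruct.

    features: (T, D) array of frame-level features, e.g. MFCC.
    """
    T = features.shape[0]
    pad = (-T) % (2 ** levels)  # make the length divisible by 2**levels
    x = np.pad(features, ((0, pad), (0, 0)), mode="edge")
    details = []
    for _ in range(levels):
        x, d = haar_dwt(x)
        details.append(mvn(d))  # normalize each detail subband on its own
    x = mvn(x)  # normalize the coarsest approximation subband
    for d in reversed(details):
        x = haar_idwt(x, d)
    return x[:T]  # drop the padding


mfcc = np.random.default_rng(0).normal(size=(101, 13))  # stand-in features
normalized = subband_mvn(mfcc, levels=2)
```

Because the DWT is linear and each subband is normalized independently, a global offset or positive gain on the input features cancels out in every subband.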
In this paper, we explore various properties of cepstral time coefficients (CTC) in speech recognition and then propose several methods to refine the CTC construction process. We find that CTC are filtered versions of the mel-frequency cepstral coefficients (MFCC), where the filters are taken from the discrete cosine transform (DCT) matrix. We modify…
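The claim that CTC are filtered MFCC can be checked numerically. This sketch assumes CTC are obtained by applying an orthonormal DCT-II to each length-`N` window of one MFCC trajectory (no centering, all `N` coefficients kept) and shows that the same numbers come from convolving the trajectory with each DCT row used as an FIR filter. The function names are hypothetical.

```python
import numpy as np


def dct_matrix(N):
    """Orthonormal DCT-II matrix; row k is the k-th cosine basis vector."""
    n = np.arange(N)
    k = n[:, None]
    C = np.sqrt(2.0 / N) * np.cos(np.pi * k * (2 * n + 1) / (2 * N))
    C[0] /= np.sqrt(2.0)
    return C


def ctc_windowed(c, N):
    """CTC as a DCT over each sliding window of the MFCC trajectory c."""
    C = dct_matrix(N)
    frames = np.lib.stride_tricks.sliding_window_view(c, N)  # (T - N + 1, N)
    return frames @ C.T  # column k is the k-th CTC stream


def ctc_filtered(c, N):
    """Same result, treating row k of the DCT matrix as an FIR filter."""
    C = dct_matrix(N)
    # np.convolve flips its kernel, so flip each DCT row to get a correlation.
    return np.stack(
        [np.convolve(c, C[k][::-1], mode="valid") for k in range(N)], axis=1
    )


c = np.random.default_rng(0).normal(size=50)  # stand-in MFCC trajectory
print(np.allclose(ctc_windowed(c, 5), ctc_filtered(c, 5)))  # True
```

Seen this way, the k-th CTC stream is the MFCC trajectory passed through a filter whose frequency response is that of the k-th cosine basis vector, which is what makes the filter design a natural place to refine CTC.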
This paper presents a novel noise-robustness method, nonnegative matrix factorization-based noise suppression (NNS), which enhances the magnitude spectrum of speech signals to improve speech recognition performance in noise-corrupted environments. In the presented approach, the clean data and the noise in the training set are first converted to spectrograms…
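Abstract: In this paper, we propose a new method that uses linear predictive coding (LPC) to improve the noise robustness of features for speech recognition. Based on the LPC technique, the method decomposes the time series of cepstral speech features to obtain a prediction-error component and then subtracts this component from the original feature sequence. The resulting feature sequence is found to be more robust to noise than the original one. In experiments on Aurora-2, a digit corpus containing various types of noise, cepstral features that had already been processed by various robustness techniques all achieved better recognition performance after further processing by the proposed method. Moreover, recognition accuracy improves effectively even with a very low linear prediction order, indicating that the proposed technique can be implemented efficiently. Keywords: linear predictive coding, feature time series, noise robustness.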
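A minimal sketch of the procedure in the abstract above, assuming the LP coefficients are estimated per feature dimension over the whole utterance with the autocorrelation method and that samples before the first frame are zero. Note that subtracting the prediction error e = x - x̂ from x leaves exactly the LP prediction x̂; the explicit steps are kept to mirror the description. The names `lp_coefficients`, `lp_predict`, and `remove_prediction_error` are hypothetical.

```python
import numpy as np


def lp_coefficients(x, order):
    """LP coefficients a_1..a_p from the autocorrelation normal equations."""
    n = len(x)
    r = np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:])


def lp_predict(x, a):
    """x_hat[n] = sum_i a_i * x[n - i], treating samples before n = 0 as zero."""
    x_hat = np.zeros_like(x, dtype=float)
    for i, ai in enumerate(a, start=1):
        x_hat[i:] += ai * x[:-i]
    return x_hat


def remove_prediction_error(features, order=2):
    """features: (T, D) cepstral features, with T > order.

    For each dimension: compute e = x - x_hat and return x - e.
    """
    out = np.empty_like(features, dtype=float)
    for d in range(features.shape[1]):
        x = features[:, d].astype(float)
        a = lp_coefficients(x, order)
        error = x - lp_predict(x, a)  # prediction-error component
        out[:, d] = x - error  # subtract it from the original sequence
    return out


feats = np.random.default_rng(0).normal(size=(200, 13))  # stand-in cepstra
new_feats = remove_prediction_error(feats, order=2)  # low LP order
```

With a low order such as 2, each dimension needs only a 2-by-2 linear solve and a short FIR filter, which is consistent with the abstract's point about efficiency.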
In this paper, we present a novel approach to enhancing speech features in the modulation spectrum domain for better recognition performance in noise-corrupted environments. In the presented approach, termed modulation spectrum power-law expansion (MSPLE), the temporal stream of each speech feature is first preprocessed by a statistics compensation technique, such as…
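A minimal sketch of power-law expansion in the modulation spectrum, assuming the input stream has already been statistics-compensated. The assumed variant normalizes the modulation magnitudes by their peak, raises them to a power `alpha` > 1 (expansion), rescales, and recombines them with the original phase; the abstract is cut off before these details, so the normalization and the default `alpha` are hypothetical, as are the function names.

```python
import numpy as np


def msple(stream, alpha=1.5, eps=1e-12):
    """Power-law expansion of the modulation spectrum of one feature stream."""
    spectrum = np.fft.rfft(stream)
    mag = np.abs(spectrum)
    phase = np.angle(spectrum)
    peak = mag.max() + eps
    # alpha > 1 attenuates weak modulation components relative to strong ones.
    new_mag = peak * (mag / peak) ** alpha
    return np.fft.irfft(new_mag * np.exp(1j * phase), n=len(stream))


def msple_features(features, alpha=1.5):
    """Apply msple to each column (feature dimension) of a (T, D) array."""
    return np.stack(
        [msple(features[:, d], alpha) for d in range(features.shape[1])], axis=1
    )


feats = np.random.default_rng(0).normal(size=(64, 13))  # stand-in, compensated
expanded = msple_features(feats, alpha=1.5)
```

Normalizing by the peak makes the expansion scale-equivariant, so the output does not depend on the absolute level of the compensated features, and `alpha = 1` leaves the stream unchanged.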