A novel two-step SVM classifier for voiced/unvoiced/silence classification of speech
TLDR
A novel method for voiced/unvoiced/silence classification of speech using the support vector machine (SVM) is proposed that can correctly classify speech frames into voiced, unvoiced, and silence frames.
Multiplicative Update of Auto-Regressive Gains for Codebook-Based Speech Enhancement
  • Qi He, Feng Bao, C. Bao
  • Computer Science
    IEEE/ACM Transactions on Audio, Speech, and…
  • 1 March 2017
TLDR
An improved codebook-driven Wiener filter combined with the speech-presence probability is developed to remove the residual noise between the harmonics of noisy speech.
Sparse Hidden Markov Models for Speech Enhancement in Non-Stationary Noise Environments
TLDR
The subjective and objective test results indicate that the proposed speech enhancement scheme can achieve a larger segmental SNR improvement, a lower log-spectral distortion, and better speech quality in non-stationary noise conditions than state-of-the-art reference methods.
Speech enhancement with weighted denoising auto-encoder
TLDR
A novel speech enhancement method with a Weighted Denoising Auto-encoder (WDA) is proposed, which can achieve a similar amount of noise reduction in both white and colored noise with less distortion of the speech signal level.
Large-Scale Whale-Call Classification by Transfer Learning on Multi-Scale Waveforms and Time-Frequency Features
TLDR
An effective data-driven approach based on pre-trained Convolutional Neural Networks using multi-scale waveforms and time-frequency feature representations is developed to classify whale calls from a large open-source dataset recorded by sensors carried by whales.
High frequency reconstruction of audio signal based on chaotic prediction theory
TLDR
A blind high frequency reconstruction method based on chaotic prediction theory is developed using the principles of audio signal production and the characteristics of the human hearing system, and its performance is evaluated with objective and subjective tests.
Phoneme-Unit-Specific Time-Delay Neural Network for Speaker Verification
TLDR
A phoneme-unit-specific time-delay neural network (PUSTDNN) is proposed and applied to the state-of-the-art x-vector system; the results show that the phonetic vector technique is the most robust to phoneme unit recognition accuracy.
Projective non-negative matrix factorization with Bregman divergence for musical instrument classification
  • R. Rui, C. Bao
  • Computer Science
    IEEE International Conference on Signal…
  • 22 October 2012
TLDR
The results indicate that the proposed PNMF classifier achieves higher classification accuracy than classifiers derived from conventional NMF and machine learning.
...