• Publications
TTS synthesis with bidirectional LSTM based recurrent neural networks
Recurrent Neural Networks (RNNs) with Bidirectional Long Short-Term Memory (BLSTM) cells are adopted to capture the correlation or co-occurrence information between any two instants in a speech utterance for parametric TTS synthesis.
A Report on the 2017 Native Language Identification Shared Task
The fusion track showed that combining the written and spoken responses provides a large boost in prediction accuracy, and multiple classifier systems were the most effective in all tasks, with most based on traditional classifiers with lexical/syntactic features.
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
A new pre-trained model, WavLM, is proposed to solve full-stack downstream speech tasks; it achieves state-of-the-art performance on the SUPERB benchmark and brings significant improvements for various speech processing tasks on their representative benchmarks.
On the training aspects of Deep Neural Network (DNN) for parametric TTS synthesis
Experimental results show that a DNN can outperform the conventional HMM, which is trained in ML first and then refined by MGE, and both objective and subjective measures indicate that the DNN can synthesize speech better than the HMM-based baseline.
Automatic prosody prediction and detection with Conditional Random Field (CRF) models
Experiments performed on the Boston University Radio Speech Corpus show that CRF models trained on the proposed rich contextual features can improve the accuracy of prosody prediction and detection in both speaker-dependent and speaker-independent cases.
Multi-speaker modeling and speaker adaptation for DNN-based TTS synthesis
This paper proposes an approach to modeling multi-speaker TTS with a general DNN, where the same hidden layers are shared among different speakers while the output layers are composed of speaker-dependent nodes explaining the target of each speaker.
Locating Boundaries for Prosodic Constituents in Unrestricted Mandarin Texts
  • Min Chu, Yao Qian
  • Linguistics
    Int. J. Comput. Linguistics Chin. Lang. Process.
  • 1 February 2001
A three-tier prosodic hierarchy for Mandarin that emphasizes the use of the prosodic word instead of the lexical word as the basic prosodic unit is proposed, which shows advantages in detecting the boundaries of intonational phrases at locations without breaking punctuation.
Using bidirectional LSTM recurrent neural networks to learn high-level abstractions of sequential features for automated scoring of non-native spontaneous speech
A new method is proposed to grade non-native spoken language tests automatically, using a type of recurrent neural network to jointly optimize the learning of high-level abstractions from time-sequence features with the time-aggregated features.
Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Recurrent Neural Network
This study proposes using a BLSTM-RNN with word embeddings for the part-of-speech (POS) tagging task; the approach achieves performance comparable to that of the Stanford POS tagger.