Merlin: An Open Source Neural Network Speech Synthesis System
TLDR
We introduce the Merlin speech synthesis toolkit for neural network-based speech synthesis.
  • 243
  • 34
ASVspoof 2015: the first automatic speaker verification spoofing and countermeasures challenge
TLDR
This paper introduces the first edition of the ASVspoof challenge, summarizes the results, and discusses directions for future challenges and research.
  • 236
  • 27
Spoofing and countermeasures for speaker verification: A survey
TLDR
We present a survey of past work and identify priority research directions for the future.
  • 371
  • 18
Deep neural networks employing Multi-Task Learning and stacked bottleneck features for speech synthesis
TLDR
We show that the hidden representation used within a DNN can be improved through the use of Multi-Task Learning, and that stacking multiple frames of hidden layer activations (stacked bottleneck features) also leads to improvements.
  • 229
  • 14
Efficient architecture for soft-output massive MIMO detection with Gauss-Seidel method
TLDR
This paper proposes an efficient architecture for soft-output detection in the massive multiple-input multiple-output (MIMO) uplink.
  • 66
  • 14
Detecting Converted Speech and Natural Speech for anti-Spoofing Attack in Speaker Recognition
TLDR
We propose using features derived from the phase spectrum to detect converted speech used in spoofing attacks, in order to enhance the security of speaker verification systems.
  • 143
  • 12
A study of speaker adaptation for DNN-based speech synthesis
TLDR
A major advantage of statistical parametric speech synthesis over unit-selection speech synthesis is its adaptability and controllability in changing speaker characteristics and speaking style.
  • 105
  • 11
Synthetic speech detection using temporal modulation feature
TLDR
We propose using modulation features derived from the magnitude/phase spectrum to distinguish synthetic speech from human speech.
  • 99
  • 9
Investigating gated recurrent networks for speech synthesis
TLDR
We present a visual analysis alongside a series of experiments, resulting in a proposal for a simplified architecture that only uses the critical forget gate.
  • 107
  • 9
Automatic prosody prediction and detection with Conditional Random Field (CRF) models
TLDR
We investigate how to improve TTS prosody prediction and detection.
  • 37
  • 8