An increasing number of independent studies have confirmed the vulnerability of automatic speaker verification (ASV) technology to spoofing. However, in comparison to that involving other biometric modalities, spoofing and countermeasure research for ASV is still in its infancy. A current barrier to progress is the lack of standards which impedes the ...
Voice conversion techniques present a threat to speaker verification systems. To enhance the security of speaker verification systems, we study how to automatically distinguish between natural speech and synthetic/converted speech. Motivated by the research on phase spectrum in speech perception, in this study, we propose to use features derived from phase spectrum ...
A robust voice conversion function relies on a large amount of parallel training data, which is difficult to collect in practice. To tackle the sparse parallel training data problem in voice conversion, this paper describes a mixture of factor analyzers method which integrates prior knowledge from non-parallel speech into the training of conversion ...
Recently, deep neural networks (DNNs) have shown promise as an acoustic model for statistical parametric speech synthesis. Their ability to learn complex mappings from linguistic features to acoustic features has advanced the naturalness of synthesised speech significantly. However, because DNN parameter estimation methods typically attempt to minimise the ...
Recently, recurrent neural networks (RNNs) as powerful sequence models have re-emerged as a potential acoustic model for statistical parametric speech synthesis (SPSS). The long short-term memory (LSTM) architecture is particularly attractive because it addresses the vanishing gradient problem in standard RNNs, making them easier to train. Although recent ...
We propose two novel techniques, stacking bottleneck features and a minimum generation error (MGE) training criterion, to improve the performance of deep neural network (DNN)-based speech synthesis. The techniques address the related issues of frame-by-frame independence and ignorance of the relationship between static and dynamic ...
Voice conversion, the methodology of automatically converting one's utterances to sound as if spoken by another speaker, presents a threat to applications relying on speaker verification. We study the vulnerability of text-independent speaker verification systems to voice conversion attacks using telephone speech. We implemented a voice conversion ...
While biometric authentication has advanced significantly in recent years, evidence shows the technology can be susceptible to malicious spoofing attacks. The research community has responded with dedicated countermeasures which aim to detect and deflect such attacks. Although the literature shows that they can be effective, the problem is far from being ...
Voice conversion and speaker adaptation techniques present a threat to current state-of-the-art speaker verification systems. To prevent such spoofing attacks and enhance the security of speaker verification systems, the development of anti-spoofing techniques to distinguish synthetic from human speech is necessary. In this study, we continue the quest to ...
Conventional statistical transformation functions for voice conversion have been shown to suffer from over-smoothing and over-fitting problems. The over-smoothing problem arises because of statistical averaging when estimating the model parameters for the transformation function. In addition, the large number of parameters in the statistical model ...