• Publications
ASVspoof 2019: A large-scale public database of synthesized, converted and replayed speech
TLDR
It was demonstrated that the spoofing data in the ASVspoof 2019 database have varied degrees of perceived quality and similarity to the target speakers, including spoofed data that cannot be differentiated from bona-fide utterances even by human subjects.
A Comparison of Recent Waveform Generation and Acoustic Modeling Methods for Neural-Network-Based Speech Synthesis
TLDR
This paper builds a framework that compares new vocoding and acoustic modeling techniques with conventional approaches by means of a large-scale crowdsourced evaluation, and shows that generative adversarial networks and an autoregressive (AR) model performed better than a normal recurrent network, with the AR model performing best.
GlottDNN - A Full-Band Glottal Vocoder for Statistical Parametric Speech Synthesis
TLDR
The proposed GlottDNN vocoder was evaluated as part of a full-band state-of-the-art DNN-based text-to-speech (TTS) synthesis system and compared against the release version of the original GlottHMM vocoder, and the well-known STRAIGHT vocoder.
Non-parallel voice conversion using i-vector PLDA: towards unifying speaker verification and transformation
TLDR
This work adapts the i-vector method and probabilistic linear discriminant analysis (PLDA) to voice conversion; the approach requires neither parallel utterances, transcriptions, nor time alignment procedures at any stage.
Speaker-independent raw waveform model for glottal excitation
TLDR
This paper presents a multi-speaker 'GlotNet' vocoder, which uses a WaveNet to generate glottal excitation waveforms that are then used to excite the corresponding vocal tract filter to produce speech.
A Comparison Between STRAIGHT, Glottal, and Sinusoidal Vocoding in Statistical Parametric Speech Synthesis
TLDR
The obtained results suggest that the choice of the voice has a profound impact on the overall quality of the vocoder-generated speech, and the best vocoder for each voice can vary case by case, indicating that the waveform generation method of a vocoder is essential for quality improvements.
Speaking Style Conversion from Normal to Lombard Speech Using a Glottal Vocoder and Bayesian GMMs
TLDR
A parametric approach that uses a vocoder to extract speech features and maps them, using Bayesian GMMs, from utterances spoken in normal style to the corresponding features of Lombard speech; the results show that the system is able to convert normal speech into Lombard speech for the two vocoders.
Deep Learning for Tube Amplifier Emulation
TLDR
This work proposes a generic data-driven approach to virtual analog modeling and applies it to the Fender Bassman 56F-A vacuum-tube amplifier, faithfully reproducing the range of sonic characteristics found across the configurations of the original device.
Phase perception of the glottal excitation and its relevance in statistical parametric speech synthesis
TLDR
The results indicate that the phase spectrum of the voiced excitation has a perceptually relevant effect in natural, vocoded, and synthetic speech, and utilizing the phase information in speech synthesis leads to improved speech quality.
GlotNet—A Raw Waveform Model for the Glottal Excitation in Statistical Parametric Speech Synthesis
TLDR
This study presents a raw waveform glottal excitation model, called GlotNet, and compares its performance with the corresponding direct speech waveform model, WaveNet, using equivalent architectures.