Speaker-independent raw waveform model for glottal excitation
@inproceedings{Juvela2018SpeakerindependentRW,
  title={Speaker-independent raw waveform model for glottal excitation},
  author={Lauri Juvela and Vassilis Tsiaras and Bajibabu Bollepalli and Manu Airaksinen and Junichi Yamagishi and Paavo Alku},
  booktitle={INTERSPEECH},
  year={2018}
}
Recent speech technology research has seen a growing interest in using WaveNets as statistical vocoders, i.e., generating speech waveforms from acoustic features. These models have been shown to improve the generated speech quality over classical vocoders in many tasks, such as text-to-speech synthesis and voice conversion. Furthermore, conditioning WaveNets with acoustic features allows sharing the waveform generator model across multiple speakers without additional speaker codes. However…
36 Citations
GlotNet—A Raw Waveform Model for the Glottal Excitation in Statistical Parametric Speech Synthesis
- Computer Science
- IEEE/ACM Transactions on Audio, Speech, and Language Processing
- 2019
This study presents a raw waveform glottal excitation model, called GlotNet, and compares its performance with the corresponding direct speech waveform model, WaveNet, using equivalent architectures.
Speaker-Adaptive Neural Vocoders for Parametric Speech Synthesis Systems
- Computer Science
- 2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)
- 2020
Experimental results verify that the proposed TTS systems with speaker-adaptive neural vocoders outperform those with traditional source-filter model-based vocoders and those with WaveNet vocoders, trained either speaker-dependently or speaker-independently.
ExcitNet Vocoder: A Neural Excitation Model for Parametric Speech Synthesis Systems
- Computer Science
- 2019 27th European Signal Processing Conference (EUSIPCO)
- 2019
Experimental results show that the proposed ExcitNet vocoder, trained both speaker-dependently and speaker-independently, outperforms traditional linear prediction vocoders and similarly configured conventional WaveNet vocoders.
Neural Text-to-Speech with a Modeling-by-Generation Excitation Vocoder
- Computer Science
- INTERSPEECH
- 2020
A modeling-by-generation (MbG) excitation vocoder is proposed for a neural text-to-speech (TTS) system; it provides high-quality synthetic speech, achieving a mean opinion score of 4.57 within the TTS framework.
Speaker-adaptive neural vocoders for statistical parametric speech synthesis systems
- Computer Science
- ArXiv
- 2018
Experimental results verify that the proposed SPSS systems with speaker-adaptive neural vocoders outperform those with traditional source-filter model-based vocoders and those with WaveNet vocoders, trained either speaker-dependently or speaker-independently.
Neural Source-Filter Waveform Models for Statistical Parametric Speech Synthesis
- Computer Science, Geology
- IEEE/ACM Transactions on Audio, Speech, and Language Processing
- 2020
It was demonstrated that the NSF models generated waveforms at least 100 times faster than the authors' WaveNet-vocoder, and the quality of the synthetic speech from the best NSF model was comparable to that from WaveNet on a large single-speaker Japanese speech corpus.
Online Speaker Adaptation for WaveNet-based Neural Vocoders
- Physics, Computer Science
- 2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
- 2020
Experimental results demonstrate that the proposed online speaker adaptation method can achieve a better objective and subjective performance on reconstructing waveforms of unseen speakers than the conventional speaker-independent WaveNet vocoder.
LP-WaveNet: Linear Prediction-based WaveNet Speech Synthesis
- Computer Science
- 2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
- 2020
An LP-WaveNet vocoder is proposed, in which the complicated interactions between vocal source and vocal tract components are jointly trained within a mixture density network-based WaveNet model; it outperforms conventional WaveNet vocoders both objectively and subjectively.
Non-Parallel Voice Conversion System With WaveNet Vocoder and Collapsed Speech Suppression
- Computer Science
- IEEE Access
- 2020
This paper integrates a simple non-parallel voice conversion (VC) system with a WaveNet (WN) vocoder and a proposed collapsed speech suppression technique and proposes a collapsed speech segment detector (CSSD) to mitigate the negative effects introduced by the LPCDC.
References
An investigation of multi-speaker training for wavenet vocoder
- Physics
- 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
- 2017
The experimental results demonstrate that 1) the multi-speaker WaveNet vocoder still outperforms STRAIGHT in generating known speakers' voices but is comparable to STRAIGHT in generating unknown speakers' voices, and 2) multi-speaker training is effective for developing a WaveNet vocoder capable of speech modification.
Speaker-Dependent WaveNet Vocoder
- Computer Science
- INTERSPEECH
- 2017
A speaker-dependent WaveNet vocoder is proposed: a method of synthesizing speech waveforms with WaveNet by using acoustic features from an existing vocoder as auxiliary features of WaveNet.
A Comparison of Recent Waveform Generation and Acoustic Modeling Methods for Neural-Network-Based Speech Synthesis
- Physics
- 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2018
This paper builds a framework in which new vocoding and acoustic modeling techniques are compared with conventional approaches by means of a large-scale crowdsourced evaluation, and shows that generative adversarial networks and an autoregressive (AR) model performed better than a normal recurrent network, and that the AR model performed best.
High-pitched excitation generation for glottal vocoding in statistical parametric speech synthesis using a deep neural network
- Computer Science
- 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2016
Subjective listening tests conducted on a US English female voice show that the proposed QCP-DNN method gives a significant improvement in synthetic naturalness compared to the two previously developed glottal vocoders.
HMM-Based Speech Synthesis Utilizing Glottal Inverse Filtering
- Engineering
- IEEE Transactions on Audio, Speech, and Language Processing
- 2011
A hidden Markov model (HMM)-based speech synthesizer that utilizes glottal inverse filtering for generating natural-sounding synthetic speech is presented, and its quality is clearly better than that of two HMM-based speech synthesis systems based on widely used vocoder techniques.
Generative Adversarial Network-Based Glottal Waveform Model for Statistical Parametric Speech Synthesis
- Computer Science
- INTERSPEECH
- 2017
A new method for predicting glottal waveforms by generative adversarial networks (GANs) is proposed, and the newly proposed GANs achieve synthesis quality comparable to that of widely-used DNNs, without using an additive noise component.
WaveNet: A Generative Model for Raw Audio
- Computer Science
- SSW
- 2016
WaveNet, a deep neural network for generating raw audio waveforms, is introduced; it is shown that it can be efficiently trained on data with tens of thousands of samples per second of audio, and can be employed as a discriminative model, returning promising results for phoneme recognition.
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
- Computer Science
- 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2018
This paper describes Tacotron 2, a neural network architecture for speech synthesis directly from text. The system is composed of a recurrent sequence-to-sequence feature prediction network that maps…
Wavenet Based Low Rate Speech Coding
- Computer Science
- 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2018
This work describes how a WaveNet generative speech model can be used to generate high quality speech from the bit stream of a standard parametric coder operating at 2.4 kb/s and shows that the speech produced by the system is able to additionally perform implicit bandwidth extension and does not significantly impair recognition of the original speaker for the human listener.
On the Use of WaveNet as a Statistical Vocoder
- Computer Science
- 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2018
This paper used two female and two male speakers from the CMU-ARCTIC database to contrast the use of cepstrum coefficients and filter-bank features as local conditioners with the goal to improve the overall quality for both male and female speakers.