DNN-based Speaker Embedding Using Subjective Inter-speaker Similarity for Multi-speaker Modeling in Speech Synthesis

@article{Saito2019DNNbasedSE,
  title={DNN-based Speaker Embedding Using Subjective Inter-speaker Similarity for Multi-speaker Modeling in Speech Synthesis},
  author={Y. Saito and Shinnosuke Takamichi and H. Saruwatari},
  journal={ArXiv},
  year={2019},
  volume={abs/1907.08294}
}
This paper proposes novel algorithms for speaker embedding using subjective inter-speaker similarity based on deep neural networks (DNNs). Although conventional DNN-based speaker embedding such as a $d$-vector can be applied to multi-speaker modeling in speech synthesis, it does not correlate with the subjective inter-speaker similarity and is not necessarily an appropriate speaker representation for open speakers whose speech utterances are not included in the training data. We propose two…
7 Citations
Perceptual-Similarity-Aware Deep Speaker Representation Learning for Multi-Speaker Generative Modeling
Multi-speaker Text-to-speech Synthesis Using Deep Gaussian Processes
JVS-MuSiC: Japanese multispeaker singing-voice corpus
