Voice Conversion Based on Maximum-Likelihood Estimation of Spectral Parameter Trajectory
  • T. Toda, A. Black, K. Tokuda
  • Computer Science, Mathematics
  • IEEE Transactions on Audio, Speech, and Language…
  • 1 November 2007
Experimental results indicate that the performance of VC can be dramatically improved by the proposed method in view of both speech quality and conversion accuracy for speaker individuality.
Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation
A model for constructing vector representations of words by composing characters using bidirectional LSTMs that requires only a single vector per character type and a fixed set of parameters for the compositional model, which yields state-of-the-art results in language modeling and part-of-speech tagging.
Unit selection in a concatenative speech synthesis system using a large speech database
It is proposed that the units in a synthesis database can be considered as a state transition network in which the state occupancy cost is the distance between a database unit and a target, and the transition cost is an estimate of the quality of concatenation of two consecutive units.
The HMM-based speech synthesis system (HTS) version 2.0
This paper describes HTS version 2.0 in detail, as well as future release plans, which include a number of new features which are useful for both speech synthesis researchers and developers.
Statistical Parametric Speech Synthesis
This paper gives a general overview of techniques in statistical parametric speech synthesis, and contrasts these techniques with the more conventional unit selection technology that has dominated speech synthesis over the last ten years.
Festival Speech Synthesis System
Style Transfer Through Back-Translation
A latent representation of the input sentence is learned which is grounded in a language translation model in order to better preserve the meaning of the sentence while reducing stylistic properties, and adversarial generation techniques are used to make the output match the desired style.
The CMU Arctic speech databases
The CMU Arctic databases, designed for speech synthesis research and consisting of approximately 1200 phonetically balanced English utterances, are distributed as free software, without restriction on commercial or non-commercial use.
Statistical mapping between articulatory movements and acoustic spectrum using a Gaussian mixture model
Experimental results demonstrate that the MLE-based mapping with dynamic features can significantly improve the mapping performance compared with the MMSE-based mapping in both the articulatory-to-acoustic mapping and the inversion mapping.