Hsin-Te Hwang

The Gaussian mixture model (GMM)-based method has dominated the field of voice conversion (VC) for the last decade. However, the converted spectra are excessively smoothed and thus produce a muffled converted sound. In this study, we improve the speech quality by enhancing the dependency between the source (natural sound) and converted feature vectors (converted sound) …
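For context on the GMM-based conversion that several of these abstracts build on, below is a minimal numpy sketch of the standard joint-density GMM (JDGMM) conversion function, i.e., the posterior-weighted conditional mean E[y | x]. The function name, variable names, and shapes are illustrative assumptions, not code from the papers.

```python
import numpy as np
from scipy.stats import multivariate_normal

def jdgmm_convert(x, weights, mu_x, mu_y, cov_xx, cov_yx):
    """Standard joint-density GMM conversion: E[y | x] as a
    posterior-weighted sum of per-component linear regressions."""
    M = len(weights)
    # Posterior probability of each mixture component given the source frame x.
    post = np.array([
        weights[m] * multivariate_normal.pdf(x, mean=mu_x[m], cov=cov_xx[m])
        for m in range(M)
    ])
    post /= post.sum()
    # Posterior-weighted conditional mean of the target features.
    y_hat = np.zeros_like(mu_y[0])
    for m in range(M):
        cond_mean = mu_y[m] + cov_yx[m] @ np.linalg.solve(cov_xx[m], x - mu_x[m])
        y_hat += post[m] * cond_mean
    return y_hat
```

Because each converted frame is an average over several component-wise regressions, the output spectra tend to be over-smoothed, which is the muffled-sound problem these papers address.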
Maximum likelihood-based trajectory mapping considering global variance (MLGV-based trajectory mapping) has been proposed for improving the quality of the converted speech of Gaussian mixture model-based voice conversion (GMM-based VC). Although the quality of the converted speech is significantly improved, the computational cost of the online conversion …
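As commonly formulated for GV-constrained generation (a hedged reconstruction of the usual objective, not quoted from this paper), MLGV-based trajectory mapping augments the trajectory likelihood with a global-variance term:

\[
\hat{\mathbf{y}} = \arg\max_{\mathbf{y}} \; \log p\big(\mathbf{W}\mathbf{y} \mid \mathbf{x}, \lambda\big) + \omega \, \log p\big(v(\mathbf{y}) \mid \lambda_v\big),
\qquad
v(\mathbf{y})_d = \frac{1}{T}\sum_{t=1}^{T}\big(y_t(d) - \bar{y}(d)\big)^2 ,
\]

where \(\mathbf{W}\) stacks the static/delta windows, \(\lambda\) is the conversion model, \(\lambda_v\) models the global variance of natural target trajectories, and \(\omega\) balances the two terms. The maximization is iterative over the whole utterance, which is why the online conversion cost becomes an issue.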
In this paper, we propose a maximum mutual information (MMI) training criterion to refine the parameters of the joint density GMM (JDGMM) set to tackle the over-smoothing issue in voice conversion (VC). Conventionally, the maximum likelihood (ML) criterion is used to train a JDGMM set, which characterizes the joint property of the source and target feature …
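One standard way to write a frame-wise MMI objective for a joint-density model is sketched below; this is the general form of the criterion, not necessarily the exact objective used in the paper:

\[
\mathcal{F}_{\mathrm{MMI}}(\lambda) = \sum_{t} \log
\frac{p(\mathbf{x}_t, \mathbf{y}_t \mid \lambda)}{p(\mathbf{x}_t \mid \lambda)\, p(\mathbf{y}_t \mid \lambda)} .
\]

Unlike ML training of the joint density alone, this criterion explicitly rewards parameter sets under which the source and target features are predictive of each other.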
This paper investigates the use of the speech parameter generation (SPG) algorithm, which has been successfully adopted in deep neural network (DNN)-based voice conversion (VC) and speech synthesis (SS), for incorporating temporal information to improve the deep denoising auto-encoder (DDAE)-based speech enhancement. In our previous studies, we have …
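The SPG step referred to here solves a weighted least-squares problem that turns frame-wise static-plus-delta statistics into a smooth static trajectory. Below is a minimal numpy sketch for a one-dimensional feature with a simple delta window; the window definition, array shapes, and function name are illustrative assumptions.

```python
import numpy as np

def speech_parameter_generation(mu, var, T):
    """Minimal SPG/MLPG-style sketch for a 1-D static feature with one delta window.
    mu, var: (2*T,) stacked [static; delta] means and variances per frame.
    Returns the smooth static trajectory c (length T)."""
    # Window matrix W maps the static trajectory c to [static; delta] features.
    W = np.zeros((2 * T, T))
    for t in range(T):
        W[t, t] = 1.0                      # static coefficient
        if 0 < t < T - 1:                  # simple delta: 0.5 * (c[t+1] - c[t-1])
            W[T + t, t - 1] = -0.5
            W[T + t, t + 1] = 0.5
    P = np.diag(1.0 / var)                 # precision (diagonal covariance assumed)
    A = W.T @ P @ W                        # normal equations of the weighted LS problem
    b = W.T @ P @ mu
    return np.linalg.solve(A, b)
```

The delta constraints couple neighbouring frames, which is how the algorithm injects temporal information into otherwise frame-independent predictions.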
In this paper, we propose a discriminative training (DT) method to alleviate the muffled sound effect caused by over-smoothing in Gaussian mixture model (GMM)-based voice conversion (VC). For conventional GMM-based VC, we often observed a large degree of ambiguity among the acoustic classes (generative classes) determined by the source feature …
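The "ambiguity among acoustic classes" can be read off the standard GMM component posterior given a source frame (included here for context; the notation is generic):

\[
P(m \mid \mathbf{x}_t, \lambda) =
\frac{w_m \, \mathcal{N}\big(\mathbf{x}_t;\, \boldsymbol{\mu}^{(x)}_m, \boldsymbol{\Sigma}^{(xx)}_m\big)}
     {\sum_{n=1}^{M} w_n \, \mathcal{N}\big(\mathbf{x}_t;\, \boldsymbol{\mu}^{(x)}_n, \boldsymbol{\Sigma}^{(xx)}_n\big)} .
\]

When several components receive comparable posteriors, the converted frame becomes an average of several conditional means, which flattens spectral detail and produces the muffled sound that discriminative training aims to reduce.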
Voice conversion (VC) using artificial neural networks (ANNs) has been shown to produce better sound quality in the converted speech than VC using the Gaussian mixture model (GMM). Although ANN-based VC works reasonably well, there is still room for further improvement. One promising direction is to adopt the successful techniques in statistical …
We propose a flexible framework for spectral conversion (SC) that facilitates training with unaligned corpora. Many SC frameworks require parallel corpora, phonetic alignments, or explicit frame-wise correspondence for learning conversion functions or for synthesizing a target spectrum with the aid of alignments. However, these requirements gravely limit …
This paper describes a novel exemplar-based spectral conversion (SC) system developed by the AST (Academia Sinica, Taipei) team for the 2016 voice conversion challenge (vcc2016). The key feature of our system is that it integrates the locally linear embedding (LLE) algorithm, a manifold learning algorithm that has been successfully applied for the …
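The LLE step of an exemplar-based SC system can be sketched as follows: reconstruct each source frame from its K nearest source exemplars, then reuse the reconstruction weights on the paired target exemplars. This is a generic numpy sketch of that idea, assuming a parallel exemplar dictionary; the function and variable names are illustrative, not taken from the system described above.

```python
import numpy as np

def lle_convert_frame(x, src_exemplars, tgt_exemplars, k=8, reg=1e-5):
    """Exemplar-based conversion in the spirit of LLE:
    reconstruct the source frame x from its K nearest source exemplars,
    then apply the same reconstruction weights to the paired target exemplars."""
    # K nearest neighbours of x in the source dictionary.
    d = np.linalg.norm(src_exemplars - x, axis=1)
    idx = np.argsort(d)[:k]
    A = src_exemplars[idx]                     # (k, dim) local source exemplars
    # Solve for weights minimizing ||x - w @ A||^2 subject to sum(w) = 1,
    # via the local Gram matrix (the standard LLE closed form).
    G = (A - x) @ (A - x).T
    G += reg * np.trace(G) * np.eye(k)         # regularize for numerical stability
    w = np.linalg.solve(G, np.ones(k))
    w /= w.sum()
    return w @ tgt_exemplars[idx]              # converted target frame
```

Because the weights are computed locally per frame, the mapping adapts to the manifold structure of the spectra rather than relying on a single global regression.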
Building a voice conversion (VC) system from non-parallel speech corpora is challenging but highly valuable in real application scenarios. In most situations, the source and target speakers do not utter the same texts, or they may even speak different languages. In this case, one possible, although indirect, solution is to build a generative model for …