Spectral voice conversion for text-to-speech synthesis

Alexander Kain and Michael W. Macon
A new voice conversion algorithm that modifies a source speaker’s speech to sound as if produced by a target speaker is presented. It is applied to a residual-excited LPC text-to-speech diphone synthesizer. Spectral parameters are mapped using a locally linear transformation based on Gaussian mixture models whose parameters are trained by joint density estimation. The LPC residuals are adjusted to match the target speaker’s average pitch. To study effects of the amount of training on…
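The mapping step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes scalar features rather than spectral feature vectors, and it assumes the joint-density mixture parameters have already been fitted (for example by EM on time-aligned source/target frame pairs). The names `gaussian_pdf`, `convert`, and `components`, and all numeric values, are hypothetical and chosen only for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def convert(x, components):
    """Map a scalar source feature x to the target speaker's space.

    Each component is a dict with the parameters of one joint Gaussian
    over (source x, target y): weight, mu_x, mu_y, var_x, cov_yx.
    These are assumed to be already trained; this sketch does no fitting.
    """
    # Posterior probability of each mixture component given x alone.
    likelihoods = [
        c["weight"] * gaussian_pdf(x, c["mu_x"], c["var_x"]) for c in components
    ]
    total = sum(likelihoods)
    posteriors = [lik / total for lik in likelihoods]

    # Each component contributes its own linear regression of y on x.
    # Blending them by the posteriors gives a locally linear mapping.
    return sum(
        p * (c["mu_y"] + c["cov_yx"] / c["var_x"] * (x - c["mu_x"]))
        for p, c in zip(posteriors, components)
    )


# Hypothetical two-component joint model (illustrative values only).
components = [
    {"weight": 0.5, "mu_x": -2.0, "mu_y": 1.0, "var_x": 1.0, "cov_yx": 0.5},
    {"weight": 0.5, "mu_x": 2.0, "mu_y": 5.0, "var_x": 1.0, "cov_yx": 0.8},
]

converted = convert(0.0, components)
```

With vector features, the division by `var_x` would become multiplication by the inverse of the source covariance block, and `cov_yx` would be the cross-covariance block of each joint Gaussian. Near a component's mean the mapping behaves like that component's regression line; between components it interpolates smoothly.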
This paper has highly influenced 66 other papers.
This paper has 637 citations.


Publications citing this paper.

Voice conversion using conditional restricted Boltzmann machine

2014 IEEE China Summit & International Conference on Signal and Information Processing (ChinaSIP) • 2014

Voice conversion based on a mixture density network

2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) • 2017

Voice conversion based on continuous frequency warping and magnitude scaling

2017 28th Irish Signals and Systems Conference (ISSC) • 2017

Augmented speech production based on real-time statistical voice conversion

2014 IEEE Global Conference on Signal and Information Processing (GlobalSIP) • 2014


Publications referenced by this paper.

OGIresLPC: Diphone synthesizer using residual-excited linear prediction

M. Macon, A. Cronk, J. Wouters, A. Kain
Tech. Rep. CSE-97-007, Department of Computer Science, Oregon Graduate Institute of Science and Technology, Portland, OR, September 1997.

The Festival speech synthesis system: System documentation

A. W. Black, P. Taylor
Tech. Rep. HCRC/TR-83, Human Communication Research Centre, University of Edinburgh, Scotland, UK, January 1997.

Harmonic plus Noise Models for Speech, combined with Statistical Methods, for Speech and Speaker Modification

Y. Stylianou
Ph.D. thesis, Ecole Nationale Supérieure des Télécommunications, 1996.

ITU-T Recommendation P.800: Methods for subjective determination of transmission quality

International Telecommunication Union
August 1996.

Table 2: Results of perceptual tests. The column headers refer to the size of th…

test       set 5    set 7    set 9    set ALL
ABX1 m/m   47.5%    40.0%    37.5%    52.5%
ABX1 m/f   92.5%    95.0%    95.0%    97.5%
ABX2 m/m   87.5%    95.8%    91.7%    95.8%
ABX2 m/f   100%     100%     100%     100%
MOS m/m    3.7      4.0      4.1      4.2
MOS m/f    2.4      2.4      2.1      2.7

Local Models and Gaussian Mixture Models for Statistical Data Processing

N. Kambhatla
Ph.D. thesis, Oregon Graduate Institute, 1996.
