Statistical Approach for Voice Personality Transformation

  • K.-S. Lee
  • Published 2007 in
    IEEE Transactions on Audio, Speech, and Language…


A voice transformation method that changes the source speaker's utterances so as to sound similar to those of a target speaker is described. Speaker individuality transformation is achieved by altering the LPC cepstrum, average pitch period, and average speaking rate. The main objective of the work is to build a nonlinear relationship between the parameters for the acoustical features of two speakers, based on a probabilistic model. The conversion rules involve probabilistic classification and a cross-correlation probability between the acoustic features of the two speakers. The parameters of the conversion rules are estimated by maximizing the likelihood of the training data. To obtain transformed speech signals that are perceptually closer to the target speaker's voice, prosody modification is also applied. Prosody modification is achieved by scaling the excitation spectrum and by time-scale modification, each with an appropriate modification factor. An evaluation by objective tests and informal listening tests indicated the effectiveness of the proposed transformation method. We also confirmed that the proposed method leads to smoothly evolving spectral contours over time, which, from a perceptual standpoint, produced results that were superior to those of conventional vector quantization (VQ)-based methods.
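The abstract does not give the conversion formulas, so the following is only a minimal sketch of one common way to combine probabilistic (soft) classification with a per-class linear mapping between source and target features. It assumes a Gaussian mixture with diagonal covariances; the function names (`class_posteriors`, `convert_frame`) and all parameter names are hypothetical and are not taken from the paper. In practice the parameters would come from maximum-likelihood training on aligned source and target frames, which is not shown here.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def class_posteriors(x, weights, means_x, vars_x):
    """Soft classification: probability of each acoustic class given source frame x."""
    logs = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means_x, vars_x)
    ]
    top = max(logs)  # subtract the maximum for numerical stability
    unnorm = [math.exp(value - top) for value in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def convert_frame(x, weights, means_x, vars_x, means_y, covs_yx):
    """Map one source feature vector (e.g. a cepstral frame) toward the target speaker.

    Assumption: each class applies the per-dimension linear regression
    mean_y + (cov_yx / var_x) * (x - mean_x), and the class outputs are
    blended by their posterior probabilities.
    """
    post = class_posteriors(x, weights, means_x, vars_x)
    y = [0.0] * len(x)
    for p, mx, vx, my, cyx in zip(post, means_x, vars_x, means_y, covs_yx):
        for d in range(len(x)):
            y[d] += p * (my[d] + cyx[d] / vx[d] * (x[d] - mx[d]))
    return y
```

Because the class posteriors change continuously with the input frame, a mapping of this form changes continuously as well, whereas a hard VQ codebook lookup jumps whenever the nearest codeword changes. That is one plausible reading of the smoother spectral contours reported above, under the assumptions stated.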

DOI: 10.1109/TASL.2006.876760

Cite this paper

@article{Lee2007StatisticalAF,
  title={Statistical Approach for Voice Personality Transformation},
  author={K.-S. Lee},
  journal={IEEE Transactions on Audio, Speech, and Language Processing},
  year={2007},
  volume={15},
  pages={641--651}
}