Corpus ID: 235446688

Global Rhythm Style Transfer Without Text Transcriptions

Kaizhi Qian, Yang Zhang, Shiyu Chang, Jinjun Xiong, Chuang Gan, David Cox, Mark A. Hasegawa-Johnson

Prosody plays an important role in characterizing the style of a speaker or an emotion, but most non-parallel voice or emotion style transfer algorithms do not convert any prosody information. Two major components of prosody are pitch and rhythm. Disentangling the prosody information, particularly the rhythm component, from the speech is challenging because it involves breaking the synchrony between the input speech and the disentangled speech representation. As a result, most existing prosody… 

