Chunked Autoregressive GAN for Conditional Waveform Synthesis

Max Morrison, Rithesh Kumar, Kundan Kumar, Prem Seetharaman, Aaron C. Courville, Yoshua Bengio

Conditional waveform synthesis models learn a distribution of audio waveforms given conditioning such as text, mel-spectrograms, or MIDI. These systems use deep generative models that sample the waveform either sequentially (autoregressive) or in parallel (non-autoregressive). Generative adversarial networks (GANs) have become a common choice for non-autoregressive waveform synthesis. However, state-of-the-art GAN-based models produce artifacts when performing mel-spectrogram…

