Corpus ID: 3524525

Efficient Neural Audio Synthesis

  • Nal Kalchbrenner, E. Elsen, K. Simonyan, Seb Noury, Norman Casagrande, Edward Lockhart, Florian Stimberg, A. van den Oord, S. Dieleman, K. Kavukcuoglu
  • Published 2018
  • Computer Science, Engineering
  • ArXiv
  • Sequential models achieve state-of-the-art results in audio, visual and textual domains with respect to both estimating the data distribution and generating high-quality samples. [...] Key Method: We first describe a single-layer recurrent neural network, the WaveRNN, with a dual softmax layer that matches the quality of the state-of-the-art WaveNet model. The compact form of the network makes it possible to generate 24 kHz 16-bit audio 4x faster than real time on a GPU.
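The abstract names the dual softmax layer without describing it. In the full paper, each 16-bit sample is split into 8 coarse (most significant) bits and 8 fine (least significant) bits. Each half is predicted by its own 256-way softmax, and the fine half is conditioned on the coarse bits already chosen. The Python sketch below illustrates only that output encoding and the two-stage draw, not the recurrent network itself. Two things are assumptions rather than details from the paper: the offset-binary mapping from signed PCM to 0..65535, and the hypothetical `fine_logits_fn` callable that stands in for the network's fine half. All function names are illustrative.

```python
import math
import random

# Illustrative sketch, not the paper's code.
# Assumption: signed 16-bit PCM is offset to unsigned 0..65535, then split into
# a coarse byte (high 8 bits) and a fine byte (low 8 bits). Each byte is the
# class index for its own 256-way softmax.

def split_sample(sample: int) -> tuple[int, int]:
    """Map a signed 16-bit sample to (coarse, fine) class indices in 0..255."""
    if not -32768 <= sample <= 32767:
        raise ValueError("sample must fit in signed 16 bits")
    unsigned = sample + 32768            # shift range to 0..65535
    return unsigned >> 8, unsigned & 0xFF


def join_sample(coarse: int, fine: int) -> int:
    """Inverse of split_sample: rebuild the signed 16-bit sample."""
    return ((coarse << 8) | fine) - 32768


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_dual_softmax(coarse_logits: list[float], fine_logits_fn, rng=random) -> int:
    """Draw the coarse byte first, then the fine byte conditioned on it.

    fine_logits_fn is a hypothetical callable standing in for the network's
    fine half, which receives the sampled coarse bits as input.
    """
    coarse = rng.choices(range(256), weights=softmax(coarse_logits))[0]
    fine = rng.choices(range(256), weights=softmax(fine_logits_fn(coarse)))[0]
    return join_sample(coarse, fine)
```

For example, `split_sample(0)` returns `(128, 0)`. Splitting the output this way replaces a single 65,536-way softmax with two 256-way ones. For scale: at 24 kHz, running 4x faster than real time means producing 96,000 samples per second, roughly 10.4 µs per sample, and each sample needs a coarse draw followed by a fine draw.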
    Publications citing this paper.
    SpeedySpeech: Efficient Neural Speech Synthesis
    SING: Symbol-to-Instrument Neural Generator (15 citations)
    SFNet: A Computationally Efficient Source Filter Model Based Neural Speech Synthesis
    WaveFlow: A Compact Flow-based Model for Raw Audio (9 citations; highly influenced by this paper)
    A Spectral Energy Distance for Parallel Speech Synthesis (2 citations)
    WG-WaveNet: Real-Time High-Fidelity Speech Synthesis without GPU
    Quasi-fully Convolutional Neural Network with Variational Inference for Speech Synthesis (2 citations)
    MelNet: A Generative Model for Audio in the Frequency Domain (39 citations)
    Neural speech synthesis for resource-scarce languages


    Publications referenced by this paper.
    Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders (222 citations)
    WaveNet: A Generative Model for Raw Audio (2,809 citations)
    Parallel WaveNet: Fast High-Fidelity Speech Synthesis (379 citations)
    Deep Voice: Real-time Neural Text-to-Speech (285 citations)
    Block-Sparse Recurrent Neural Networks (47 citations)
    Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model (133 citations)
    SampleRNN: An Unconditional End-to-End Neural Audio Generation Model (300 citations)
    Neural Machine Translation in Linear Time (347 citations)
    Exploring Sparsity in Recurrent Neural Networks (135 citations)
    Attention Is All You Need (11,984 citations)