SampleRNN: An Unconditional End-to-End Neural Audio Generation Model
@article{Mehri2017SampleRNNAU, title={SampleRNN: An Unconditional End-to-End Neural Audio Generation Model}, author={Soroush Mehri and Kundan Kumar and Ishaan Gulrajani and Rithesh Kumar and Shubham Jain and Jose M. R. Sotelo and Aaron C. Courville and Yoshua Bengio}, journal={ArXiv}, year={2017}, volume={abs/1612.07837} }
In this paper we propose a novel model for the unconditional audio generation task that generates one audio sample at a time. We also show how each component of the model contributes to the exhibited performance.
445 Citations
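As a concrete illustration of what generating "one audio sample at a time" means, here is a minimal NumPy sketch of an autoregressive generation loop over 8-bit quantized audio, in the spirit of SampleRNN's 256-way softmax output. `predict_distribution` is a hypothetical placeholder for the trained model; the actual paper conditions on hierarchical RNN state rather than a fixed context window.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_distribution(context):
    """Hypothetical stand-in for the trained model: maps the recent
    context to a distribution over 256 quantized sample values."""
    logits = rng.normal(size=256)
    e = np.exp(logits - logits.max())
    return e / e.sum()

def generate(num_samples, context_size=16):
    samples = [128] * context_size                 # mid-range (silence) seed
    for _ in range(num_samples):
        p = predict_distribution(samples[-context_size:])
        samples.append(int(rng.choice(256, p=p)))  # draw the next sample
    return np.array(samples[context_size:], dtype=np.uint8)

audio = generate(16000)  # one second of 8-bit audio at 16 kHz
```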
MelNet: A Generative Model for Audio in the Frequency Domain
- Computer Science · ArXiv
- 2019
This work designs a model capable of generating high-fidelity audio samples which capture structure at timescales that time-domain models have yet to achieve, and applies it to a variety of audio generation tasks, showing improvements over previous approaches in both density estimates and human judgments.
It's Raw! Audio Generation with State-Space Models
- Computer Science · ICML
- 2022
SaShiMi, a new multi-scale architecture for waveform modeling built around the recently introduced S4 model for long sequence modeling, is proposed; the work identifies that S4 can be unstable during autoregressive generation and provides a simple improvement to its parameterization by drawing connections to Hurwitz matrices.
HybridNet: A Hybrid Neural Architecture to Speed-up Autoregressive Models
- Computer Science
- 2018
This paper introduces HybridNet, a hybrid neural network that speeds up autoregressive models for raw audio waveform generation and yields state-of-the-art performance when applied to text-to-speech.
GoodBye WaveNet - A Language Model for Raw Audio with Context of 1/2 Million Samples
- Computer Science · ArXiv
- 2022
This work proposes a generative auto-regressive architecture that can model audio waveforms over a large context, greater than 500,000 samples, on a standard dataset for modeling long-term structure.
Char2Wav: End-to-End Speech Synthesis
- Computer Science · ICLR
- 2017
Char2Wav is an end-to-end model for speech synthesis that learns to produce audio directly from text; its first stage is a bidirectional recurrent neural network with attention that produces vocoder acoustic features.
Conditional Sound Generation Using Neural Discrete Time-Frequency Representation Learning
- Computer Science · 2021 IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP)
- 2021
This work proposes a method for generating sounds via neural discrete time-frequency representation learning, conditioned on sound classes, which offers an advantage in efficiently modelling long-range dependencies and retaining local fine-grained structures within sound clips.
Multi-speaker Neural Vocoder
- Computer Science
- 2018
This dissertation explores an adaptation of the end-to-end model SampleRNN, conditioned on both speech parameters and speaker identity, that allows a single shared framework to be used in a speech synthesis system.
A general-purpose deep learning approach to model time-varying audio effects
- Computer Science · ArXiv
- 2019
This work proposes a deep learning architecture for generic black-box modeling of audio processors with long-term memory, based on convolutional and recurrent neural networks, and introduces an objective metric based on the psychoacoustics of modulation frequency perception.
SING: Symbol-to-Instrument Neural Generator
- Computer Science · NeurIPS
- 2018
This work presents a lightweight neural audio synthesizer trained end-to-end to generate notes from nearly 1000 instruments with a single decoder, thanks to a new loss function that minimizes the distances between the log spectrograms of the generated and target waveforms.
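Since the summary hinges on a loss between log spectrograms, a rough sketch of such an objective may help; the STFT parameters and plain L1 distance below are illustrative assumptions, not SING's exact configuration.

```python
import numpy as np

def log_spectrogram(x, n_fft=512, hop=128, eps=1e-8):
    """Log-magnitude STFT computed with a Hann window."""
    frames = [x[i:i + n_fft] * np.hanning(n_fft)
              for i in range(0, len(x) - n_fft + 1, hop)]
    mag = np.abs(np.fft.rfft(np.array(frames), axis=1))
    return np.log(mag + eps)

def spectral_loss(generated, target):
    """Mean absolute distance between log spectrograms."""
    return np.mean(np.abs(log_spectrogram(generated)
                          - log_spectrogram(target)))

# Example: compare a noisy waveform against a clean 440 Hz sine.
t = np.arange(16000) / 16000.0
clean = np.sin(2 * np.pi * 440 * t)
print(spectral_loss(clean + 0.01 * np.random.randn(16000), clean))
```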
Tacotron: Towards End-to-End Speech Synthesis
- Computer Science · INTERSPEECH
- 2017
Tacotron is presented, an end-to-end generative text-to-speech model that synthesizes speech directly from characters and achieves a 3.82 subjective 5-scale mean opinion score on US English, outperforming a production parametric system in terms of naturalness.
References
Showing 10 of 32 references.
WaveNet: A Generative Model for Raw Audio
- Computer Science · SSW
- 2016
WaveNet, a deep neural network for generating raw audio waveforms, is introduced; it is shown that it can be efficiently trained on data with tens of thousands of samples per second of audio, and can be employed as a discriminative model, returning promising results for phoneme recognition.
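WaveNet's efficiency on long waveforms comes from stacks of dilated causal convolutions whose dilation doubles at each layer. The back-of-envelope sketch below shows how the receptive field grows; the stack depth is chosen for illustration, not taken from the paper's exact configuration.

```python
# Receptive field of one stack of dilated causal convolutions with
# kernel size 2 and dilation doubling per layer, the pattern WaveNet
# uses to cover long audio contexts cheaply.
kernel_size = 2
dilations = [2 ** i for i in range(10)]           # 1, 2, 4, ..., 512
receptive_field = 1 + (kernel_size - 1) * sum(dilations)
print(receptive_field)                            # -> 1024 samples
```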
Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling
- Computer Science · ArXiv
- 2014
Advanced recurrent units that implement a gating mechanism, such as the long short-term memory (LSTM) unit and the recently proposed gated recurrent unit (GRU), are evaluated on sequence modeling tasks; the GRU is found to be comparable to the LSTM.
A Recurrent Latent Variable Model for Sequential Data
- Computer Science · NIPS
- 2015
It is argued that through the use of high-level latent random variables, the variational RNN (VRNN) can model the kind of variability observed in highly structured sequential data such as natural speech.
A Clockwork RNN
- Computer Science · ICML
- 2014
This paper introduces a simple, yet powerful modification to the simple RNN architecture, the Clockwork RNN (CW-RNN), in which the hidden layer is partitioned into separate modules, each processing inputs at its own temporal granularity, making computations only at its prescribed clock rate.
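The clocked-module idea is concrete enough for a small sketch: a minimal NumPy step function with illustrative sizes and periods, assuming dense recurrent connectivity for brevity (the paper actually restricts connections so faster modules receive input from slower ones).

```python
import numpy as np

rng = np.random.default_rng(0)
periods = [1, 2, 4, 8]        # clock period per module (illustrative)
module_size = 8
n = len(periods) * module_size
W_h = rng.normal(scale=0.1, size=(n, n))
W_x = rng.normal(scale=0.1, size=(n, 4))  # 4-dim input, illustrative

def cw_rnn_step(x, h, t):
    """One CW-RNN step at timestep t: module i updates only when
    t is divisible by its period; otherwise its state carries over."""
    h_new = np.tanh(W_h @ h + W_x @ x)
    for i, period in enumerate(periods):
        if t % period != 0:                        # module is "asleep"
            s = slice(i * module_size, (i + 1) * module_size)
            h_new[s] = h[s]                        # keep previous state
    return h_new

h = np.zeros(n)
for t in range(16):
    h = cw_rnn_step(rng.normal(size=4), h, t)
```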
An Empirical Exploration of Recurrent Network Architectures
- Computer Science · ICML
- 2015
It is found that adding a bias of 1 to the LSTM's forget gate closes the gap between the LSTM and the recently-introduced Gated Recurrent Unit (GRU) on some but not all tasks.
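That one-line trick translates directly into code. Here is a sketch using PyTorch's `nn.LSTM`, whose bias vectors are laid out as consecutive input/forget/cell/output chunks, so the forget-gate slice is the second quarter.

```python
import torch.nn as nn

# Sketch: initialize the LSTM forget-gate bias to 1 (the paper's
# practical finding) so the cell starts out remembering. PyTorch sums
# bias_ih and bias_hh, so both are zeroed first to make the effective
# forget bias exactly 1.
lstm = nn.LSTM(input_size=32, hidden_size=64)
h = lstm.hidden_size
for name, param in lstm.named_parameters():
    if name.startswith("bias"):           # bias_ih_l0 and bias_hh_l0
        param.data.zero_()
        param.data[h:2 * h].fill_(0.5)    # [i|f|g|o]: f chunks sum to 1
```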
Generating Sequences With Recurrent Neural Networks
- Computer Science · ArXiv
- 2013
This paper shows how Long Short-Term Memory recurrent neural networks can be used to generate complex sequences with long-range structure, simply by predicting one data point at a time.
Unsupervised feature learning for audio classification using convolutional deep belief networks
- Computer Science · NIPS
- 2009
In recent years, deep learning approaches have gained significant interest as a way of building hierarchical representations from unlabeled data; this work applies convolutional deep belief networks to unlabeled audio for classification.
Learning Complex, Extended Sequences Using the Principle of History Compression
- Computer Science · Neural Computation
- 1992
A simple principle for reducing the descriptions of event sequences without loss of information is introduced; this insight leads to the construction of neural architectures that learn to divide and conquer by recursively decomposing sequences.
Pixel Recurrent Neural Networks
- Computer Science · ICML
- 2016
A deep neural network is presented that sequentially predicts the pixels in an image along the two spatial dimensions and encodes the complete set of dependencies in the image to achieve log-likelihood scores on natural images that are considerably better than the previous state of the art.
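"Sequentially predicts the pixels along the two spatial dimensions" amounts to a raster-scan factorization of the image distribution. The tiny sketch below shows the sampling order; `predict_pixel` is a hypothetical placeholder for the trained network.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_pixel(above, left):
    """Hypothetical stand-in for the network's 256-way softmax,
    conditioned on all pixels above and to the left."""
    return int(rng.integers(0, 256))

H, W = 8, 8
img = np.zeros((H, W), dtype=np.uint8)
for r in range(H):
    for c in range(W):
        img[r, c] = predict_pixel(img[:r, :], img[r, :c])  # causal context
```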
Long Short-Term Memory
- Computer Science · Neural Computation
- 1997
A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
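The "constant error carousel" is the additive cell-state update; a minimal NumPy step makes the mechanism visible. Weights are random placeholders, and the four gate transforms are packed into one matrix for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h = 4, 8
W = rng.normal(scale=0.1, size=(4 * d_h, d_in + d_h))
b = np.zeros(4 * d_h)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c):
    """One LSTM step: the cell state c is updated additively, which is
    what lets gradients flow across very long time lags."""
    z = W @ np.concatenate([x, h]) + b
    i, f, g, o = np.split(z, 4)
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # additive path
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new

h, c = np.zeros(d_h), np.zeros(d_h)
for _ in range(5):
    h, c = lstm_step(rng.normal(size=d_in), h, c)
```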