Corpus ID: 67856213

GANSynth: Adversarial Neural Audio Synthesis

@article{Engel2019GANSynthAN,
  title={GANSynth: Adversarial Neural Audio Synthesis},
  author={Jesse Engel and Kumar Krishna Agrawal and Shuo Chen and Ishaan Gulrajani and Chris Donahue and Adam Roberts},
  journal={ArXiv},
  year={2019},
  volume={abs/1902.08710}
}
Efficient audio synthesis is an inherently difficult machine learning task, as human perception is sensitive to both global structure and fine-scale waveform coherence. [...] Through extensive empirical investigations on the NSynth dataset, we demonstrate that GANs are able to outperform strong WaveNet baselines on automated and human evaluation metrics, and efficiently generate audio several orders of magnitude faster than their autoregressive counterparts.
Adversarial Audio Synthesis
TLDR
WaveGAN is a first attempt at applying GANs to unsupervised synthesis of raw-waveform audio, capable of synthesizing one-second slices of audio waveforms with global coherence, suitable for sound effect generation.
MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis
TLDR
The model is non-autoregressive and fully convolutional, with significantly fewer parameters than competing models; it generalizes to unseen speakers for mel-spectrogram inversion, and the paper suggests a set of guidelines for designing general-purpose discriminators and generators for conditional sequence synthesis tasks.
Comparing Representations for Audio Synthesis Using Generative Adversarial Networks
TLDR
Different audio signal representations, including the raw audio waveform and a variety of time-frequency representations, are compared for the task of audio synthesis with Generative Adversarial Networks (GANs), showing that the complex-valued Short-Time Fourier Transform, as well as its magnitude and Instantaneous Frequency representation, achieves the best results and yields fast generation and inversion times.
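The magnitude-plus-instantaneous-frequency representation that this comparison favors can be sketched in plain NumPy; frame length, hop size, and the function name below are illustrative choices, not taken from any of the papers:

```python
import numpy as np

def stft_mag_if(x, frame_len=256, hop=64):
    """Compute log-magnitude and instantaneous-frequency (IF) features
    from a short-time Fourier transform. Pure-NumPy sketch; the IF is
    the time-difference of the phase, wrapped back to [-pi, pi)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop: i * hop + frame_len] * window
                       for i in range(n_frames)])
    spec = np.fft.rfft(frames, axis=-1)            # (frames, bins)
    log_mag = np.log(np.abs(spec) + 1e-6)
    phase = np.angle(spec)
    dphase = np.diff(phase, axis=0, prepend=phase[:1])
    inst_freq = (dphase + np.pi) % (2 * np.pi) - np.pi
    return log_mag, inst_freq

# Usage: a 440 Hz tone sampled at 16 kHz
sr = 16000
t = np.arange(sr) / sr
mag, ifreq = stft_mag_if(np.sin(2 * np.pi * 440 * t))
print(mag.shape, ifreq.shape)
```

Unlike raw phase, the IF of a steady tone is nearly constant over time, which is part of why these features are easier for a GAN to model.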
Adversarial Generation of Time-Frequency Features with application in audio synthesis
TLDR
The potential of deliberate generative TF modeling is demonstrated by training a generative adversarial network (GAN) on short-time Fourier features, and it is shown that by applying guidelines, the TF-based network was able to outperform a state-of-the-art GAN generating waveforms directly, despite the similar architecture in the two networks.
High Fidelity Speech Synthesis with Adversarial Networks
TLDR
GAN-TTS is capable of generating high-fidelity speech with naturalness comparable to the state-of-the-art models, and unlike autoregressive models, it is highly parallelisable thanks to an efficient feed-forward generator.
Music Generation using Deep Generative Modelling
Efficient synthesis of musical sequences is a challenging task from a machine learning perspective, as human perception is sensitive to the global context of longer sequences as well as to the fine-scale structure of audio waveforms [...]
DDSP: Differentiable Digital Signal Processing
TLDR
The Differentiable Digital Signal Processing library is introduced, which enables direct integration of classic signal processing elements with deep learning methods and achieves high-fidelity generation without the need for large autoregressive models or adversarial losses.
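One classic signal-processing element of the kind DDSP makes differentiable is a bank of harmonic oscillators driven by a fundamental frequency and per-harmonic amplitudes. The function below is a hypothetical NumPy sketch of that idea, not the library's API:

```python
import numpy as np

def harmonic_synth(f0, amps, sr=16000):
    """Additive synthesis sketch: sum of sinusoids at integer multiples
    of a per-sample fundamental f0 (Hz), weighted by per-harmonic
    amplitudes. Names and shapes are illustrative."""
    n_harmonics = amps.shape[0]
    # Integrated fundamental phase: phi[n] = 2*pi*cumsum(f0)/sr
    phase = 2 * np.pi * np.cumsum(f0) / sr
    harmonics = np.arange(1, n_harmonics + 1)[:, None]   # (H, 1)
    audio = (amps[:, None] * np.sin(harmonics * phase)).sum(axis=0)
    return audio / max(1, n_harmonics)                   # crude scaling

# Usage: 0.1 s sawtooth-like tone at 220 Hz with 1/k harmonic rolloff
sr = 16000
f0 = np.full(sr // 10, 220.0)
amps = 1.0 / np.arange(1, 9)
audio = harmonic_synth(f0, amps, sr)
print(audio.shape)
```

Because every operation here is differentiable, a neural network predicting `f0` and `amps` can be trained end-to-end through the synthesizer, which is the core idea the summary describes.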
Unconditional Audio Generation with Generative Adversarial Networks and Cycle Regularization
TLDR
Evaluation results show that the new model outperforms the prior one both objectively and subjectively; the model is employed to unconditionally generate sequences of piano and violin music, with promising results.
GAN-based Augmentation for Populating Speech Dataset with High Fidelity Synthesized Audio
  • M. Back, Seung Won Yoon, Kyu-Chul Lee
  • Computer Science
  • 2020 International Conference on Information and Communication Technology Convergence (ICTC)
  • 2020
TLDR
An audio augmentation method that generates synthetic audio using Generative Adversarial Networks (GANs): Harmonic Percussive Source Separation is first used to extract spectral features, and the fidelity of the synthesized audio is then improved by applying progressively growing GANs.

References

Showing 1-10 of 38 references
Adversarial Audio Synthesis
TLDR
WaveGAN is a first attempt at applying GANs to unsupervised synthesis of raw-waveform audio, capable of synthesizing one-second slices of audio waveforms with global coherence, suitable for sound effect generation.
Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders
TLDR
A powerful new WaveNet-style autoencoder model is detailed that conditions an autoregressive decoder on temporal codes learned from the raw audio waveform, and NSynth, a large-scale and high-quality dataset of musical notes that is an order of magnitude larger than comparable public datasets, is introduced.
Large Scale GAN Training for High Fidelity Natural Image Synthesis
TLDR
It is found that applying orthogonal regularization to the generator renders it amenable to a simple "truncation trick," allowing fine control over the trade-off between sample fidelity and variety by reducing the variance of the generator's input.
WaveNet: A Generative Model for Raw Audio
TLDR
WaveNet, a deep neural network for generating raw audio waveforms, is introduced; it is shown that it can be efficiently trained on data with tens of thousands of samples per second of audio, and can be employed as a discriminative model, returning promising results for phoneme recognition.
SING: Symbol-to-Instrument Neural Generator
TLDR
This work presents a lightweight neural audio synthesizer trained end-to-end to generate notes from nearly 1000 instruments with a single decoder, thanks to a new loss function that minimizes the distances between the log spectrograms of the generated and target waveforms.
Improved Training of Wasserstein GANs
TLDR
This work proposes an alternative to clipping weights: penalize the norm of the gradient of the critic with respect to its input, which performs better than standard WGAN and enables stable training of a wide variety of GAN architectures with almost no hyperparameter tuning.
SampleRNN: An Unconditional End-to-End Neural Audio Generation Model
TLDR
It is shown that the model, which profits from combining memory-less modules, namely autoregressive multilayer perceptrons, and stateful recurrent neural networks in a hierarchical structure, is able to capture underlying sources of variation in temporal sequences over very long time spans, on three datasets of different nature.
Bridging Audio Analysis, Perception and Synthesis with Perceptually-regularized Variational Timbre Spaces
TLDR
It is shown that Variational Auto-Encoders (VAE) can bridge the lines of research and alleviate their weaknesses by regularizing the latent spaces to match perceptual distances collected from timbre studies, by proposing three types of regularization and showing that these spaces can be used for efficient audio classification.
A Style-Based Generator Architecture for Generative Adversarial Networks
TLDR
An alternative generator architecture for generative adversarial networks is proposed, borrowing from the style transfer literature, that improves the state-of-the-art in terms of traditional distribution quality metrics, leads to demonstrably better interpolation properties, and also better disentangles the latent factors of variation.
A Multi-Discriminator CycleGAN for Unsupervised Non-Parallel Speech Domain Adaptation
TLDR
A novel generative model based on the cycle-consistent generative adversarial network (CycleGAN) for unsupervised non-parallel speech domain adaptation that employs multiple independent discriminators on the power spectrogram, each in charge of different frequency bands.