Corpus ID: 238856669

SpecSinGAN: Sound Effect Variation Synthesis Using Single-Image GANs

@article{BarahonaRios2021SpecSinGANSE,
  title={SpecSinGAN: Sound Effect Variation Synthesis Using Single-Image GANs},
  author={Adri{\'a}n Barahona-R{\'i}os and Tom Collins},
  journal={ArXiv},
  year={2021},
  volume={abs/2110.07311}
}
Single-image generative adversarial networks learn from the internal distribution of a single training example to generate variations of it, removing the need for a large dataset. In this paper we introduce SpecSinGAN, an unconditional generative architecture that takes a single one-shot sound effect (e.g., a footstep; a character jump) and produces novel variations of it, as if they were different takes from the same recording session. We explore the use of multi-channel spectrograms to train…
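The abstract mentions training on multi-channel spectrograms. As an illustrative sketch only (the paper's exact channel layout and STFT parameters may differ), a one-shot sample can be turned into a 2-channel log-magnitude/phase "image" with a plain numpy STFT:

```python
import numpy as np

def multichannel_spec(x, n_fft=256, hop=64):
    """Stack log-magnitude and phase channels of an STFT into a
    2-channel array shaped like an image: (channels, frames, bins).
    Illustrative sketch; not the paper's actual representation."""
    win = np.hanning(n_fft)
    frames = np.array([x[i:i + n_fft] * win
                       for i in range(0, len(x) - n_fft + 1, hop)])
    S = np.fft.rfft(frames, axis=1)
    log_mag = np.log1p(np.abs(S))   # compressed magnitude channel
    phase = np.angle(S)             # phase channel
    return np.stack([log_mag, phase])
```

Stacking magnitude and phase (or phase derivatives) as channels lets a 2-D convolutional GAN treat the sound as a multi-channel image.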


Neural Synthesis of Sound Effects Using Flow-Based Deep Generative Models

This work presents a method for generating controllable variations of sound effects that can be used in sound designers' creative process. It adopts WaveFlow, a generative flow model that operates directly on raw audio and has proven to perform well for speech synthesis.

References

Showing 1-10 of 36 references

One-to-Many Conversion for Percussive Samples

A filtering algorithm for generating subtle random variations in sampled sounds is proposed, addressing the unrealistic effect of reusing a single recording for impact sound effects or drum machine sounds.
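In the spirit of that one-to-many idea (a hypothetical sketch, not the paper's actual filter design), subtle variations can be produced by applying a random per-band gain to the sample's spectrum:

```python
import numpy as np

rng = np.random.default_rng(0)

def vary(x, n_bands=8, max_db=3.0):
    """Apply a random gain (within +/- max_db) to each of n_bands
    frequency bands, yielding a subtly different 'take' of the sample."""
    X = np.fft.rfft(x)
    edges = np.linspace(0, len(X), n_bands + 1).astype(int)
    gains = 10.0 ** (rng.uniform(-max_db, max_db, n_bands) / 20.0)
    for a, b, g in zip(edges[:-1], edges[1:], gains):
        X[a:b] *= g
    return np.fft.irfft(X, len(x))
```

Each call produces a slightly different spectral tilt, which is one simple way to avoid the "machine-gun" effect of exact sample repetition.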

Catch-A-Waveform: Learning to Generate Audio from a Single Short Example

Using a GAN-based generative model that can be trained on one short audio signal from any domain, without pre-training or any other form of external supervision, this work illustrates that capturing the essence of an audio source is typically possible from as little as a few tens of seconds of a single training signal.

Freesound technical demo

This demo introduces Freesound to the multimedia community and shows its potential as a research resource.

Improving Synthesizer Programming From Variational Autoencoders Latent Space

Generative models that can infer synthesizer parameters, generate new parameter sets, and perform smooth morphing between sounds are introduced; scalability and performance are improved by representing parameters heterogeneously, as numerical and categorical random variables.

Hierarchical Patch VAE-GAN: Generating Diverse Videos from a Single Sample

This work introduces a novel patch-based variational autoencoder (VAE) that allows for much greater diversity in video generation, producing diverse samples in both the image domain and the more challenging video domain.

DDSP: Differentiable Digital Signal Processing

The Differentiable Digital Signal Processing library is introduced, which enables direct integration of classic signal processing elements with deep learning methods and achieves high-fidelity generation without the need for large autoregressive models or adversarial losses.
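As a rough numpy illustration of the additive synthesis at DDSP's core (the library itself is differentiable and TensorFlow-based; this sketch is not its API), a bank of sinusoidal harmonics can be driven by a per-sample fundamental frequency and per-harmonic amplitudes:

```python
import numpy as np

def harmonic_synth(f0, amps, sr=16000):
    """Sum of sinusoidal harmonics of a (possibly time-varying)
    fundamental. f0: per-sample frequency in Hz; amps: one gain
    per harmonic. Illustrative sketch only."""
    phase = 2 * np.pi * np.cumsum(f0) / sr      # instantaneous phase
    out = np.zeros(len(f0))
    for k, a in enumerate(amps, start=1):
        partial = a * np.sin(k * phase)
        partial[k * f0 > sr / 2] = 0.0          # mute partials above Nyquist
        out += partial
    return out
```

Because every operation here is differentiable, the same structure can be embedded in a neural network and trained end-to-end, which is the key idea behind DDSP.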

Improved Training of Wasserstein GANs

This work proposes an alternative to clipping weights: penalize the norm of gradient of the critic with respect to its input, which performs better than standard WGAN and enables stable training of a wide variety of GAN architectures with almost no hyperparameter tuning.
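The penalty described here (WGAN-GP) constrains the critic's input-gradient norm to 1 at points interpolated between real and fake samples. The toy sketch below uses a linear critic so the input-gradient is available in closed form; a real implementation would obtain it via automatic differentiation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "critic" f(x) = w . x; its gradient w.r.t. the input
# is simply w, so the penalty can be computed without autograd.
w = rng.normal(size=16)

def gradient_penalty(real, fake, lam=10.0):
    """WGAN-GP term: lam * E[(||grad_x f(x_hat)|| - 1)^2], where x_hat
    are random interpolates between real and fake samples."""
    eps = rng.uniform(size=(len(real), 1))
    x_hat = eps * real + (1 - eps) * fake       # random interpolates
    grad = np.tile(w, (len(x_hat), 1))          # grad_x f(x_hat) = w here
    norms = np.linalg.norm(grad, axis=1)
    return lam * np.mean((norms - 1.0) ** 2)
```

Replacing weight clipping with this soft constraint is what stabilizes training across architectures.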

Signal estimation from modified short-time Fourier transform

An algorithm to estimate a signal from its modified short-time Fourier transform (STFT) by minimizing the mean squared error between the STFT of the estimated signal and the modified STFT magnitude is presented.
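This iterative procedure is commonly known as the Griffin-Lim algorithm. A minimal pure-numpy sketch (assuming a Hann window and 75% overlap; production implementations such as librosa's are more careful about windowing and convergence):

```python
import numpy as np

def stft(x, n_fft=256, hop=64):
    win = np.hanning(n_fft)
    return np.array([np.fft.rfft(x[i:i + n_fft] * win)
                     for i in range(0, len(x) - n_fft + 1, hop)])

def istft(S, n_fft=256, hop=64):
    win = np.hanning(n_fft)
    out = np.zeros((len(S) - 1) * hop + n_fft)
    norm = np.zeros_like(out)
    for i, spec in enumerate(S):
        out[i * hop:i * hop + n_fft] += np.fft.irfft(spec, n_fft) * win
        norm[i * hop:i * hop + n_fft] += win ** 2
    return out / np.maximum(norm, 1e-8)   # overlap-add with window norm

def griffin_lim(mag, n_iter=32, n_fft=256, hop=64, seed=0):
    """Estimate a signal whose STFT magnitude matches `mag` by alternating
    projections: enforce the magnitude, resynthesize, re-estimate phase."""
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))
    for _ in range(n_iter):
        x = istft(mag * phase, n_fft, hop)
        phase = np.exp(1j * np.angle(stft(x, n_fft, hop)))
    return istft(mag * phase, n_fft, hop)
```

Each iteration is an inverse STFT followed by a forward STFT, keeping only the phase and re-imposing the target magnitude, which monotonically reduces the squared error between the two.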

Drop the GAN: In Defense of Patches Nearest Neighbors as Single Image Generative Models

This paper revisits classical patch-based methods and shows that, contrary to previous belief, they can be adapted to tackle these novel "GAN-only" tasks better and faster than single-image GAN-based methods.

Signal Representations for Synthesizing Audio Textures with Generative Adversarial Networks

This paper proposes that training GANs on single-channel magnitude spectra, combined with the Phase Gradient Heap Integration (PGHI) inversion algorithm, is a more comprehensive approach to audio synthesis modeling of diverse signals, including pitched, non-pitched, and dynamically complex sounds.