• Corpus ID: 232222920

Real-time Timbre Transfer and Sound Synthesis using DDSP

Francesco Ganis, Erik Frej Knudsen, Soren V. K. Lyster, Robin Otterbein, David Sudholt, Cumhur Erkut
Neural audio synthesis is an actively researched topic that has yielded a wide range of techniques leveraging machine learning architectures. Google Magenta developed a novel approach called Differentiable Digital Signal Processing (DDSP) that combines deep neural networks with preconditioned digital signal processing techniques, reaching state-of-the-art results, especially in timbre transfer applications. However, most of these techniques, including DDSP, are generally not applicable… 


Differentiable Wavetable Synthesis
This work achieves high-fidelity audio synthesis with as little as 10 to 20 wavetables and demonstrates how a data-driven dictionary of waveforms opens up unprecedented one-shot learning paradigms on short audio clips.
SpecSinGAN: Sound Effect Variation Synthesis Using Single-Image GANs
This paper introduces SpecSinGAN, an unconditional generative architecture that takes a single one-shot sound effect and produces novel variations of it, as if they were different takes from the same recording session, using multi-channel spectrograms.


DDSP: Differentiable Digital Signal Processing
The Differentiable Digital Signal Processing library is introduced, which enables direct integration of classic signal processing elements with deep learning methods and achieves high-fidelity generation without the need for large autoregressive models or adversarial losses.
Adversarial Audio Synthesis
WaveGAN is a first attempt at applying GANs to unsupervised synthesis of raw-waveform audio, capable of synthesizing one-second slices of audio waveforms with global coherence, suitable for sound effect generation.
Crepe: A Convolutional Representation for Pitch Estimation
This paper proposes a data-driven pitch tracking algorithm, CREPE, which is based on a deep convolutional neural network that operates directly on the time-domain waveform, and evaluates the model's generalizability in terms of noise robustness.
A Neural Parametric Singing Synthesizer Modeling Timbre and Expression from Natural Songs
We recently presented a new model for singing synthesis based on a modified version of the WaveNet architecture. Instead of modeling raw waveform, we model features produced by a parametric vocoder
An Effective Requirement Engineering Process Model for Software Development and Requirements Management
This paper proposes an effective requirements engineering process model to produce quality requirements for software development and shows how this process can have a good impact on the production of quality software product.
Automatic annotation of musical audio for interactive applications
This work is interested in developing a robust layer for the automatic annotation of audio signals, to be used in various applications, from music search engines to interactive installations, and in various contexts, from embedded devices to audio content servers.
YIN, a fundamental frequency estimator for speech and music.
An algorithm is presented for the estimation of the fundamental frequency (F0) of speech or musical sounds. It is based on the well-known autocorrelation method with a number of modifications that
Discrete Simulation of Colored Noise and Stochastic Processes and 1/f^α Power Law Noise Generation
The theory and mechanics of generating digital, pseudorandom sequences on a computer as simulations of known stochastic processes as well as approximate techniques for generating digital sequences with the "correct" discrete spectrum or correlations are discussed.
A sound analysis/synthesis system based on a deterministic plus stochastic decomposition
This paper addresses the second category of synthesis technique: spectrum modeling and describes a technique called spectral modeling synthesis (SMS), that models time-varying spectra as a collection of sinusoids controlled through time by piecewise linear amplitude and frequency envelopes.
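
The last summary above describes sinusoids controlled through time by piecewise linear amplitude and frequency envelopes. The idea can be illustrated with a minimal sketch, assuming a simple breakpoint representation. The function names, the `(time, value)` breakpoint layout, and the 16 kHz sample rate are all hypothetical choices made for this illustration and are not taken from the cited paper or from the DDSP library. Only the deterministic (sinusoidal) part is covered; the stochastic residual is omitted.

```python
import math

def piecewise_linear(breakpoints, t):
    """Interpolate (time, value) breakpoints at time t, clamping outside the range.

    Assumes breakpoint times are strictly increasing (hypothetical layout).
    """
    if t <= breakpoints[0][0]:
        return breakpoints[0][1]
    for (t0, v0), (t1, v1) in zip(breakpoints, breakpoints[1:]):
        if t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    return breakpoints[-1][1]

def synthesize(partials, duration, sample_rate=16000):
    """Sum sinusoids whose amplitude and frequency follow piecewise-linear envelopes.

    partials: list of (amp_breakpoints, freq_breakpoints) pairs, an assumed format.
    """
    n = int(duration * sample_rate)
    out = [0.0] * n
    for amp_env, freq_env in partials:
        phase = 0.0
        for i in range(n):
            t = i / sample_rate
            out[i] += piecewise_linear(amp_env, t) * math.sin(phase)
            # Accumulate phase from the instantaneous frequency (in Hz).
            phase += 2.0 * math.pi * piecewise_linear(freq_env, t) / sample_rate
    return out

# Two partials: a steady 220 Hz tone and a quieter partial gliding from 440 to 445 Hz.
partials = [
    ([(0.0, 0.0), (0.05, 0.5), (0.5, 0.0)], [(0.0, 220.0), (0.5, 220.0)]),
    ([(0.0, 0.0), (0.05, 0.25), (0.5, 0.0)], [(0.0, 440.0), (0.5, 445.0)]),
]
signal = synthesize(partials, duration=0.5)
```

Accumulating phase sample by sample, rather than evaluating `sin(2πft)` directly, keeps the waveform continuous when the frequency envelope changes over time.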