Conditioning Autoencoder Latent Spaces for Real-Time Timbre Interpolation and Synthesis

  • Joseph T Colonel, Sam Keene
  • Published 30 January 2020
  • Computer Science, Engineering
  • 2020 International Joint Conference on Neural Networks (IJCNN)
We compare the performance of standard autoencoder topologies for timbre generation. We demonstrate how different activation functions in the autoencoder’s bottleneck distribute a training corpus’s embedding. We show that the choice of sigmoid activation in the bottleneck produces a more bounded and uniformly distributed embedding than a leaky rectified linear unit activation. We propose a one-hot encoded chroma feature vector for use in both input augmentation and latent space conditioning…
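The two ideas named in the abstract, a sigmoid-bounded bottleneck and a one-hot chroma vector used for both input augmentation and latent conditioning, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the frame size (1025 bins), the latent size (8), the random stand-in values, and the helper names `one_hot_chroma` and `condition` are all assumptions made for illustration, and the mapping from MIDI note number to pitch class is one plausible way to derive the chroma label.

```python
import numpy as np

N_CHROMA = 12  # pitch classes C, C#, ..., B


def one_hot_chroma(midi_pitch: int) -> np.ndarray:
    """Return a 12-dim one-hot vector marking the pitch class of a MIDI note."""
    vec = np.zeros(N_CHROMA, dtype=np.float32)
    vec[midi_pitch % N_CHROMA] = 1.0
    return vec


def sigmoid(x: np.ndarray) -> np.ndarray:
    """Squash values into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def leaky_relu(x: np.ndarray, slope: float = 0.01) -> np.ndarray:
    """Unbounded above; negative values are scaled by `slope`."""
    return np.where(x > 0, x, slope * x)


def condition(vector: np.ndarray, chroma: np.ndarray) -> np.ndarray:
    """Append the chroma vector to an input frame or a latent code."""
    return np.concatenate([vector, chroma])


rng = np.random.default_rng(0)

# Hypothetical magnitude-spectrum frame (assumed size: 1025 bins).
frame = np.abs(rng.standard_normal(1025)).astype(np.float32)
chroma = one_hot_chroma(69)  # MIDI 69 = A4, pitch class index 9

# Input augmentation: the encoder would see the frame plus the chroma vector.
augmented_input = condition(frame, chroma)

# Stand-in for bottleneck pre-activations (assumed latent size: 8).
pre_activation = rng.standard_normal(8)
latent = sigmoid(pre_activation)            # bounded embedding
latent_unbounded = leaky_relu(pre_activation)  # unbounded alternative

# Latent conditioning: the decoder would see the latent plus the chroma vector.
conditioned_latent = condition(latent, chroma)
```

With a sigmoid bottleneck every latent coordinate lies in (0, 1), so a fixed-range control such as a slider can cover the whole embedding, whereas the leaky ReLU output has no upper bound.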
1 Citation
CAESynth: Real-Time Timbre Interpolation and Pitch Control with Conditional Autoencoders
  • Aaron Valero Puche, Sukhan Lee
  • Computer Science, Engineering
  • 2021 IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP)
  • 2021
Experiments demonstrate that CAESynth achieves smooth, high-fidelity audio synthesis in real time through timbre interpolation and independent yet accurate pitch control, both for musical cues and for audio affordance with environmental sound.


Improving Neural Net Auto Encoders for Music Synthesis
A novel architecture for a synthesizer based on an autoencoder that compresses and reconstructs magnitude short time Fourier transform frames and can be quickly re-trained on any sound domain, making it flexible for music synthesis applications.
GANSynth: Adversarial Neural Audio Synthesis
Through extensive empirical investigations on the NSynth dataset, it is demonstrated that GANs are able to outperform strong WaveNet baselines on automated and human evaluation metrics, and efficiently generate audio several orders of magnitude faster than their autoregressive counterparts.
Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders
A powerful new WaveNet-style autoencoder model is detailed that conditions an autoregressive decoder on temporal codes learned from the raw audio waveform, and NSynth, a large-scale and high-quality dataset of musical notes that is an order of magnitude larger than comparable public datasets, is introduced.
Understanding and Improving Interpolation in Autoencoders via an Adversarial Regularizer
This paper proposes a regularization procedure which encourages interpolated outputs to appear more realistic by fooling a critic network which has been trained to recover the mixing coefficient from interpolated data.
Universal audio synthesizer control with normalizing flows
A novel formulation of audio synthesizer control is introduced that can simultaneously address automatic parameter inference, macro-control learning and audio-based preset exploration within a single model, and that is able to learn semantic controls of a synthesizer by smoothly mapping to its parameters.
Assisted Sound Sample Generation with Musical Conditioning in Adversarial Auto-Encoders
The proposed model generates notes as magnitude spectrograms from any probabilistic latent code samples, with expressive control of orchestral timbres and playing styles, and can be applied to other sound domains, including a user's libraries with custom sound tags that could be mapped to specific generative controls.
Sampling Generative Networks
Several techniques for sampling and visualizing the latent spaces of generative models are introduced, and two new techniques for deriving attribute vectors are demonstrated: bias-corrected vectors with data replication and synthetic vectors with data augmentation.
beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework
Learning an interpretable factorised representation of the independent data generative factors of the world without supervision is an important precursor for the development of artificial…
MidiMe: Personalizing a MusicVAE model with user data
Training a custom deep neural network model like Music Transformer, MusicVAE or SketchRNN from scratch requires significant amounts of data and compute resources as well as expertise in hyperparameter tuning.
DDSP: Differentiable Digital Signal Processing
The Differentiable Digital Signal Processing library is introduced, which enables direct integration of classic signal processing elements with deep learning methods and achieves high-fidelity generation without the need for large autoregressive models or adversarial losses.