Deep generative models for musical audio synthesis

@article{Huzaifah2021DeepGM,
  title={Deep generative models for musical audio synthesis},
  author={Muhammad Huzaifah and Lonce L. Wyse},
  journal={ArXiv},
  year={2021},
  volume={abs/2006.06426}
}
Sound modelling is the process of developing algorithms that generate sound under parametric control. There are a few distinct approaches that have been developed historically including modelling the physics of sound production and propagation, assembling signal generating and processing elements to capture acoustic features, and manipulating collections of recorded audio samples. While each of these approaches has been able to achieve high-quality synthesis and interaction for specific… 
Disembodied Timbres: A Study on Semantically Prompted FM Synthesis
Disembodied electronic sounds constitute a large part of the modern auditory lexicon, but research into timbre perception has focused mostly on the tones of conventional acoustic musical instruments.
Research on Chord-Constrained Two-Track Music Generation Based on Improved GAN Networks
TLDR
A GRU network is used for chord feature extraction, autonomously learning the chords at time steps 1..t−1 in order to generate the chord at time step t; by saving the hidden-layer state of each batch and combining a GRU layer with the generator, the model automatically learns the overall style of the chords.
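For illustration, a minimal sketch of this kind of stateful next-step chord predictor, assuming PyTorch; the class name, vocabulary size, and dimensions are invented for the example and the paper's actual architecture may differ:

import torch
import torch.nn as nn

class ChordGRU(nn.Module):
    # Toy next-step predictor: reads chords at steps 1..t-1 and emits
    # logits for the chord at step t. Hidden state is saved across batches.
    def __init__(self, num_chords=48, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(num_chords, hidden)
        self.gru = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, num_chords)
        self.state = None

    def forward(self, chord_ids):              # (batch, t-1) integer chord IDs
        y, self.state = self.gru(self.embed(chord_ids), self.state)
        self.state = self.state.detach()       # keep the state, drop old gradients
        return self.out(y[:, -1])              # logits for the chord at step t

logits = ChordGRU()(torch.randint(0, 48, (8, 16)))   # batch of 8, 16 past chords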
Few Data Diversification in Training Generative Adversarial Networks
TLDR
The use of new GAN models capable of generating sharp, high-resolution images with a high level of variation from real-world image sets is discussed here, since such sets often have a limited sample size.
Audio representations for deep learning in sound synthesis: A review
  • A. Natsiou, Sean O'Leary
  • Computer Science
    2021 IEEE/ACS 18th International Conference on Computer Systems and Applications (AICCSA)
  • 2021
TLDR
An overview of audio representations applied to sound synthesis using deep learning is presented, together with the most significant methods for developing and evaluating a sound synthesis architecture with deep learning models, always depending on the audio representation.
A Concept of a Wavetable Oscillator Based on a Neural Autoencoder
TLDR
The results suggest that even small and efficient generative models can successfully generate diverse and novel single-cycle waveforms from a small number of input parameters with sufficient computational efficiency.
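As a sketch of the idea, a tiny autoencoder over single-cycle waveforms, assuming PyTorch; the table length, latent size, and layer sizes here are illustrative, not the paper's:

import torch
import torch.nn as nn

class WavetableAE(nn.Module):
    # Compress a single-cycle waveform (512 samples) to a few latent
    # parameters; decoding a latent vector yields a novel wavetable.
    def __init__(self, n=512, latent=4):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n, 64), nn.Tanh(), nn.Linear(64, latent))
        self.dec = nn.Sequential(nn.Linear(latent, 64), nn.Tanh(), nn.Linear(64, n), nn.Tanh())

    def forward(self, wave):                   # (batch, 512), values in [-1, 1]
        z = self.enc(wave)
        return self.dec(z), z

model = WavetableAE()
table, _ = model(torch.rand(1, 512) * 2 - 1)                # reconstruct one cycle
morph = model.dec(torch.tensor([[0.3, -0.1, 0.8, 0.0]]))    # sweep the latent to morph timbres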
Energy Consumption of Deep Generative Audio Models
TLDR
A multi-objective measure based on Pareto optimality is suggested that takes into account both the quality of a model and its energy consumption; it can be widely used by the community to evaluate their work, putting computational cost in the spotlight of deep learning research.
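The underlying notion of Pareto optimality is easy to state in code; the sketch below (with invented quality and energy numbers) keeps exactly the models that no other model beats on both axes at once:

def pareto_front(models):
    # models: list of (name, quality, energy) -- higher quality is better,
    # lower energy is better. Keep the non-dominated models.
    front = []
    for name, q, e in models:
        dominated = any(q2 >= q and e2 <= e and (q2 > q or e2 < e)
                        for _, q2, e2 in models)
        if not dominated:
            front.append((name, q, e))
    return front

print(pareto_front([("A", 0.90, 12.0), ("B", 0.85, 3.0), ("C", 0.80, 5.0)]))
# [('A', 0.9, 12.0), ('B', 0.85, 3.0)] -- C is dominated by B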
A Probabilistic Approach to Situated Acoustic Road Event Detection
TLDR
The experimental results show that the presented probabilistic framework is able to detect acoustic road events such as crashes and tire slipping.
Deep learning models for generating audio textures
TLDR
A new and growing dataset is introduced, together with a system for managing metadata specifically designed for audio textures, as well as some recent advances in texture models capable of generating sounds substantially beyond the range of sounds on which they are trained.

References

Showing 1-10 of 98 references
Conditioning Deep Generative Raw Audio Models for Structured Automatic Music
TLDR
This paper considers a Long Short Term Memory network to learn the melodic structure of different styles of music, and then uses the unique symbolic generations from this model as a conditioning input to a WaveNet-based raw audio generator, creating a model for automatic, novel music.
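A rough sketch of the two-stage idea, assuming PyTorch; the class names, sizes, and upsampling factor are illustrative, not the paper's:

import torch
import torch.nn as nn

class MelodyLSTM(nn.Module):
    # Stage 1: a symbolic model that proposes the next note at each step.
    def __init__(self, n_notes=128, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(n_notes, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_notes)

    def forward(self, notes):                  # (batch, steps) note IDs
        h, _ = self.lstm(self.embed(notes))
        return self.out(h)                     # next-note logits per step

# Stage 2: the sampled symbolic sequence is embedded, held at audio rate,
# and fed to the raw-audio generator as local conditioning channels.
notes = torch.randint(0, 128, (1, 4))                     # four symbolic steps
next_note_logits = MelodyLSTM()(notes)                    # (1, 4, 128)
cond = nn.Embedding(128, 16)(notes)                       # (1, 4, 16)
cond_at_audio_rate = cond.repeat_interleave(8000, dim=1)  # hold each note for 8000 samples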
Neural Music Synthesis for Flexible Timbre Control
TLDR
A neural music synthesis model with flexible timbre controls, which consists of a recurrent neural network conditioned on a learned instrument embedding followed by a WaveNet vocoder, is described.
SING: Symbol-to-Instrument Neural Generator
TLDR
This work presents a lightweight neural audio synthesizer trained end-to-end to generate notes from nearly 1000 instruments with a single decoder, thanks to a new loss function that minimizes the distances between the log spectrograms of the generated and target waveforms.
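In spirit (not the authors' exact code), such a spectral loss can be written as an L1 distance between log-magnitude STFTs, e.g. in PyTorch:

import torch

def log_spectrogram_loss(pred, target, n_fft=1024, hop=256, eps=1e-5):
    # L1 distance between log-magnitude spectrograms of generated and
    # target waveforms; pred and target are (batch, samples) tensors.
    window = torch.hann_window(n_fft, device=pred.device)
    def logspec(x):
        s = torch.stft(x, n_fft, hop_length=hop, window=window, return_complex=True)
        return torch.log(s.abs() + eps)
    return (logspec(pred) - logspec(target)).abs().mean()

loss = log_spectrogram_loss(torch.randn(2, 16000), torch.randn(2, 16000))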
Real-valued parametric conditioning of an RNN for interactive sound synthesis
TLDR
The focus of this paper is on conditioning data-driven synthesis models with real-valued parameters, and on the ability of the system to generalize and to be responsive to parameter values and sequences not seen during training.
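One common way to realize this (a sketch under assumed PyTorch conventions, not the paper's exact model) is to concatenate the real-valued control parameters to each input sample of the RNN:

import torch
import torch.nn as nn

class ConditionedRNN(nn.Module):
    # Sample-level RNN whose input at each step is the previous audio sample
    # concatenated with real-valued control parameters (e.g. pitch).
    def __init__(self, n_params=1, hidden=256, levels=256):
        super().__init__()
        self.rnn = nn.GRU(1 + n_params, hidden, batch_first=True)
        self.out = nn.Linear(hidden, levels)   # logits over quantized sample values

    def forward(self, prev_samples, params):   # (B, T, 1) and (B, T, n_params)
        h, _ = self.rnn(torch.cat([prev_samples, params], dim=-1))
        return self.out(h)

net = ConditionedRNN()
logits = net(torch.randn(2, 100, 1), torch.rand(2, 100, 1))  # condition on a pitch curve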
Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders
TLDR
A powerful new WaveNet-style autoencoder model is detailed that conditions an autoregressive decoder on temporal codes learned from the raw audio waveform, and NSynth, a large-scale, high-quality dataset of musical notes an order of magnitude larger than comparable public datasets, is introduced.
The challenge of realistic music generation: modelling raw audio at scale
TLDR
Autoregressive discrete autoencoders (ADAs) are explored as a means to enable autoregressive models to capture long-range correlations in waveforms, and are found to unconditionally generate piano music directly in the raw audio domain that shows stylistic consistency across tens of seconds.
WaveNet: A Generative Model for Raw Audio
TLDR
WaveNet, a deep neural network for generating raw audio waveforms, is introduced; it is shown that it can be efficiently trained on data with tens of thousands of samples per second of audio, and can be employed as a discriminative model, returning promising results for phoneme recognition.
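The core building block is a stack of dilated causal convolutions, sketched below in PyTorch (channel counts and depth are illustrative):

import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv(nn.Module):
    # 1-D convolution padded only on the left, so the output at time t
    # never depends on samples after t.
    def __init__(self, ch, dilation):
        super().__init__()
        self.pad = dilation                    # (kernel_size - 1) * dilation, kernel 2
        self.conv = nn.Conv1d(ch, ch, kernel_size=2, dilation=dilation)

    def forward(self, x):                      # (batch, ch, time)
        return torch.relu(self.conv(F.pad(x, (self.pad, 0))))

# Doubling the dilation each layer grows the receptive field exponentially:
stack = nn.Sequential(*[CausalConv(32, 2 ** i) for i in range(8)])  # receptive field: 256 samples
y = stack(torch.randn(1, 32, 16000))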
Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset
TLDR
By using notes as an intermediate representation, a suite of models capable of transcribing, composing, and synthesizing audio waveforms with coherent musical structure on timescales spanning six orders of magnitude is trained, a process the authors call Wave2Midi2Wave.
Conditioning a Recurrent Neural Network to synthesize musical instrument transients
TLDR
This work finds that the network learns the particular transient characteristics of two different synthetic instruments, and furthermore shows some ability to interpolate between the characteristics of the instruments used in training in response to novel parameter settings.
Deep Learning Techniques for Music Generation - A Survey
TLDR
This paper is a survey and an analysis of different ways of using deep learning (deep artificial neural networks) to generate musical content, based on many existing deep-learning-based systems for music generation selected from the relevant literature.
...