Corpus ID: 54765530

Improving Neural Net Auto Encoders for Music Synthesis

@article{Colonel2017ImprovingNN,
  title={Improving Neural Net Auto Encoders for Music Synthesis},
  author={Joseph T Colonel and Christopher Curro and Sam Keene},
  journal={Journal of The Audio Engineering Society},
  year={2017}
}
We present a novel architecture for a synthesizer based on an autoencoder that compresses and reconstructs magnitude short-time Fourier transform frames. This architecture outperforms previous topologies by using improved regularization, employing several activation functions, creating a focused training corpus, and implementing the Adam learning method. By applying gains to the hidden layer, users can alter the autoencoder’s output, which opens up a palette of sounds unavailable to additive… 
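
As a rough illustration of the architecture the abstract describes, the sketch below builds a small dense autoencoder over magnitude STFT frames and applies user-chosen gains to the hidden layer at synthesis time. This is a minimal sketch, not the authors' exact topology: the layer sizes, activations, and the `synthesize` helper are assumptions.

```python
# Minimal sketch (not the paper's exact topology): a dense autoencoder over
# magnitude STFT frames, with user-controlled gains on the hidden layer.
import numpy as np
from tensorflow.keras import layers, Model

n_bins = 1025        # magnitude bins of a 2048-point STFT (assumption)
latent_dim = 8       # hidden-layer width (assumption)

frame_in = layers.Input(shape=(n_bins,))
hidden = layers.Dense(latent_dim, activation="relu")(frame_in)
frame_out = layers.Dense(n_bins, activation="relu")(hidden)

autoencoder = Model(frame_in, frame_out)
encoder = Model(frame_in, hidden)
decoder = autoencoder.layers[-1]

# Adam optimizer with an MSE reconstruction loss, per the abstract.
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(frames, frames, epochs=50, batch_size=64)

def synthesize(frames, gains):
    """Scale hidden-layer activations by per-unit gains before decoding."""
    codes = (encoder.predict(frames, verbose=0) * gains).astype("float32")
    return decoder(codes).numpy()
```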

Citations

Autoencoding Neural Networks as Musical Audio Synthesizers
TLDR
A method for musical audio synthesis using autoencoding neural networks is proposed, which is light-weight when compared to current state-of-the-art audio-producing machine learning algorithms.
Musical Instrument Synthesis and Morphing in Multidimensional Latent Space Using Variational, Convolutional Recurrent Autoencoders
TLDR
The reconstruction performance of the VCRAE is evaluated by proxy through an instrument classifier and yields significantly better accuracy than two baseline autoencoder methods.
Autoencoders for music sound synthesis: a comparison of linear, shallow, deep and variational models
TLDR
It is shown that PCA systematically outperforms shallow AE and that only a deep architecture (DAE) can lead to a lower reconstruction error, and that VAEs are still able to outperform PCA while providing a low-dimensional latent space with nice "usability" properties.
Autoencoders for music sound modeling: a comparison of linear, shallow, deep, recurrent and variational models
TLDR
It is shown that VAEs are still able to outperform PCA while providing a low-dimensional latent space with nice "usability" properties and that, contrary to recent literature on image processing, PCA systematically outperforms a shallow AE.
Conditioning Autoencoder Latent Spaces for Real-Time Timbre Interpolation and Synthesis
TLDR
This work proposes a one-hot encoded chroma feature vector for use in both input augmentation and latent-space conditioning, measures the performance of the resulting networks, and characterizes the latent embeddings that arise from the use of this chroma conditioning vector.
A Comprehensive Survey on Deep Music Generation: Multi-level Representations, Algorithms, Evaluations, and Future Directions
TLDR
This paper attempts to provide an overview of various composition tasks under different music generation levels, covering most of the currently popular music generation tasks using deep learning.
Applications of Deep Learning to Audio Generation
TLDR
Thorough investigations of various deep learning architectures are provided under the categories of discriminative and generative algorithms, including the up-to-date Generative Adversarial Networks (GANs) as an integrated model.
Remixing AIs: mind swaps, hybrainity, and splicing musical models
TLDR
This activity of AI code bending is dubbed here ‘hybrainity’, and alongside theoretical discussion of its origins, potential and ethics, examples of hacking particular machine learning models for new creative projects are provided, including applications in live performance and audiovisual generation.
Music Sequence Prediction with Mixture Hidden Markov Models
TLDR
A novel mixture hidden Markov model for music play-sequence prediction is proposed that integrates recent advances in deep learning, computer vision, and speech techniques and has promising potential in both academia and industry.

References

Showing 1–10 of 12 references
Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders
TLDR
A powerful new WaveNet-style autoencoder model is detailed that conditions an autoregressive decoder on temporal codes learned from the raw audio waveform, and NSynth, a large-scale, high-quality dataset of musical notes that is an order of magnitude larger than comparable public datasets, is introduced.
Musical Audio Synthesis Using Autoencoding Neural Nets
TLDR
An interactive musical audio synthesis system is presented that uses feedforward artificial neural networks for synthesis rather than for discriminative or regression tasks, allowing one to interact directly with the model's parameters and generate musical audio in real time.
WaveNet: A Generative Model for Raw Audio
TLDR
WaveNet, a deep neural network for generating raw audio waveforms, is introduced; it is shown that it can be efficiently trained on data with tens of thousands of samples per second of audio, and can be employed as a discriminative model, returning promising results for phoneme recognition.
Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion
TLDR
This work clearly establishes the value of using a denoising criterion as a tractable unsupervised objective to guide the learning of useful higher level representations.
Adam: A Method for Stochastic Optimization
TLDR
This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
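
For concreteness, here is a compact sketch of the Adam update described above (the standard Kingma & Ba formulation; hyperparameter defaults follow that paper, and the function name is illustrative).

```python
# One Adam step: adaptive estimates of the first and second moments of the
# gradient, with bias correction, scale the parameter update.
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad          # biased first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # biased second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias-corrected moments (t >= 1)
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```
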
Dropout: a simple way to prevent neural networks from overfitting
TLDR
It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
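
The core of the technique fits in a few lines; below is a sketch of the common "inverted dropout" variant (the rate and RNG handling are assumptions, not tied to any particular framework).

```python
# Inverted dropout: randomly zero activations during training and rescale the
# survivors so the expected activation is unchanged at test time.
import numpy as np

def dropout(activations, rate=0.5, training=True, rng=None):
    rng = rng or np.random.default_rng()
    if not training or rate == 0.0:
        return activations
    mask = rng.random(activations.shape) >= rate   # keep with probability 1 - rate
    return activations * mask / (1.0 - rate)
```
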
TensorFlow: learning functions at scale
TLDR
This talk describes TensorFlow, outlines some of its applications, and discusses what TensorFlow and deep learning may have to do with functional programming.
On the importance of initialization and momentum in deep learning
TLDR
It is shown that when stochastic gradient descent with momentum uses a well-designed random initialization and a particular type of slowly increasing schedule for the momentum parameter, it can train both DNNs and RNNs to levels of performance that were previously achievable only with Hessian-Free optimization.
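
As a reminder of the update this reference analyzes, here is a sketch of classical momentum SGD (the learning rate and momentum values are placeholders; the reference's point is that the momentum coefficient should follow a carefully increasing schedule).

```python
# Classical momentum SGD: accumulate a velocity and step along it.
import numpy as np

def momentum_step(w, grad, velocity, lr=0.01, mu=0.9):
    velocity = mu * velocity - lr * grad
    return w + velocity, velocity
```
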
A Simple Weight Decay Can Improve Generalization
TLDR
It is proven that a weight decay has two effects in a linear network, and it is shown how to extend these results to networks with hidden layers and non-linear units.
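
The result concerns the familiar L2 penalty; as a sketch, weight decay amounts to shrinking the weights slightly on every gradient step (the values here are placeholders).

```python
# SGD with weight decay: equivalent to minimizing loss + (decay / 2) * ||w||^2.
import numpy as np

def sgd_weight_decay_step(w, grad, lr=0.01, decay=1e-4):
    return w - lr * (grad + decay * w)
```
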
librosa: Audio and Music Signal Analysis in Python
TLDR
A brief overview of the librosa library's functionality is provided, along with explanations of the design goals, software development practices, and notational conventions.
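
Since the paper's training corpus is built from magnitude STFT frames, a short librosa sketch of that preprocessing may be helpful (the file name, sample rate, and FFT settings here are assumptions, not the paper's values).

```python
# Extract magnitude STFT frames with librosa (one training example per frame).
import numpy as np
import librosa

y, sr = librosa.load("note.wav", sr=44100)                # hypothetical input file
S = np.abs(librosa.stft(y, n_fft=2048, hop_length=512))   # magnitude spectrogram
frames = S.T                                              # shape: (n_frames, 1025)
```
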