Corpus ID: 226956064

A Comprehensive Survey on Deep Music Generation: Multi-level Representations, Algorithms, Evaluations, and Future Directions

Shulei Ji, Jing Luo, X. Yang
The use of deep learning techniques to generate various kinds of content (such as images and text) has become a trend. Music in particular, the topic of this paper, has attracted widespread attention from researchers. The whole process of producing music can be divided into three stages, corresponding to the three levels of music generation: score generation produces scores, performance generation adds performance characteristics to the scores, and audio generation converts scores with…
MuseMorphose: Full-Song and Fine-Grained Music Style Transfer with Just One Transformer VAE
Experiments show that MuseMorphose outperforms recurrent neural network (RNN) based baselines on numerous widely-used metrics for style transfer tasks.
MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training
This paper develops MusicBERT, a large-scale pre-trained model for music understanding trained on a corpus of more than 1 million songs, and designs several mechanisms, including OctupleMIDI encoding and a bar-level masking strategy, to enhance pre-training with symbolic music data.
MidiBERT-Piano: Large-scale Pre-training for Symbolic Music Understanding
An attempt to employ the masked language modeling approach of BERT to pre-train a 12-layer Transformer model for tackling a number of symbolic-domain discriminative music understanding tasks, finding that, given a pre-trained Transformer, the models outperform recurrent neural network based baselines with fewer than 10 epochs of fine-tuning.
Review of end-to-end speech synthesis technology based on deep learning
Open-source speech corpora in English, Chinese, and other languages that can be used for speech synthesis tasks are summarized, and some commonly used subjective and objective speech quality evaluation methods are introduced.
Applications of Computational Intelligence in Computer Music Composition
This study shows that the most suitable techniques for systems that imitate human composers are case-based reasoning and artificial neural networks, and that Markov models are more suitable for predicting musical notes based on the given previous notes.
DeepEigen: Learning-based Modal Sound Synthesis with Acoustic Transfer Maps
We present a novel learning-based approach to compute the eigenmodes and acoustic transfer data for the sound synthesis of arbitrary solid objects. Our approach combines two network-based solutions…
A template for the arxiv style
While deep learning models have greatly improved the performance of most artificial intelligence tasks, they are often criticized as untrustworthy due to the black-box problem. Consequently, many…
Data Hiding with Deep Learning: A Survey Unifying Digital Watermarking and Steganography
This survey summarises recent developments in deep learning techniques for data hiding for the purposes of watermarking and steganography, categorising them based on model architectures and noise injection methods.
Music Composition with Deep Learning: A Review
Generating a complex work of art such as a musical composition requires exhibiting true creativity that depends on a variety of factors that are related to the hierarchy of musical language. Music…


Music Generation with Deep Learning
This project deals with the generation of music from raw audio files in the frequency domain, relying on various LSTM architectures and using no information about musical structure (notes or chords) to aid learning.
Music Generation by Deep Learning - Challenges and Directions
This paper selects some limitations of a direct application of deep learning to music generation, and analyzes why these issues are not fulfilled and how possible approaches could address them.
Deep recurrent music writer: Memory-enhanced variational autoencoder-based musical score composition and an objective measure
This work introduces and evaluates a new metric for an objective assessment of the quality of the generated pieces and uses this measure to evaluate the outputs of a truly generative model based on Variational Autoencoders that is applied here to automated music composition.
POP909: A Pop-song Dataset for Music Arrangement Generation
POP909 is a dataset that contains multiple versions of the piano arrangements of 909 popular songs created by professional musicians, and provides annotations of tempo, beat, key, and chords, where the tempo curves are hand-labeled and the others are produced by MIR algorithms.
PerformanceNet: Score-to-Audio Music Generation with Multi-Band Convolutional Residual Network
A deep convolutional model is proposed that learns, in an end-to-end manner, the score-to-audio mapping between a symbolic representation of music called the pianoroll and an audio representation of music called the spectrogram, and achieves a higher mean opinion score (MOS) in naturalness and emotional expressivity than a WaveNet-based model and two off-the-shelf synthesizers.
Part-invariant Model for Music Generation and Harmonization
A neural language (music) model for symbolic multi-part music that can process or generate any part (voice) of a music score consisting of an arbitrary number of parts, using a single trained model.
A GAN Model With Self-attention Mechanism To Generate Multi-instruments Symbolic Music
A new GAN model with a self-attention mechanism, DMB-GAN, is proposed; it can extract more temporal features of music to generate multi-instrument music stably, and it introduces switchable normalization to stabilize network training.
The Effect of Explicit Structure Encoding of Deep Neural Networks for Symbolic Music Generation
This study attempts to solve the melody generation problem constrained by a given chord progression, and explores the effect of explicit architectural encoding of musical structure by comparing two sequential generative models: LSTM and WaveNet.
Conditioning Deep Generative Raw Audio Models for Structured Automatic Music
This paper considers a Long Short Term Memory network to learn the melodic structure of different styles of music, and then uses the unique symbolic generations from this model as a conditioning input to a WaveNet-based raw audio generator, creating a model for automatic, novel music.
Symbolic Music Genre Transfer with CycleGAN
This paper presents the first application of GANs to symbolic music domain transfer and adds additional discriminators that cause the generators to keep the structure of the original music mostly intact, while still achieving strong genre transfer.