Music SketchNet: Controllable Music Generation via Factorized Representations of Pitch and Rhythm

Authors: Ke Chen, Cheng-i Wang, Taylor Berg-Kirkpatrick, Shlomo Dubnov

Abstract: Drawing an analogy with automatic image completion systems, we propose Music SketchNet, a neural network framework that allows users to specify partial musical ideas guiding automatic music generation. We focus on generating the missing measures in incomplete monophonic musical pieces, conditioned on surrounding context, and optionally guided by user-specified pitch and rhythm snippets. First, we introduce…

Figures and Tables from this paper

A Comprehensive Survey on Deep Music Generation: Multi-level Representations, Algorithms, Evaluations, and Future Directions

This paper provides an overview of composition tasks at different levels of music generation, covering most of the currently popular deep-learning-based music generation tasks.

Deep domain adaptation for polyphonic melody extraction

Experimental results show that meta-learning-based adaptation performs better than simple fine-tuning, and that the method outperforms existing state-of-the-art non-adaptive polyphonic melody extraction algorithms.

Domain Adversarial Training on Conditional Variational Auto-Encoder for Controllable Music Generation

Demos and experiments show that the proposed condition corruption objective facilitates not only condition-invariant representation learning but also higher-quality controllability compared to baselines.

Improving Choral Music Separation through Expressive Synthesized Data from Sampled Instruments

An automated pipeline for synthesizing choral music data from sampled instrument plugins with controllable options for instrument expressiveness is provided, and experiments demonstrate that the synthesized choral data is of sufficient quality to improve the model's performance on real choral music separation datasets.

Controllable Data Generation by Deep Learning: A Review

A systematic review of the research area commonly known as controllable deep data generation, which formally defines the problem, proposes a taxonomy of techniques, and summarizes the evaluation metrics used in this domain.

MIDISpace: Finding Linear Directions in Latent Space for Music Generation

This work proposes a method for discovering linear directions in the latent space of a music-generating variational autoencoder (VAE), using PCA to transform the latent codes so that variation along the new axes is maximized.
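The MIDISpace idea as summarized above can be sketched in a few lines: encode many pieces into latent vectors, run PCA on those codes, and treat the leading eigenvectors as controllable directions. The snippet below is a minimal illustration under assumed conditions; the random array `z` stands in for real VAE latent codes, and `latent_directions`/`traverse` are hypothetical helper names, not the paper's API.

```python
import numpy as np

def latent_directions(z, k=2):
    """PCA on a batch of latent codes: returns the top-k orthonormal
    directions along which latent variation is maximized."""
    z_centered = z - z.mean(axis=0)
    cov = np.cov(z_centered, rowvar=False)
    # eigh returns eigenvalues in ascending order; re-sort descending.
    vals, vecs = np.linalg.eigh(cov)
    order = np.argsort(vals)[::-1]
    return vecs[:, order[:k]]

def traverse(z0, direction, alpha):
    """Move one latent code along a discovered direction by step alpha,
    then decode it (decoding omitted here) to hear the attribute change."""
    return z0 + alpha * direction

# Stand-in "latent codes": 256 samples, 8 dims, with most variance on axis 0.
rng = np.random.default_rng(0)
z = rng.normal(size=(256, 8)) * np.array([5, 2, 1, 1, 1, 1, 1, 1])
dirs = latent_directions(z, k=2)
shifted = traverse(np.zeros(8), dirs[:, 0], 2.0)
```

With real VAE codes, the recovered directions often correlate with musical attributes such as note density or register, which is what makes linear traversal useful for control.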

Symbolic music generation conditioned on continuous-valued emotions

The proposed approaches outperform conditioning with control tokens, which is representative of the current state of the art, and a new large-scale dataset of symbolic music paired with continuous valence and arousal emotion labels is provided.

Chord-Conditioned Melody Choralization with Controllable Harmonicity and Polyphonicity

DeepChoir, a melody choralization system that generates a four-part chorale for a given melody conditioned on a chord progression, is proposed; with the improved density sampling, a user can control the extent of harmonicity and polyphonicity of the generated chorales.

HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection

HTS-AT is introduced: an audio transformer with a hierarchical structure that reduces model size and training time, further combined with a token-semantic module that maps final outputs into class feature maps, enabling audio event detection and localization in time.

Tonet: Tone-Octave Network for Singing Melody Extraction from Polyphonic Music

TONet, a plug-and-play model that improves both tone and octave perception by leveraging a novel input representation and a novel network architecture, is proposed; results show that tone-octave fusion with Tone-CFP significantly improves singing melody extraction performance across various datasets.

Deep Music Analogy Via Latent Representation Disentanglement

An explicitly-constrained variational autoencoder (EC$^2$-VAE) is contributed as a unified solution to all three sub-problems of disentangling music representations, validated with objective measurements and evaluated in a subjective study.

MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and Accompaniment

Three models for symbolic multi-track music generation under the generative adversarial network (GAN) framework are proposed; they differ in their underlying assumptions and, accordingly, their network architectures, and are referred to as the jamming model, the composer model, and the hybrid model.

Learning to Traverse Latent Spaces for Musical Score Inpainting

A novel deep learning-based approach for musical score inpainting that takes both past and future musical context into account and can suggest musically meaningful ways to connect them, demonstrating the merit of learning complex trajectories in the latent spaces of deep generative models.

Music Transformer: Generating Music with Long-Term Structure

It is demonstrated that a Transformer with the modified relative attention mechanism can generate minute-long compositions with compelling structure, generate continuations that coherently elaborate on a given motif, and, in a seq2seq setup, generate accompaniments conditioned on melodies.

Explicitly Conditioned Melody Generation: A Case Study with Interdependent RNNs

The results indicate musically relevant conditioning significantly improves learning and performance, and reveal how this information affects learning of musical features related to pitch and rhythm.

Anticipation-RNN: enforcing unary constraints in sequence generation, with application to interactive music generation

This article introduces a novel architecture called anticipation-RNN, which retains the strengths of RNN-based generative models while allowing user-defined unary constraints to be enforced, and demonstrates its efficiency on generating melodies that satisfy unary constraints in the style of the soprano parts of J.S. Bach's chorale harmonizations.
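The mechanism summarized above, sequential generation that is conditioned on, and guaranteed to satisfy, per-position (unary) constraints, can be sketched independently of any trained network. In the toy sketch below, `sample_step` is a hypothetical stand-in for the trained RNN's sampling function; the key point is only the control flow: constrained positions emit the user's token, unconstrained positions are sampled by the model.

```python
def generate_with_unary_constraints(sample_step, length, constraints):
    """Left-to-right generation honouring unary constraints.

    sample_step(prefix, constraints) -> next token; a stand-in for a
    trained model that (as in anticipation-RNN) also conditions on the
    upcoming constraints so its free choices anticipate them.
    constraints: dict mapping position -> required token.
    """
    seq = []
    for t in range(length):
        if t in constraints:
            tok = constraints[t]  # enforce the unary constraint exactly
        else:
            tok = sample_step(seq, constraints)
        seq.append(tok)
    return seq

# Toy stand-in model: always continues with MIDI pitch 0 (rest-like).
toy_model = lambda prefix, cons: 0
melody = generate_with_unary_constraints(toy_model, 8, {0: 60, 4: 67})
```

The architectural contribution of the paper is in *how* the model conditions on future constraints during training, so its unconstrained choices lead naturally into the enforced notes rather than clashing with them.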

DeepBach: a Steerable Model for Bach Chorales Generation

DeepBach, a graphical model aimed at modeling polyphonic music and specifically hymn-like pieces, is introduced, which is capable of generating highly convincing chorales in the style of Bach.

A Hierarchical Latent Vector Model for Learning Long-Term Structure in Music

This work proposes the use of a hierarchical decoder, which first outputs embeddings for subsequences of the input and then uses these embeddings to generate each subsequence independently, thereby avoiding the "posterior collapse" problem, which remains an issue for recurrent VAEs.
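The two-level decoding scheme described above can be sketched structurally: a "conductor" maps the latent code to one embedding per subsequence (e.g. per bar), and a lower-level decoder generates each bar from its embedding alone, so no single RNN must carry state across the whole piece. The snippet below is a shape-level sketch with toy stand-ins; `conductor` and `bar_decoder` are hypothetical placeholders for trained networks, not the paper's implementation.

```python
import numpy as np

def hierarchical_decode(z, n_bars, steps_per_bar, conductor, bar_decoder):
    """Two-level decoding: conductor(z, n_bars) yields one embedding per
    bar; bar_decoder(embedding, steps_per_bar) yields that bar's events.
    Each bar depends on z only through its own embedding."""
    bar_embeddings = conductor(z, n_bars)            # (n_bars, embed_dim)
    bars = [bar_decoder(e, steps_per_bar) for e in bar_embeddings]
    return np.concatenate(bars)                      # full sequence

# Toy stand-ins: conductor tiles the latent; decoder repeats one pitch.
conductor = lambda z, n: np.tile(z, (n, 1))
bar_decoder = lambda e, k: np.full(k, int(e.sum()) % 128)
out = hierarchical_decode(np.ones(4), n_bars=2, steps_per_bar=16,
                          conductor=conductor, bar_decoder=bar_decoder)
```

Because the reconstruction of each bar is forced through its embedding, the model cannot ignore the latent code the way a single powerful autoregressive decoder can, which is the intuition behind why this mitigates posterior collapse.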

Music transcription modelling and composition using deep learning

This work builds and trains LSTM networks on approximately 23,000 music transcriptions expressed in a high-level vocabulary (ABC notation), and uses them to generate new transcriptions, yielding music transcription models useful in particular contexts of music composition.

Scribbler: Controlling Deep Image Synthesis with Sketch and Color

A deep adversarial image synthesis architecture conditioned on sketched boundaries and sparse color strokes to generate realistic cars, bedrooms, or faces is proposed, along with a sketch-based image synthesis system that allows users to scribble over the sketch to indicate preferred colors for objects.