• Corpus ID: 175089

Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription

@article{BoulangerLewandowski2012ModelingTD,
  title={Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription},
  author={Nicolas Boulanger-Lewandowski and Yoshua Bengio and Pascal Vincent},
  journal={arXiv: Learning},
  year={2012}
}
We investigate the problem of modeling symbolic sequences of polyphonic music in a completely general piano-roll representation. We introduce a probabilistic model based on distribution estimators conditioned on a recurrent neural network that is able to discover temporal dependencies in high-dimensional sequences. Our approach outperforms many traditional models of polyphonic music on a variety of realistic datasets. We show how our musical language model can serve as a symbolic prior to… 
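The core idea in the abstract, a recurrent network whose hidden state parameterizes a per-time-step output distribution over the piano roll, can be sketched as follows. This is only a toy: it replaces the paper's conditional distribution estimators (RBM/NADE outputs in the RNN-RBM family) with independent per-pitch Bernoullis, and the weights are random rather than trained; all dimensions and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# 88 piano keys; small hidden state (illustrative sizes, not the paper's).
N_NOTES, N_HID = 88, 32

# Randomly initialized RNN parameters (a trained model would learn these).
Whh = rng.normal(scale=0.1, size=(N_HID, N_HID))    # hidden -> hidden
Wvh = rng.normal(scale=0.1, size=(N_NOTES, N_HID))  # previous notes -> hidden
Who = rng.normal(scale=0.1, size=(N_HID, N_NOTES))  # hidden -> output logits
bh, bo = np.zeros(N_HID), np.zeros(N_NOTES)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_sequence(T):
    """Sample a T-step piano roll: at each step the RNN hidden state
    parameterizes an independent Bernoulli over the 88 pitches."""
    h = np.zeros(N_HID)
    v = np.zeros(N_NOTES)
    roll = []
    for _ in range(T):
        h = np.tanh(Whh @ h + Wvh.T @ v + bh)        # condition on previous notes
        p = sigmoid(Who.T @ h + bo)                  # per-pitch "on" probabilities
        v = (rng.random(N_NOTES) < p).astype(float)  # sample this time step
        roll.append(v)
    return np.stack(roll)

roll = sample_sequence(16)
```

The independent-Bernoulli output is exactly the simplification the paper argues against (notes in a chord are strongly correlated); the point of the sketch is only the conditioning structure, where each step's output distribution depends on the full history through the hidden state.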


High-dimensional sequence transduction
TLDR
A probabilistic model based on a recurrent neural network that is able to learn realistic output distributions given the input is introduced and an efficient algorithm to search for the global mode of that distribution is devised.
A hybrid recurrent neural network for music transcription
TLDR
This work uses recurrent neural networks and their variants as music language models and presents a generative architecture for combining these models with predictions from a frame level acoustic classifier and compares different neural network architectures for acoustic modeling.
A Dual Classification Approach to Music Language Modeling
TLDR
An original architecture is introduced that poses the problem of modeling symbolic sequences of polyphonic music in a completely general piano-roll representation as a dual-classification task rather than one with a multimodal probability distribution.
Modelling Symbolic Music: Beyond the Piano Roll
TLDR
A representation that reduces polyphonic music to a univariate categorical sequence is introduced, enabling the application of state-of-the-art natural language processing techniques, namely the long short-term memory sequence model.
Sequence Generation using Deep Recurrent Networks and Embeddings: A study case in music
TLDR
The proposed approach considers music-theory concepts such as transposition and uses data transformations to introduce semantic meaning and improve the quality of the generated melodies, with quality measured via the tonality of the compositions.
Rethinking Recurrent Latent Variable Model for Music Composition
TLDR
This work presents a model for capturing musical features and creating novel sequences of music, called the Convolutional-Variational Recurrent Neural Network, which uses an encoder-decoder architecture with latent probabilistic connections to capture the hidden structure of music.
Generating Polyphonic Music Using Tied Parallel Networks
TLDR
A neural network architecture which enables prediction and composition of polyphonic music in a manner that preserves translation-invariance of the dataset and attains high performance at a musical prediction task and successfully creates note sequences which possess measure-level musical structure.
An RNN-based Music Language Model for Improving Automatic Music Transcription
TLDR
The acoustic AMT model is based on probabilistic latent component analysis, and prior information from the MLM is incorporated into the transcription framework using Dirichlet priors, yielding a significant 3% improvement in F-measure over the acoustic-only model.
Modeling Temporal Tonal Relations in Polyphonic Music Through Deep Networks With a Novel Image-Based Representation
TLDR
Experimental results show that the tonnetz representation produces musical sequences that are more tonally stable and contain more repeated patterns than sequences generated by pianoroll-based models, a finding that is directly useful for tackling current challenges in music and AI such as smart music generation.
Improving Polyphonic Music Models with Feature-rich Encoding
TLDR
TonicNet is introduced, a GRU-based model trained to first predict the chord at a given time-step before predicting the notes of each voice at that time-step, in contrast with the typical approach of predicting only the notes.
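The chord-before-notes factorization described in this summary amounts to decomposing each step as p(chord, notes | context) = p(chord | context) · Πv p(note_v | chord, context). A minimal sketch, with random untrained projections and a fixed context vector standing in for TonicNet's GRU state; all names and sizes here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

N_CHORDS, N_PITCHES, N_VOICES, N_HID = 24, 88, 4, 32

# Illustrative, randomly initialized projections (a trained model learns these).
W_chord = rng.normal(scale=0.1, size=(N_HID, N_CHORDS))
W_note = rng.normal(scale=0.1, size=(N_HID + N_CHORDS, N_PITCHES))

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def predict_step(h):
    """Predict the chord first, then each voice's note conditioned on it."""
    chord = rng.choice(N_CHORDS, p=softmax(h @ W_chord))
    ctx = np.concatenate([h, np.eye(N_CHORDS)[chord]])  # notes see the sampled chord
    notes = [rng.choice(N_PITCHES, p=softmax(ctx @ W_note)) for _ in range(N_VOICES)]
    return chord, notes

h = rng.normal(size=N_HID)
chord, notes = predict_step(h)
```

In the actual model the voices are predicted sequentially with the recurrent state updated between them; the shared context here is a further simplification.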

References

SHOWING 1-10 OF 29 REFERENCES
Finding temporal structure in music: blues improvisation with LSTM recurrent networks
  • D. Eck, J. Schmidhuber
  • Computer Science
    Proceedings of the 12th IEEE Workshop on Neural Networks for Signal Processing
  • 2002
TLDR
Long short-term memory (LSTM) has succeeded in similar domains where other RNNs have failed, such as timing and counting and the learning of context-sensitive languages, and it is shown that LSTM is also a good mechanism for learning to compose music.
Probabilistic models for melodic prediction
A Discriminative Model for Polyphonic Piano Transcription
TLDR
A discriminative model for polyphonic piano transcription is presented and a frame-level transcription accuracy of 68% was achieved on a newly generated test set, and direct comparisons to previous approaches are provided.
Neural Network Music Composition by Prediction: Exploring the Benefits of Psychoacoustic Constraints and Multi-scale Processing
  • M. Mozer
  • Computer Science
    Connect. Sci.
  • 1994
TLDR
An extension of this transition-table approach is described, using a recurrent autopredictive connectionist network called CONCERT, which is trained on a set of pieces with the aim of extracting stylistic regularities, and which incorporates psychologically grounded representations of pitch, duration and harmonic structure.
Bayesian Music Transcription
TLDR
The aim of this thesis is to integrate a vast amount of prior musical knowledge in a consistent and transparent computational framework and to demonstrate the feasibility of such an approach in moving closer to a practical solution to music transcription.
Learning Multilevel Distributed Representations for High-Dimensional Sequences
TLDR
A new family of non-linear sequence models that are substantially more powerful than hidden Markov models or linear dynamical systems are described, and their performance is demonstrated using synthetic video sequences of two balls bouncing in a box.
Pitch Detection in Polyphonic Music using Instrument Tone Models
  • Yipeng Li, Deliang Wang
  • Computer Science
    2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07
  • 2007
TLDR
A hidden Markov model (HMM) based system to detect the pitch of an instrument in polyphonic music using an instrument tone model and a hypothesis selection method to choose pitch hypotheses with sufficiently high salience as pitch candidates is proposed.
A hierarchy of recurrent networks for speech recognition
TLDR
This approach unifies RBM-based approaches for sequential data modeling and the Echo State Network, a powerful approach for black-box system identification.
Learning long-term dependencies with gradient descent is difficult
TLDR
This work shows why gradient based learning algorithms face an increasingly difficult problem as the duration of the dependencies to be captured increases, and exposes a trade-off between efficient learning by gradient descent and latching on information for long periods.
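The difficulty summarized above can be made concrete with the standard backpropagation-through-time argument; a sketch for a simple tanh RNN (notation mine, not the reference's):

```latex
% Hidden-state recurrence and the Jacobian product in the gradient.
h_t = \tanh(W_{hh} h_{t-1} + W_{vh} v_t)

\frac{\partial L}{\partial h_k}
  = \frac{\partial L}{\partial h_T} \prod_{t=k+1}^{T} \frac{\partial h_t}{\partial h_{t-1}},
\qquad
\frac{\partial h_t}{\partial h_{t-1}} = \operatorname{diag}\!\left(1 - h_t^{\,2}\right) W_{hh}
```

If each step's Jacobian has largest singular value bounded by $\gamma < 1$, the product's norm shrinks at least as $\gamma^{\,T-k}$, so gradient signal from distant time steps vanishes exponentially in the gap $T-k$; conversely $\gamma > 1$ permits exploding gradients. This is the trade-off the summary refers to.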
Evaluation of Multiple-F0 Estimation and Tracking Systems
TLDR
This paper presents the systematic evaluations of over a dozen competing methods and algorithms for extracting the fundamental frequencies of pitched sound sources in polyphonic music.