Multi-modal Conditional Bounding Box Regression for Music Score Following

@article{Henkel2021MultimodalCB,
  title={Multi-modal Conditional Bounding Box Regression for Music Score Following},
  author={Florian Henkel and Gerhard Widmer},
  journal={2021 29th European Signal Processing Conference (EUSIPCO)},
  year={2021},
  pages={356--360}
}
  • Florian Henkel, G. Widmer
  • Published 10 May 2021
  • Computer Science
  • 2021 29th European Signal Processing Conference (EUSIPCO)
This paper addresses the problem of sheet-image-based on-line audio-to-score alignment, also known as score following. Drawing inspiration from object detection, a conditional neural network architecture is proposed that directly predicts the x, y coordinates of the matching positions in a complete score sheet image at each point in time for a given musical performance. Experiments are conducted on a synthetic polyphonic piano benchmark dataset and the new method is compared to several existing… 
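
To make the idea of predicting score positions as boxes concrete, here is a small sketch of how a detector-style prediction could be mapped to pixels and scored. The box format (centre x, y plus width and height, normalised to [0, 1]) and the helper names `denormalize`, `iou` and `centre_distance` are assumptions borrowed from common object detectors, not details taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned box given by its centre (cx, cy) and size (w, h), in pixels.
    This parametrisation is an assumption, not the paper's stated format."""
    cx: float
    cy: float
    w: float
    h: float

    def corners(self) -> tuple[float, float, float, float]:
        # Convert centre/size form to (x_min, y_min, x_max, y_max).
        return (self.cx - self.w / 2, self.cy - self.h / 2,
                self.cx + self.w / 2, self.cy + self.h / 2)


def denormalize(cx: float, cy: float, w: float, h: float,
                img_w: int, img_h: int) -> Box:
    """Map a prediction in [0, 1] image coordinates to pixel coordinates."""
    return Box(cx * img_w, cy * img_h, w * img_w, h * img_h)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    ax0, ay0, ax1, ay1 = a.corners()
    bx0, by0, bx1, by1 = b.corners()
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union > 0 else 0.0


def centre_distance(a: Box, b: Box) -> float:
    """Euclidean distance between box centres, usable as a pixel alignment error."""
    return ((a.cx - b.cx) ** 2 + (a.cy - b.cy) ** 2) ** 0.5


if __name__ == "__main__":
    # Hypothetical prediction on a 1000 x 1400 pixel score page.
    pred = denormalize(0.5, 0.25, 0.02, 0.1, img_w=1000, img_h=1400)
    target = Box(cx=505.0, cy=350.0, w=20.0, h=140.0)
    print(round(iou(pred, target), 3), round(centre_distance(pred, target), 1))
```

Reporting the centre distance alongside IoU reflects that, for score following, the predicted position matters more than the exact box extent.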


Real-Time Music Following in Score Sheet Images via Multi-Resolution Prediction

This work proposes a method that does not solely rely on note alignments but is additionally capable of leveraging data with annotations of lower granularity, such as bar or score system alignments, which allows us to use a large collection of real-world piano performance recordings coarsely aligned to scanned score sheet images and improve over current state-of-the-art approaches.

Music Score Recognition and Composition Application Based on Deep Learning

  • Mingheng Liang
  • Computer Science
  • Mathematical Problems in Engineering
  • 2022
A deep learning-based music score recognition model is proposed that employs a deep network, accepts the entire score image as input, and directly outputs each note's time value and pitch.

Fully Automatic Page Turning on Real Scores

A prototype automatic page-turning system is presented that works directly on real scores, i.e., sheet images, without any symbolic representation; it is based on a multi-modal neural network architecture.

Audio-Conditioned U-Net for Position Estimation in Full Sheet Images

This work proposes an architecture capable of estimating matching score positions directly within entire unprocessed sheet images and argues that this is a necessary first step towards a fully integrated score following system that does not rely on any preprocessing steps such as optical music recognition.
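
The abstract does not say how the audio conditions the image network. One common mechanism is feature-wise linear modulation (FiLM), in which an audio embedding predicts a per-channel scale and shift for the image features; the sketch below assumes it, with hypothetical weight names (`w_gamma`, `b_gamma`, `w_beta`, `b_beta`) and plain Python lists in place of tensors.

```python
def film(feature_maps, audio_embedding, w_gamma, b_gamma, w_beta, b_beta):
    """Feature-wise linear modulation (assumed conditioning mechanism).

    feature_maps: list of C channels, each a 2-D list (H x W) of floats.
    audio_embedding: list of D floats summarising the recent audio.
    w_gamma, w_beta: hypothetical C x D weight matrices; b_gamma, b_beta: length-C biases.
    Returns new feature maps where channel c is gamma_c * x + beta_c.
    """
    def linear(w, b):
        # One output per channel: dot(row, embedding) + bias.
        return [sum(wi * xi for wi, xi in zip(row, audio_embedding)) + bias
                for row, bias in zip(w, b)]

    gammas = linear(w_gamma, b_gamma)  # per-channel scale predicted from audio
    betas = linear(w_beta, b_beta)     # per-channel shift predicted from audio
    return [[[g * v + beta for v in row] for row in channel]
            for channel, g, beta in zip(feature_maps, gammas, betas)]


if __name__ == "__main__":
    # Two 1 x 2 channels modulated by a 2-dimensional audio embedding.
    out = film([[[1.0, 2.0]], [[3.0, 4.0]]], [1.0, 2.0],
               w_gamma=[[1.0, 0.0], [0.0, 1.0]], b_gamma=[0.0, 0.0],
               w_beta=[[0.0, 0.0], [1.0, 0.0]], b_beta=[0.5, 0.0])
    print(out)
```

Because the scale and shift depend only on the audio, the same sheet image yields different feature maps, and hence different position estimates, as the performance progresses.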

Towards Score Following In Sheet Music Images

The results suggest that with (deep) neural networks, which have proven to be powerful image processing models, working with sheet music becomes feasible and is a promising future research direction.

Learning Audio-Sheet Music Correspondences for Cross-Modal Retrieval and Piece Identification

A method is proposed that learns joint embedding spaces for short excerpts of audio and their respective counterparts in sheet music images, using multimodal convolutional neural networks, in order to match musical audio directly to sheet music without any higher-level abstract representation.
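
Once such a joint space is learned, retrieval reduces to a nearest-neighbour search. The sketch below ranks sheet snippets for one audio excerpt by cosine similarity; the embeddings are hypothetical stand-ins for the outputs of the two network branches, and cosine similarity is an assumed choice of similarity measure.

```python
import math


def cosine(a, b):
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def rank_snippets(audio_emb, sheet_embs):
    """Return sheet-snippet indices sorted from most to least similar to the audio."""
    scores = [cosine(audio_emb, s) for s in sheet_embs]
    return sorted(range(len(sheet_embs)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    # Hypothetical 2-D embeddings: one audio excerpt, three sheet snippets.
    print(rank_snippets([1.0, 0.0], [[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0]]))
```

For piece identification, one plausible extension (again an assumption) is to rank many audio excerpts this way and let each vote for the piece its top-ranked snippet belongs to.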

Learning to Read and Follow Music in Complete Score Sheet Images

This paper proposes the first system that performs score following directly in full-page, completely unprocessed sheet images, based on incoming audio and a given image of the score; the system outperforms current state-of-the-art image-based score followers in terms of alignment precision.

Score Following as a Multi-Modal Reinforcement Learning Problem

This paper designs end-to-end multi-modal RL agents that simultaneously learn to listen to music recordings, read the scores from images of sheet music, and follow the music along in the sheet.

Learning Frame Similarity using Siamese networks for Audio-to-Score Alignment

Experiments on music data from different acoustic conditions demonstrate that the proposed method achieves higher alignment accuracy than a standard DTW-based method that uses handcrafted features, and generates robust alignments whilst being adaptable to different domains at the same time.
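
As a rough illustration of where a learned similarity plugs in, the sketch below runs standard dynamic time warping (DTW) over a precomputed cost matrix. The `cost_matrix` helper and its absolute-difference distance are placeholders (assumptions) for whatever frame dissimilarity a trained Siamese network would supply, for example one minus its similarity score; the paper's exact DTW settings are not given here.

```python
def cost_matrix(perf, score, dist=lambda a, b: abs(a - b)):
    """Pairwise dissimilarities; `dist` is a placeholder for a learned measure."""
    return [[dist(p, s) for s in score] for p in perf]


def dtw(cost):
    """Dynamic time warping over a precomputed cost matrix.

    cost[i][j] is the dissimilarity between performance frame i and score frame j.
    Returns (total_cost, path) where path is a list of (i, j) index pairs.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i][j] = cost[i - 1][j - 1] + min(acc[i - 1][j],      # advance performance only
                                                 acc[i][j - 1],      # advance score only
                                                 acc[i - 1][j - 1])  # advance both
    # Backtrack from the end, preferring the diagonal step on ties.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {(i - 1, j - 1): acc[i - 1][j - 1],
                 (i - 1, j): acc[i - 1][j],
                 (i, j - 1): acc[i][j - 1]}
        i, j = min(steps, key=steps.get)
    path.reverse()
    return acc[n][m], path


if __name__ == "__main__":
    # Toy "features": the performance lingers on the first score frame.
    print(dtw(cost_matrix([1, 1, 2, 3], [1, 2, 3])))
```

This offline DTW needs the whole performance; the point of the sketch is only that swapping handcrafted distances for learned ones leaves the alignment algorithm itself unchanged.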

Learning to Listen, Read, and Follow: Score Following as a Reinforcement Learning Game

This paper formulates score following as a multimodal Markov Decision Process, the mathematical foundation for sequential decision making, and addresses the score following task with state-of-the-art deep reinforcement learning (RL) algorithms such as synchronous advantage actor critic (A2C).
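
Actor-critic methods such as A2C weight the policy gradient by an advantage estimate. A minimal sketch of the n-step version follows, assuming a rollout of per-step rewards and critic values; in score following the reward might be derived from the tracking error, which is an assumption here rather than a detail from the abstract.

```python
def n_step_advantages(rewards, values, bootstrap_value, gamma=0.99):
    """Discounted n-step returns and advantages for one rollout.

    rewards[t], values[t]: reward and critic estimate V(s_t) at step t.
    bootstrap_value: critic estimate for the state after the rollout
                     (pass 0.0 if the episode terminated).
    Returns (returns, advantages) with A_t = R_t - V(s_t).
    """
    returns = [0.0] * len(rewards)
    running = bootstrap_value
    # Accumulate R_t = r_t + gamma * R_{t+1} from the last step backwards.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    advantages = [r - v for r, v in zip(returns, values)]
    return returns, advantages


if __name__ == "__main__":
    print(n_step_advantages([1.0, 1.0], [0.5, 0.5], 0.0, gamma=0.5))
```

The actor's log-probabilities are multiplied by these advantages, while the critic is regressed towards the returns; "synchronous" refers to collecting such rollouts from several environments in lockstep before each update.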

Realtime Audio to Score Alignment for Polyphonic Music Instruments, using Sparse Non-Negative Constraints and Hierarchical HMMs

  • Arshia Cont
  • Computer Science
  • 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings
  • 2006
The proposed algorithm has the advantage of an explicit instrument model for pitch, obtained through unsupervised learning, as well as access to single-note contribution probabilities that together make up a complex chord, instead of modeling the chord as one event.
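
The idea of explaining a chord as a sum of single-note contributions can be illustrated with non-negative decomposition. The sketch below estimates non-negative note activations for one spectral frame against fixed note templates using Lee-Seung style multiplicative updates, with an optional L1 term for sparsity. It is a generic technique swapped in for illustration, not the paper's algorithm, and the toy templates are made up.

```python
def nonneg_activations(spectrum, templates, n_iter=200, sparsity=0.0, eps=1e-12):
    """Estimate activations h >= 0 such that spectrum is approximately W h.

    templates: list of K non-negative note templates of length F (the columns of W).
    spectrum: length-F non-negative magnitude spectrum of one audio frame.
    sparsity: L1 penalty weight that shrinks small activations towards zero.
    Uses multiplicative updates for the Euclidean cost: h <- h * W^T v / (W^T W h + sparsity).
    """
    k, f_len = len(templates), len(spectrum)
    h = [1.0] * k  # positive start; multiplicative updates never make h negative
    # W^T v does not change between iterations, so compute it once.
    wtv = [sum(t[f] * spectrum[f] for f in range(f_len)) for t in templates]
    for _ in range(n_iter):
        approx = [sum(templates[j][f] * h[j] for j in range(k)) for f in range(f_len)]
        wtwh = [sum(t[f] * approx[f] for f in range(f_len)) for t in templates]
        h = [h[j] * wtv[j] / (wtwh[j] + sparsity + eps) for j in range(k)]
    return h


if __name__ == "__main__":
    # Two overlapping toy "note" templates; the frame is 3 * note0 + 1 * note1.
    print(nonneg_activations([3.0, 4.0, 1.0], [[1, 1, 0], [0, 1, 1]]))
```

Each activation can then be read as how strongly a single note contributes to the frame, which is what allows a chord to be treated as a combination of notes rather than one opaque event.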

A Conditional Random Field Framework for Robust and Scalable Audio-to-Score Matching

This paper introduces the use of conditional random fields (CRFs) for the audio-to-score alignment task and proposes a novel hierarchical approach, which takes advantage of the score structure for an approximate decoding of the statistical model.
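
For a sense of what decoding a chain-structured alignment model involves, here is exact Viterbi decoding over a simple left-to-right linear chain of score positions. The log-potentials are hand-set, hypothetical values, and this is not the paper's hierarchical approximate decoder.

```python
import math


def viterbi_left_to_right(log_obs, log_stay, log_advance):
    """Most likely score-position sequence in a left-to-right linear chain.

    log_obs[t][s]: log-potential that audio frame t matches score position s.
    log_stay / log_advance: log-potentials for remaining at s or moving to s + 1.
    The path starts at position 0 and ends at the last position.
    """
    n_frames, n_states = len(log_obs), len(log_obs[0])
    if n_frames < n_states:
        raise ValueError("need at least one frame per score position")
    neg_inf = -math.inf
    delta = [[neg_inf] * n_states for _ in range(n_frames)]
    back = [[0] * n_states for _ in range(n_frames)]
    delta[0][0] = log_obs[0][0]
    for t in range(1, n_frames):
        for s in range(n_states):
            stay = delta[t - 1][s] + log_stay
            move = delta[t - 1][s - 1] + log_advance if s > 0 else neg_inf
            # Keep the better predecessor and remember where it came from.
            if stay >= move:
                delta[t][s], back[t][s] = stay + log_obs[t][s], s
            else:
                delta[t][s], back[t][s] = move + log_obs[t][s], s - 1
    # Backtrack from the final score position.
    path = [n_states - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    score = [60, 62, 64]           # hypothetical MIDI pitches of three score events
    perf = [60, 60, 62, 64, 64]    # hypothetical detected pitch per audio frame
    log_obs = [[0.0 if p == s else -5.0 for s in score] for p in perf]
    print(viterbi_left_to_right(log_obs, math.log(0.5), math.log(0.5)))
```

The cost grows with frames times score positions, which is one reason long scores motivate approximate or hierarchical decoding.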

A unified approach to real time audio-to-score and audio-to-audio alignment using sequential Montecarlo inference techniques

  • N. Montecchio, Arshia Cont
  • Computer Science
  • 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2011
This paper presents a methodology for the real-time alignment of music signals using sequential Monte Carlo inference techniques, addressing both audio-to-score and audio-to-audio alignment within the same framework in a real-time setting.
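
A minimal particle filter conveys the flavour of sequential Monte Carlo score following. It assumes the hidden state is (score position, tempo) and replaces the audio-to-score likelihood with a Gaussian on a noisy position reading; both the state space and the observation model are assumptions, not the paper's.

```python
import math
import random


def particle_filter(observations, n_particles=500, obs_sigma=0.5,
                    tempo_sigma=0.05, seed=0):
    """Track score position (in beats) and tempo (beats per frame) online.

    observations: per-frame noisy position readings, a stand-in (assumption)
    for the likelihood an audio frame would assign to each score position.
    Returns the posterior-mean position after each frame.
    """
    rng = random.Random(seed)
    # Each particle is [position, tempo]; start at the beginning with uncertain tempo.
    particles = [[0.0, rng.uniform(0.5, 1.5)] for _ in range(n_particles)]
    estimates = []
    for obs in observations:
        # Predict: perturb the tempo (kept non-negative), then advance the position.
        for p in particles:
            p[1] = max(0.0, p[1] + rng.gauss(0.0, tempo_sigma))
            p[0] += p[1]
        # Update: weight each particle by a Gaussian likelihood of the observation.
        weights = [math.exp(-0.5 * ((obs - p[0]) / obs_sigma) ** 2) for p in particles]
        total = sum(weights)
        if total == 0.0:  # all particles implausible: fall back to uniform weights
            weights, total = [1.0] * n_particles, float(n_particles)
        weights = [w / total for w in weights]
        estimates.append(sum(w * p[0] for w, p in zip(weights, particles)))
        # Resample (multinomial) so particles concentrate on likely states.
        particles = [list(p) for p in rng.choices(particles, weights=weights, k=n_particles)]
    return estimates


if __name__ == "__main__":
    noise = random.Random(1)
    truth = [float(t + 1) for t in range(20)]  # steady tempo of 1 beat per frame
    est = particle_filter([x + noise.gauss(0.0, 0.3) for x in truth])
    print(round(est[-1], 2))
```

Because each step only uses the current frame and the previous particle set, the same loop runs in real time; an audio-to-audio variant would change the observation model, not the filter.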