Score Transformer: Generating Musical Score from Note-level Representation

@article{Suzuki2021ScoreTG,
  title={Score Transformer: Generating Musical Score from Note-level Representation},
  author={Masahiro Suzuki},
  journal={ACM Multimedia Asia},
  year={2021}
}
  • Masahiro Suzuki
  • Published 1 December 2021
  • Computer Science
  • ACM Multimedia Asia
In this paper, we explore the tokenized representation of musical scores using the Transformer model to automatically generate musical scores. Thus far, sequence models have yielded fruitful results with note-level (MIDI-equivalent) symbolic representations of music. Although the note-level representations can comprise sufficient information to reproduce music aurally, they cannot contain adequate information to represent music visually in terms of notation. Musical scores contain various… 

References

Pop Music Transformer: Generating Music with Rhythm and Harmony
TLDR
This paper builds a Pop Music Transformer that composes Pop piano music with a more plausible rhythmic structure than prior art does, and introduces a new event set, dubbed "REMI" (REvamped MIDI-derived events), which provides sequence models with a metric context for modeling the rhythmic patterns of music.
Transcribing Human Piano Performances into Music Notation
TLDR
This paper presents a system that generates music notation output from human-recorded MIDI performances of piano music and shows that the correct estimation of the meter, harmony and streams in a piano performance provides a solid foundation to produce a properly formatted score.
A Metric for Music Notation Transcription Accuracy
TLDR
This paper proposes an edit distance, similar to the Levenshtein Distance used for measuring the difference between two sequences, typically strings of characters, and applies a linear regression model to the metric in order to predict human evaluations on a dataset of short music excerpts automatically transcribed into music notation.
Music Transformer: Generating Music with Long-Term Structure
TLDR
It is demonstrated that a Transformer with the modified relative attention mechanism can generate minute-long compositions with compelling structure, generate continuations that coherently elaborate on a given motif, and, in a seq2seq setup, generate accompaniments conditioned on melodies.
A Holistic Approach to Polyphonic Music Transcription with Neural Networks
TLDR
Results show that this model can learn to transcribe scores directly from audio signals, opening a promising avenue towards complete AMT.
Joint Multi-Pitch Detection and Score Transcription for Polyphonic Piano Music
TLDR
This paper proposes a method for joint multi-pitch detection and score transcription for polyphonic piano music, and proposes a Reshaped score representation that outperforms a LilyPond representation in terms of both prediction accuracy and time/memory resources.
Compound Word Transformer: Learning to Compose Full-Song Music over Dynamic Directed Hypergraphs
TLDR
This paper presents a conceptually different approach that explicitly takes into account the type of the tokens, such as note types and metric types, and proposes a new Transformer decoder architecture that uses different feed-forward heads to model tokens of different types.
Towards Complete Polyphonic Music Transcription: Integrating Multi-Pitch Detection and Rhythm Quantization
TLDR
Systematic evaluations on commonly used classical piano data show that these treatments of polyphonic transcription improve the performance of transcription, which can be used as benchmarks for further studies.
An End-to-end Framework for Audio-to-Score Music Transcription on Monophonic Excerpts
TLDR
This is the first automatic music transcription approach that directly obtains a symbolic score from audio, instead of performing separate stages for piano-roll estimation, pitch detection and note tracking, meter detection or key estimation.
MMM : Exploring Conditional Multi-Track Music Generation with the Transformer
TLDR
This work creates a time-ordered sequence of musical events for each track and concatenates several tracks into a single sequence, taking advantage of the Transformer's attention mechanism, which can adeptly handle long-term dependencies.