A Shift-Invariant Latent Variable Model for Automatic Music Transcription


In this work, a probabilistic model for multiple-instrument automatic music transcription is proposed. The model extends the shift-invariant probabilistic latent component analysis method, which is used for spectrogram factorization. The proposed extensions support the use of multiple spectral templates per pitch and per instrument source, as well as a time-varying pitch contribution for each source. Thus, the method can be used for multiple-instrument automatic transcription. In addition, the shift-invariant aspect of the method can be exploited to detect tuning changes and frequency modulations, as well as to visualize pitch content. For note tracking and smoothing, pitch-wise hidden Markov models are used. For training, pitch templates were extracted from eight orchestral instruments, covering their complete note range. The transcription system was tested on multiple-instrument polyphonic recordings from the RWC database, a Disklavier data set, and the MIREX 2007 multi-F0 data set. Results show that the proposed method outperforms leading approaches from the transcription literature on several error metrics.
Automatic music transcription refers to the process of converting musical audio, usually a recording, into some form of notation, e.g., sheet music, a MIDI file, or a "piano-roll" representation. It has applications in music information retrieval, computational musicology, and the creation of interactive music systems (e.g., real-time accompaniment, automatic instrument tutoring). The transcription problem can be separated into several subtasks, including multi-pitch estimation (considered the core problem of transcription), onset/offset detection, instrument identification, and rhythmic parsing.
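The abstract describes the factorization only in words. As a minimal sketch, assume a model of the form P(ω,t) = P(t) Σ_{p,f,s} P(ω−f|s,p) P(f|p,t) P(s|p,t) P(p|t), where ω is a log-frequency bin, f a small shift, s a source, and p a pitch, with the templates P(ω|s,p) kept fixed (as if pre-extracted) and no sparsity priors. This is an illustrative reading, not the article's exact formulation. The function names `shifted_templates` and `si_plca`, the shift range, and the toy Gaussian templates are hypothetical.

```python
import numpy as np


def shifted_templates(W, shifts):
    """Ws[s, p, f, w] = W[s, p, w - shifts[f]], zero-padded at the edges."""
    S, P, n_bins = W.shape
    Ws = np.zeros((S, P, len(shifts), n_bins))
    for i, d in enumerate(shifts):
        if d >= 0:
            Ws[:, :, i, d:] = W[:, :, :n_bins - d]
        else:
            Ws[:, :, i, :d] = W[:, :, -d:]
    return Ws


def si_plca(V, W, shifts, n_iter=50, eps=1e-12, seed=0):
    """EM for a shift-invariant PLCA-style model with fixed templates (hypothetical sketch).

    V: (n_bins, T) non-negative log-frequency spectrogram.
    W: (S, P, n_bins) fixed templates P(w | s, p), one per source and pitch.
    Returns P(p|t), P(f|p,t), P(s|p,t) and the reconstructed spectrogram.
    """
    rng = np.random.default_rng(seed)
    S, P, _ = W.shape
    F, T = len(shifts), V.shape[1]
    Ws = shifted_templates(W, shifts)                      # (S, P, F, n_bins)
    Pf = rng.random((F, P, T))
    Pf /= Pf.sum(axis=0, keepdims=True)                    # P(f | p, t)
    Ps = rng.random((S, P, T))
    Ps /= Ps.sum(axis=0, keepdims=True)                    # P(s | p, t)
    Pp = rng.random((P, T))
    Pp /= Pp.sum(axis=0, keepdims=True)                    # P(p | t)
    for _ in range(n_iter):
        # Joint term P(w - f | s, p) P(f | p, t) P(s | p, t) P(p | t)
        J = np.einsum("spfw,fpt,spt,pt->spfwt", Ws, Pf, Ps, Pp)
        model = J.sum(axis=(0, 1, 2))                      # P(w | t)
        # E-step: posterior P(s, p, f | w, t) weighted by the observed V(w, t)
        R = J * (V / (model + eps))
        # M-step: marginalise the weighted posterior and renormalise
        Pf = np.einsum("spfwt->fpt", R)
        Pf /= Pf.sum(axis=0, keepdims=True) + eps
        Ps = np.einsum("spfwt->spt", R)
        Ps /= Ps.sum(axis=0, keepdims=True) + eps
        Pp = np.einsum("spfwt->pt", R)
        Pp /= Pp.sum(axis=0, keepdims=True) + eps
    J = np.einsum("spfw,fpt,spt,pt->spfwt", Ws, Pf, Ps, Pp)
    recon = J.sum(axis=(0, 1, 2)) * V.sum(axis=0)          # rescale by frame energy
    return Pp, Pf, Ps, recon


# Toy example: one source, three pitches, pitch 0 then pitch 2 active.
bins = np.arange(40)
W = np.stack([np.exp(-0.5 * (bins - c) ** 2) for c in (8, 18, 28)])[None]  # (1, 3, 40)
W /= W.sum(axis=-1, keepdims=True)
V = 10 * np.stack([W[0, 0], W[0, 0], W[0, 2], W[0, 2]], axis=1)            # (40, 4)
Pp, Pf, Ps, recon = si_plca(V, W, shifts=[-1, 0, 1])
print(Pp.argmax(axis=0))  # -> [0 0 2 2]
```

Keeping the templates fixed loosely mirrors the abstract's use of pre-extracted pitch templates. The shift range is kept to a few bins in this sketch because a template shifted far enough could start to resemble a neighbouring pitch's template.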
Although transcribing a monophonic recording is considered a solved problem in the literature, creating a transcription system able to handle polyphonic music produced by multiple instruments remains an open problem. For reviews of multi-pitch detection and automatic transcription approaches, the reader is referred to de Cheveigné (2006) and Klapuri and Davy (2006). Approaches to transcription have used probabilistic methods (e.g., …) … using spectrogram-factorization techniques have been proposed (e.g., …). The aim of these techniques is to decompose the input spectrogram into matrices denoting spectral templates and pitch activations. Transcription systems or pitch-tracking methods that use spectrogram-factorization models similar to the ones used in this article are detailed in the following section. Transcription approaches that use the same data sets as this work include Poliner and Ellis (2007), where a piano-only transcription algorithm is proposed using support vector machines for note classification. For note smoothing, those authors fed the output of the classifier as input to a hidden …
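The abstract states that pitch-wise hidden Markov models are used for note tracking and smoothing, and Poliner and Ellis (2007) likewise pass classifier output to an HMM. A minimal sketch of that idea: each pitch is modeled by an independent two-state (off/on) HMM and decoded with the Viterbi algorithm. The transition and initial probabilities, the clipping of activations to [0, 1] as emission probabilities, and the names `viterbi_two_state` and `smooth_piano_roll` are hypothetical choices for illustration, not parameters reported in the article.

```python
import numpy as np


def viterbi_two_state(obs_on, p_stay=0.95, p_init_on=0.1, eps=1e-12):
    """Most likely off(0)/on(1) state sequence for a single pitch.

    obs_on: (T,) values in [0, 1] used as the emission probability of the
    "on" state; the "off" state emits with probability 1 - obs_on (hypothetical).
    """
    T = len(obs_on)
    log_A = np.log(np.array([[p_stay, 1 - p_stay],       # row = previous state,
                             [1 - p_stay, p_stay]]))     # column = next state
    log_B = np.log(np.stack([1 - obs_on, obs_on], axis=1) + eps)  # (T, 2)
    delta = np.log(np.array([1 - p_init_on, p_init_on])) + log_B[0]
    back = np.zeros((T, 2), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_A    # scores[i, j]: best path ending in i, then i -> j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_B[t]
    states = np.zeros(T, dtype=int)
    states[-1] = delta.argmax()
    for t in range(T - 1, 0, -1):          # follow the back-pointers
        states[t - 1] = back[t, states[t]]
    return states


def smooth_piano_roll(activation, **hmm_params):
    """Decode each pitch row of a (P, T) activation matrix independently."""
    obs = np.clip(activation, 0.0, 1.0)    # hypothetical mapping to emission probabilities
    return np.stack([viterbi_two_state(row, **hmm_params) for row in obs])


activation = np.array([
    [0.9, 0.9, 0.3, 0.9, 0.9, 0.1, 0.1, 0.1],   # note with a one-frame dip
    [0.1, 0.1, 0.1, 0.9, 0.1, 0.1, 0.1, 0.1],   # isolated one-frame spike
])
roll = smooth_piano_roll(activation)
print(roll)
# [[1 1 1 1 1 0 0 0]
#  [0 0 0 0 0 0 0 0]]
```

Each state change costs log(p_stay / (1 - p_stay)) in the path score relative to staying, so isolated one-frame spikes are removed and short dips inside a note are filled; raising `p_stay` smooths more aggressively.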

DOI: 10.1162/COMJ_a_00146

6 Figures and Tables
