Polyphonic piano note transcription with non-negative matrix factorization of differential spectrogram
Automatic music transcription is usually approached using a time-frequency (TF) representation such as the short-time Fourier transform (STFT) spectrogram or the constant-Q transform. In this paper, we propose a novel yet simple TF representation that capitalizes on the effectiveness of spectral flux features in highlighting note onset times. We refer to this representation as the differential spectrogram and investigate its usefulness for note-level piano transcription using two different non-negative matrix factorization (NMF) algorithms. Experiments on the MAPS ENSTDkCl dataset validate the advantages of the differential spectrogram over the STFT spectrogram for this task. Moreover, by combining a state-of-the-art convolutional NMF algorithm with the differential spectrogram, we achieve higher accuracy than the previous state of the art on this dataset. Our analysis shows that the new representation suppresses unwanted TF patterns and is particularly effective at improving recall.
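The abstract does not give a formula for the differential spectrogram. The sketch below shows one plausible reading of the pipeline. It assumes the representation is the half-wave rectified frame-to-frame difference of the magnitude STFT, which is the usual spectral flux construction. It then factorizes that matrix with plain NMF using Kullback-Leibler multiplicative updates. This is a stand-in, not the convolutional NMF algorithm the paper adapts. The function names (`magnitude_stft`, `differential_spectrogram`, `nmf_kl`), the Hann window, and the FFT and hop sizes are illustrative assumptions.

```python
import numpy as np


def magnitude_stft(x, n_fft=2048, hop=512):
    """Magnitude STFT with a Hann window; returns shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    # Each column is one windowed frame of the signal.
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)], axis=1
    )
    return np.abs(np.fft.rfft(frames, axis=0))


def differential_spectrogram(S):
    """Assumed form: half-wave rectified temporal difference of a magnitude spectrogram.

    Only increases in bin magnitude survive, so note onsets are emphasized and
    steady or decaying partials are suppressed. The first frame is differenced
    against zeros so the output keeps the input's shape.
    """
    diff = np.diff(S, axis=1, prepend=np.zeros((S.shape[0], 1)))
    return np.maximum(diff, 0.0)


def nmf_kl(V, rank, n_iter=200, eps=1e-10, seed=0):
    """Plain NMF, V ~ W @ H, with Lee-Seung multiplicative updates for the KL divergence.

    A generic stand-in for the NMF variants discussed in the text (assumption).
    W holds spectral templates (columns), H holds their activations over time (rows).
    """
    rng = np.random.default_rng(seed)
    n_freq, n_time = V.shape
    W = rng.random((n_freq, rank)) + eps
    H = rng.random((rank, n_time)) + eps
    ones = np.ones_like(V)
    for _ in range(n_iter):
        # Update activations, then templates; both updates keep entries non-negative.
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.T @ ones + eps)
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (ones @ H.T + eps)
    return W, H
```

For a mono signal `x`, `differential_spectrogram(magnitude_stft(x))` is zero wherever no bin magnitude grows from one frame to the next. An onset therefore appears as a burst of positive entries. In `nmf_kl`, the columns of `W` can then be read as onset templates and the rows of `H` as their activations, which a later peak-picking step would turn into note events.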