POLYPHONIC PITCH DETECTION WITH CONVOLUTIONAL RECURRENT NEURAL NETWORKS
@article{Thom2022POLYPHONICPD, title={POLYPHONIC PITCH DETECTION WITH CONVOLUTIONAL RECURRENT NEURAL NETWORKS}, author={Carl Thom{\'e} and Sven Ahlb{\"a}ck}, journal={ArXiv}, year={2022}, volume={abs/2202.02115} }
Recent directions in automatic speech recognition (ASR) research have shown that applying deep learning models from image recognition challenges in computer vision is beneficial. As automatic music transcription (AMT) is superficially similar to ASR, in the sense that methods often rely on transforming spectrograms to symbolic sequences of events (e.g. words or notes), deep learning should benefit AMT as well. In this work, we outline an online polyphonic pitch detection system that streams…
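The abstract describes transforming spectrograms into symbolic event sequences in an online (streaming) setting. As a rough illustration of the front end of such a system, the Python sketch below computes a magnitude spectrogram frame by frame as samples arrive. It is not the authors' implementation. The function names, the Hann window, the naive DFT and the frame and hop sizes are all assumptions chosen for clarity rather than speed.

```python
import cmath
import math
from typing import Iterable, Iterator, List


def hann(n: int) -> List[float]:
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def dft_magnitude(frame: List[float]) -> List[float]:
    """Magnitudes of bins 0..N/2 via a naive O(N^2) DFT (fine for a demo)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def stream_spectrogram(samples: Iterable[float], frame_size: int,
                       hop: int) -> Iterator[List[float]]:
    """Yield one magnitude spectrum every `hop` samples as audio streams in.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if not 0 < hop <= frame_size:
        raise ValueError("hop must be in (0, frame_size]")
    window = hann(frame_size)
    buffer: List[float] = []
    for sample in samples:
        buffer.append(sample)
        if len(buffer) == frame_size:
            yield dft_magnitude([w * x for w, x in zip(window, buffer)])
            del buffer[:hop]  # keep the overlapping tail for the next frame
```

For a 1 kHz sine sampled at 8 kHz, `stream_spectrogram(tone, 64, 32)` yields spectra that all peak at bin 8 (1000 × 64 / 8000). A real-time system would replace the O(N^2) loop with an FFT.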
6 Citations
Onsets and Frames: Dual-Objective Piano Transcription
- Computer Science
- ISMIR
- 2018
This work uses a deep convolutional and recurrent neural network to predict pitch onset events and then uses those predictions to condition framewise pitch predictions, which results in over a 100% relative improvement in note F1 score on the MAPS dataset.
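One way to picture how onset predictions constrain frame predictions is at decoding time: a pitch may only begin sounding where the onset detector fires. The sketch below applies that gating rule to framewise probability matrices. It is a hypothetical decoder, not the Onsets and Frames model itself (which, per the summary, also conditions the frame predictions on the onset predictions inside the network). The 0.5 threshold and the re-articulation rule are assumptions.

```python
from typing import List, Sequence, Tuple

# (pitch index, onset frame, offset frame exclusive)
Note = Tuple[int, int, int]


def decode_notes(onset_probs: Sequence[Sequence[float]],
                 frame_probs: Sequence[Sequence[float]],
                 threshold: float = 0.5) -> List[Note]:
    """Convert framewise onset and frame probabilities into note events.

    Hypothetical rule: a note may only start on a rising edge of the onset
    detector; it lasts while either detector stays at or above the threshold,
    and a fresh onset on an already sounding pitch re-articulates it.
    """
    n_frames = len(frame_probs)
    n_pitches = len(frame_probs[0]) if n_frames else 0
    notes: List[Note] = []
    for p in range(n_pitches):
        start = None
        prev_onset = False
        for t in range(n_frames):
            onset = onset_probs[t][p] >= threshold
            active = frame_probs[t][p] >= threshold
            new_onset = onset and not prev_onset
            # Close the current note if both detectors dropped out or a new onset arrived.
            if start is not None and (new_onset or not (active or onset)):
                notes.append((p, start, t))
                start = None
            # Frame activations without an onset never open a note (onset gating).
            if start is None and new_onset:
                start = t
            prev_onset = onset
        if start is not None:
            notes.append((p, start, n_frames))
    return notes
```

For example, a pitch whose frame probabilities stay high but whose onset probabilities never reach the threshold produces no notes at all, which is the kind of spurious activation the onset conditioning is meant to suppress.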
Deep Polyphonic ADSR Piano Note Transcription
- Computer Science
- ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2019
A late-fusion approach to piano transcription is combined with a strong temporal prior in the form of a handcrafted Hidden Markov Model (HMM). The resulting system outperforms other approaches by a large margin when predicting complete note regions, from onsets to offsets.
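The HMM temporal prior can be illustrated with Viterbi decoding over a small per-pitch model. The sketch below is a deliberately simplified stand-in: it uses two states (0 = off, 1 = on) and made-up probabilities, whereas the summarized work uses a handcrafted HMM whose design, judging by the title, reflects ADSR note phases. Its actual topology and parameters are not given here.

```python
import math
from typing import List, Sequence


def _log(x: float) -> float:
    """log(x) with log(0) = -inf, so forbidden transitions are handled."""
    return math.log(x) if x > 0 else float("-inf")


def viterbi(obs_lik: Sequence[Sequence[float]],
            transition: Sequence[Sequence[float]],
            initial: Sequence[float]) -> List[int]:
    """Most likely hidden-state path (log-domain Viterbi).

    obs_lik[t][s]    : likelihood of frame t's observation under state s
    transition[r][s] : probability of moving from state r to state s
    initial[s]       : probability of starting in state s
    """
    n_states = len(initial)
    score = [_log(initial[s]) + _log(obs_lik[0][s]) for s in range(n_states)]
    backptr: List[List[int]] = []
    for t in range(1, len(obs_lik)):
        new_score, ptrs = [], []
        for s in range(n_states):
            # Best predecessor state for landing in state s at frame t.
            best = max(range(n_states), key=lambda r: score[r] + _log(transition[r][s]))
            new_score.append(score[best] + _log(transition[best][s]) + _log(obs_lik[t][s]))
            ptrs.append(best)
        score = new_score
        backptr.append(ptrs)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptrs in reversed(backptr):
        state = ptrs[state]
        path.append(state)
    return path[::-1]
```

With framewise "on" probabilities 0.2, 0.9, 0.8, 0.35, 0.85, 0.9, 0.1 (observation likelihoods [1 - p, p]), a transition matrix [[0.7, 0.3], [0.3, 0.7]] and a uniform initial distribution, thresholding at 0.5 splits the note at frame 3, while `viterbi` returns [0, 1, 1, 1, 1, 1, 0] and keeps it sounding.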
End-to-End Music Transcription Using Fine-Tuned Variable-Q Filterbanks
- Computer Science
- 2019
This work replaces the time-frequency calculation step of a baseline transcription architecture with a learned equivalent, initialized with the frequency response of a Variable-Q Transform, and the resulting filterbanks are visualized and evaluated against the standard transform.
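To make the initialization idea concrete, the sketch below builds complex filter kernels with log-spaced center frequencies, the kind of filterbank a learned front end could start from before fine-tuning. The bandwidth rule (alpha * f_k + gamma, with alpha taken from the constant-Q relation Q = 1 / (2^(1/B) - 1)) and the Hann-windowed sinusoid kernels are assumptions of this sketch; the summary does not specify how the authors derived their initial filters.

```python
import cmath
import math
from typing import List


def vqt_kernels(sr: float, f_min: float, n_bins: int, bins_per_octave: int,
                gamma: float = 0.0) -> List[List[complex]]:
    """Hann-windowed complex sinusoids with log-spaced center frequencies.

    Hypothetical helper. Center frequencies: f_k = f_min * 2**(k / B).
    Assumed bandwidths: alpha * f_k + gamma, alpha = 2**(1/B) - 1, so
    gamma = 0 gives constant-Q filters and gamma > 0 widens the low bins.
    """
    alpha = 2.0 ** (1.0 / bins_per_octave) - 1.0
    kernels = []
    for k in range(n_bins):
        f_k = f_min * 2.0 ** (k / bins_per_octave)
        if f_k >= sr / 2:
            raise ValueError("center frequency above Nyquist")
        bandwidth = alpha * f_k + gamma
        length = int(math.ceil(sr / bandwidth))  # window length ~ 1 / bandwidth
        kernel = []
        for n in range(length):
            w = 0.5 - 0.5 * math.cos(2 * math.pi * n / length)
            # Normalize by length so every kernel has comparable gain.
            kernel.append(w / length * cmath.exp(2j * math.pi * f_k * n / sr))
        kernels.append(kernel)
    return kernels
```

In a trainable model, the real and imaginary parts of each kernel could seed the weights of a 1-D convolution layer that is then updated by gradient descent. With gamma > 0 the low-frequency kernels become shorter, trading frequency resolution for time resolution.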
Pitch-Informed Instrument Assignment Using a Deep Convolutional Network with Multiple Kernel Shapes
- Computer Science
- ISMIR
- 2021
A deep convolutional neural network is proposed for note-level instrument assignment, given a polyphonic multi-instrumental music signal along with its ground-truth or predicted notes. The work also examines the effect of using multiple kernel shapes and compares different input representations for the audio and the note-related information.
The melodic beat: exploring asymmetry in polska performance
- Art
- Journal of Mathematics and Music
- 2021
Some triple-beat forms in Scandinavian Folk Music are characterized by non-isochronous beat durations: asymmetric beats. Theorists of folk music have suggested that the variability of rhythmic…
Improving Polyphonic Piano Transcription using Deep Residual Learning
- Computer Science
- 2019
In this thesis, a new deep learning method is adapted for frame-wise polyphonic piano note transcription. It is based on the idea of Residual Learning, which is then extended with Bidirectional Long…
References
Showing 1-10 of 32 references
Polyphonic piano note transcription with recurrent neural networks
- Computer Science
- 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2012
A new approach to polyphonic piano note onset transcription is presented, based on a recurrent neural network that simultaneously detects the onsets and the pitches of the notes from spectral features; it generalizes much better than existing systems.
Deep Salience Representations for F0 Estimation in Polyphonic Music
- Computer Science
- ISMIR
- 2017
A fully convolutional neural network for learning salience representations for estimating fundamental frequencies, trained using a large, semi-automatically generated f0 dataset, is described and shown to achieve state-of-the-art performance on several multi-f0 and melody datasets.
Very deep convolutional networks for end-to-end speech recognition
- Computer Science
- 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2017
This work successively trains very deep convolutional networks to add more expressive power and better generalization for end-to-end ASR models, applying network-in-network principles, batch normalization, residual connections and convolutional LSTMs to build very deep recurrent and convolutional structures.
An End-to-End Neural Network for Polyphonic Piano Music Transcription
- Computer Science
- IEEE/ACM Transactions on Audio, Speech, and Language Processing
- 2016
An efficient variant of beam search is presented that improves performance and reduces run-times by an order of magnitude, making the model suitable for real-time applications.
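For reference, the sketch below shows ordinary beam search over frame-level labels, scoring each hypothesis by acoustic log-probabilities plus a transition log-prior that stands in for a symbolic (language) model. It does not reproduce the efficient variant described in the summary. Single integer labels per frame (instead of multi-pitch vectors), the scoring rule and the function names are simplifying assumptions.

```python
import heapq
from typing import Callable, List, Sequence, Tuple


def beam_search(frame_scores: Sequence[Sequence[float]],
                transition_logp: Callable[[int, int], float],
                beam_width: int) -> Tuple[List[int], float]:
    """Best label sequence under acoustic log-probs plus a transition log-prior.

    Hypothetical helper. frame_scores[t][y] is log p(y | audio at frame t);
    transition_logp(prev, y) is an assumed log prior for label prev -> y.
    """
    beams: List[Tuple[float, List[int]]] = [(0.0, [])]
    for scores in frame_scores:
        candidates = []
        for total, seq in beams:
            for y, s in enumerate(scores):
                prior = transition_logp(seq[-1], y) if seq else 0.0
                candidates.append((total + s + prior, seq + [y]))
        # Keep only the beam_width highest-scoring hypotheses (sorted best first).
        beams = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
    best_score, best_seq = beams[0]
    return best_seq, best_score
```

With acoustic probabilities (0.6, 0.4), (0.45, 0.55), (0.7, 0.3) and a prior of 0.9 for keeping the same label versus 0.1 for switching, per-frame argmax picks [0, 1, 0], whereas `beam_search(..., beam_width=2)` returns [0, 0, 0] with probability 0.15309. The beam width bounds the work per frame, at the cost of possibly discarding the true best path.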
Convolutional recurrent neural networks for music classification
- Computer Science
- 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2017
It is found that CRNNs show strong performance with respect to the number of parameters and training time, indicating the effectiveness of their hybrid structure in music feature extraction and feature summarisation.
Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
- Computer Science
- ICML
- 2016
It is shown that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech, two vastly different languages, and is competitive with the transcription of human workers when benchmarked on standard datasets.
A Shift-Invariant Latent Variable Model for Automatic Music Transcription
- Computer Science
- Computer Music Journal
- 2012
Results demonstrate that the proposed probabilistic model for multiple-instrument automatic music transcription outperforms leading approaches from the transcription literature, using several error metrics.
LSTM: A Search Space Odyssey
- Computer Science
- IEEE Transactions on Neural Networks and Learning Systems
- 2017
This paper presents the first large-scale analysis of eight LSTM variants on three representative tasks: speech recognition, handwriting recognition, and polyphonic music modeling. It observes that the studied hyperparameters are virtually independent and derives guidelines for their efficient adjustment.
Sequence to Sequence Learning with Neural Networks
- Computer Science
- NIPS
- 2014
This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure, and finds that reversing the order of the words in all source sentences improved the LSTM's performance markedly, because doing so introduced many short-term dependencies between the source and the target sentence, which made the optimization problem easier.
On the Potential of Simple Framewise Approaches to Piano Transcription
- Computer Science
- ISMIR
- 2016
It is shown that simple bottom-up frame-wise processing can yield a piano transcriber that outperforms the current published state of the art on the publicly available MAPS dataset, without any complex post-processing steps.
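Framewise systems like this are commonly scored by counting true positives, false positives and false negatives over every (frame, pitch) cell of a binary piano-roll. The sketch below computes frame-level precision, recall and F1 this way. The piano-roll layout (a list of frames, each a list of 0/1 pitch flags) and the function name are assumptions, and this is not the exact evaluation script used on MAPS.

```python
from typing import Sequence, Tuple


def frame_prf(reference: Sequence[Sequence[int]],
              estimate: Sequence[Sequence[int]]) -> Tuple[float, float, float]:
    """Frame-level precision, recall and F1 between two binary piano-rolls.

    Hypothetical helper; each piano-roll is a list of frames of 0/1 pitch flags.
    """
    if len(reference) != len(estimate):
        raise ValueError("piano-rolls must have the same number of frames")
    tp = fp = fn = 0
    for ref_frame, est_frame in zip(reference, estimate):
        for r, e in zip(ref_frame, est_frame):
            if r and e:
                tp += 1  # pitch active in both
            elif e:
                fp += 1  # spurious detection
            elif r:
                fn += 1  # missed pitch
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For instance, three correct cells, one spurious cell and one missed cell give precision, recall and F1 of 0.75 each.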