Modeling Beats and Downbeats with a Time-Frequency Transformer
@inproceedings{Hung2022ModelingBA,
  title     = {Modeling Beats and Downbeats with a Time-Frequency Transformer},
  author    = {Yun-Ning Hung and Ju-Chiang Wang and Xuchen Song and Weiyi Lu and Minz Won},
  booktitle = {ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year      = {2022},
  pages     = {401-405}
}
Transformer is a successful deep neural network (DNN) architecture that has shown its versatility not only in natural language processing but also in music information retrieval (MIR). In this paper, we present a novel Transformer-based approach to tackle beat and downbeat tracking. This approach employs SpecTNT (Spectral-Temporal Transformer in Transformer), a variant of Transformer that models both the spectral and temporal dimensions of a time-frequency input of music audio. A SpecTNT model…
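The abstract's core idea, self-attention applied along both the frequency and the time axis of a spectrogram, can be sketched as follows. This is a minimal NumPy illustration with identity Q/K/V projections and hypothetical dimensions, not the SpecTNT architecture itself, which uses learned projections and additional machinery to exchange information between its spectral and temporal sub-Transformers.

```python
import numpy as np

def self_attention(x):
    # Scaled dot-product self-attention over the second-to-last axis,
    # with identity Q/K/V projections (a deliberate simplification;
    # real models use learned projection matrices).
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

def spectral_temporal_block(x):
    # x: (time, freq, channels) time-frequency representation.
    x = self_attention(x)            # attend across frequency bins per frame
    x = np.swapaxes(x, 0, 1)         # (freq, time, channels)
    x = self_attention(x)            # attend across time frames per bin
    return np.swapaxes(x, 0, 1)      # back to (time, freq, channels)

T, F, C = 100, 128, 8                # hypothetical dimensions
spec = np.random.randn(T, F, C)
out = spectral_temporal_block(spec)
assert out.shape == (T, F, C)
```

Attending along frequency first lets each frame summarize its spectral content before the temporal pass models how those summaries evolve, which is what makes the time-frequency factorization cheaper than full attention over all (time, freq) positions.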
6 Citations
Beat Transformer: Demixed Beat and Downbeat Tracking with Dilated Self-Attention
- Physics, Computer Science · ArXiv
- 2022
This work proposes Beat Transformer, a novel Transformer encoder architecture for joint beat and downbeat tracking that adopts a novel dilated self-attention mechanism, which achieves powerful hierarchical modelling with only linear complexity.
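As a rough illustration of how dilated self-attention attains linear complexity: each frame attends only to a fixed number of frames at dilated offsets, so the total cost grows linearly with sequence length rather than quadratically. The sketch below is a generic simplification with identity projections, not the specific demixed, multi-layer mechanism of Beat Transformer.

```python
import numpy as np

def dilated_self_attention(x, dilation=2, window=4):
    # Each frame attends only to frames at offsets k*dilation for
    # k in [-window, window], i.e. at most 2*window+1 keys per query,
    # so the cost is O(n * window) instead of O(n^2).
    n, d = x.shape
    out = np.zeros_like(x)
    for i in range(n):
        idx = [i + k * dilation for k in range(-window, window + 1)
               if 0 <= i + k * dilation < n]
        keys = x[idx]                          # (m, d) local dilated context
        scores = keys @ x[i] / np.sqrt(d)      # (m,) similarity to query
        scores -= scores.max()                 # numerical stability
        w = np.exp(scores)
        w /= w.sum()
        out[i] = w @ keys                      # weighted sum of keys
    return out

x = np.random.randn(50, 16)                    # hypothetical (frames, dims)
y = dilated_self_attention(x)
assert y.shape == x.shape
```

Stacking such layers with increasing dilation rates lets the receptive field grow exponentially with depth, which is the hierarchical modelling the summary refers to.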
To catch a chorus, verse, intro, or anything else: Analyzing a song with structural functions
- Computer Science · ICASSP
- 2022
A multi-task deep learning framework to model these structural semantic labels directly from audio by estimating "verseness," "chorusness," and so forth, as a function of time is introduced.
An Analysis Method for Metric-Level Switching in Beat Tracking
- Computer Science · IEEE Signal Processing Letters
- 2022
This letter proposes a new performance analysis method, called annotation coverage ratio (ACR), that accounts for a variety of possible metric-level switching behaviors of beat trackers, demonstrates the usefulness of ACR alongside existing metrics, and discusses the new insights that can be gained.
SSDPT: Self-Supervised Dual-Path Transformer for Anomalous Sound Detection in Machine Condition Monitoring
- Physics, Computer Science · Digital Signal Processing
- 2023
Jointist: Joint Learning for Multi-instrument Transcription and Its Applications
- Computer Science · ArXiv
- 2022
Jointist is an instrument-aware multi-instrument framework that is capable of transcribing, recognizing, and separating multiple musical instruments from an audio clip; the symbolic representation provided by the transcription model proves helpful, alongside spectrograms, for downbeat detection, chord recognition, and key estimation.
Jointist: Simultaneous Improvement of Multi-instrument Transcription and Music Source Separation via Joint Training
- Computer Science
- 2023
Jointist is introduced: an instrument-aware multi-instrument framework that is capable of transcribing, recognizing, and separating multiple musical instruments from an audio clip. It achieves state-of-the-art performance on popular music, outperforming existing multi-instrument transcription models such as MT3.
References
SHOWING 1-10 OF 38 REFERENCES
SpecTNT: a Time-Frequency Transformer for Music Audio
- Computer Science · ISMIR
- 2021
A novel variant of the Transformer-in-Transformer (TNT) architecture to model both spectral and temporal sequences of an input time-frequency representation, which demonstrates state-of-the-art performance in music tagging and vocal melody extraction, and shows competitive performance for chord recognition.
Analysis of Common Design Choices in Deep Learning Systems for Downbeat Tracking
- Computer Science · ISMIR
- 2018
A systematic investigation of the impact of largely adopted variants of convolutional-recurrent networks on downbeat tracking, and finds that temporal granularity has a significant impact on performance.
Robust Downbeat Tracking Using an Ensemble of Convolutional Networks
- Computer Science · IEEE/ACM Transactions on Audio, Speech, and Language Processing
- 2017
A novel state-of-the-art system for automatic downbeat tracking from music signals which takes advantage of the assumed metrical continuity of a song, with a significant increase in performance compared to the second-best system.
Joint Estimation of Chords and Downbeats From an Audio Signal
- Computer Science · IEEE Transactions on Audio, Speech, and Language Processing
- 2011
The results show that the downbeat positions of a music piece can be estimated in terms of its harmonic structure and that conversely the chord progression estimation benefits from considering the interaction between the metric and the harmonic structures.
A Bi-Directional Transformer for Musical Chord Recognition
- Computer Science · ISMIR
- 2019
It turns out that the proposed bi-directional Transformer for chord recognition was able to segment chords by utilizing the adaptive receptive field of the attention mechanism, and the model was able to effectively capture long-term dependencies, making use of essential information regardless of distance.
A Music Structure Informed Downbeat Tracking System Using Skip-chain Conditional Random Fields and Deep Learning
- Computer Science · ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2019
This work introduces a skip-chain conditional random field language model for downbeat tracking designed to include section information in a unified and flexible framework, and shows that incorporating structure information in the language model leads to more consistent and more robust downbeat estimations.
Data-Driven Harmonic Filters for Audio Representation Learning
- Computer Science · ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2020
Experimental results show that a simple convolutional neural network back-end with the proposed front-end outperforms state-of-the-art baseline methods in automatic music tagging, keyword spotting, and sound event tagging tasks.
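The idea of a harmonic front-end can be illustrated with fixed (non-learned) harmonic stacking: for each frequency bin, gather the spectrogram bins nearest to integer multiples of that bin's frequency, so a tone's harmonic series lines up along a new axis. The paper learns its harmonic filters from data; this sketch with hypothetical dimensions only shows the harmonic-alignment idea.

```python
import numpy as np

def harmonic_stack(spec, freqs, harmonics=(1, 2, 3, 4)):
    # spec:  (n_freq, n_time) magnitude spectrogram.
    # freqs: (n_freq,) center frequency of each bin in Hz.
    # For each bin f and harmonic h, copy the bin nearest to h * freqs[f].
    # Targets above the highest bin clamp to that bin (fine for a sketch).
    n_freq, n_time = spec.shape
    out = np.zeros((len(harmonics), n_freq, n_time))
    for h_idx, h in enumerate(harmonics):
        for f in range(n_freq):
            nearest = np.argmin(np.abs(freqs - h * freqs[f]))
            out[h_idx, f] = spec[nearest]
    return out

freqs = np.linspace(0.0, 8000.0, 64)       # hypothetical linear bin grid
spec = np.abs(np.random.randn(64, 10))     # hypothetical spectrogram
stacked = harmonic_stack(spec, freqs)
assert stacked.shape == (4, 64, 10)
```

A convolutional back-end on the stacked representation can then detect harmonic patterns with small kernels, since the relevant partials sit at the same (freq, time) location across the harmonic axis.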
Joint Beat and Downbeat Tracking with Recurrent Neural Networks
- Computer Science · ISMIR
- 2016
A recurrent neural network operating directly on magnitude spectrograms is used to model the metrical structure of the audio signals at multiple levels and provides an output feature that clearly distinguishes between beats and downbeats.
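Such per-frame beat and downbeat activation functions are then decoded into discrete event times. A minimal stand-in for the dynamic Bayesian network decoding typically used in this line of work is simple thresholded peak picking:

```python
import numpy as np

def pick_peaks(activation, threshold=0.5):
    # Report frame indices that are local maxima above a threshold --
    # a simplified illustration, not the DBN decoding used in practice,
    # which also enforces tempo and meter consistency.
    peaks = []
    for i in range(1, len(activation) - 1):
        if (activation[i] > threshold
                and activation[i] >= activation[i - 1]
                and activation[i] > activation[i + 1]):
            peaks.append(i)
    return peaks

# Hypothetical beat activation curve over 9 frames.
act = np.array([0.1, 0.2, 0.9, 0.3, 0.1, 0.8, 0.95, 0.2, 0.1])
print(pick_peaks(act))  # [2, 6]
```

Converting peak frame indices to times is then just a division by the activation frame rate.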
Harmony Transformer: Incorporating Chord Segmentation into Harmony Recognition
- Computer Science · ISMIR
- 2019
The Harmony Transformer is proposed, a multi-task music harmony analysis model aiming to improve chord recognition through incorporating chord segmentation into the recognition process using end-to-end sequence learning.
Deconstruct, Analyse, Reconstruct: How to improve Tempo, Beat, and Downbeat Estimation
- Computer Science · ISMIR
- 2020
A novel multi-task approach for the simultaneous estimation of tempo, beat, and downbeat is devised, which seeks to embed more explicit musical knowledge into the design decisions in building the network.