• Corpus ID: 228167512

Instrument Role Classification: Auto-tagging for Loop Based Music

Joann Ching, António Ramires, Yi-Hsuan Yang
This work introduces a new type of auto-tagging task, called “instrument role classification.” We discuss why the task is necessary and introduce a definition of loop-based music. We also present a new dataset for this task, the Freesound Loop Dataset, and benchmark the performance of both neural-network-based and non-neural-network-based multi-label classification models for six instrument roles.
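The abstract frames the task as multi-label classification: one loop can play several roles at once. A common way to set this up, assumed here and not confirmed as the paper's exact protocol, is binary relevance: each role gets its own score and its own yes/no decision. The sketch below encodes role tags as multi-hot vectors, thresholds per-role scores, and computes two standard multi-label metrics (Hamming loss and per-role F1). The role names `role_1` to `role_6` are hypothetical placeholders, since the six roles are not named here, and the 0.5 threshold is an assumption.

```python
from typing import Dict, List, Sequence, Set

# Hypothetical placeholder names: the abstract mentions six instrument
# roles but does not list them.
ROLES = ["role_1", "role_2", "role_3", "role_4", "role_5", "role_6"]


def to_multi_hot(tags: Set[str], roles: Sequence[str] = ROLES) -> List[int]:
    """Encode a loop's set of role tags as a 0/1 vector, one slot per role."""
    unknown = tags - set(roles)
    if unknown:
        raise ValueError(f"unknown roles: {sorted(unknown)}")
    return [1 if r in tags else 0 for r in roles]


def predict_roles(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Binary relevance: every role is decided independently from its own
    score, so a loop may receive zero, one, or several roles."""
    return [1 if s >= threshold else 0 for s in scores]


def hamming_loss(y_true: List[List[int]], y_pred: List[List[int]]) -> float:
    """Fraction of individual (loop, role) decisions that are wrong."""
    total = sum(len(t) for t in y_true)
    wrong = sum(a != b for t, p in zip(y_true, y_pred) for a, b in zip(t, p))
    return wrong / total


def per_role_f1(y_true: List[List[int]], y_pred: List[List[int]],
                roles: Sequence[str] = ROLES) -> Dict[str, float]:
    """F1 score computed separately for each role (column)."""
    scores: Dict[str, float] = {}
    for j, role in enumerate(roles):
        tp = sum(t[j] == 1 and p[j] == 1 for t, p in zip(y_true, y_pred))
        fp = sum(t[j] == 0 and p[j] == 1 for t, p in zip(y_true, y_pred))
        fn = sum(t[j] == 1 and p[j] == 0 for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[role] = 2 * tp / denom if denom else 0.0
    return scores
```

Because each role is thresholded on its own, the per-role thresholds could also be tuned separately on validation data, which matters when some roles are much rarer than others.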
Citations (1)


A Benchmarking Initiative for Audio-Domain Music Generation Using the Freesound Loop Dataset
This paper proposes a new benchmark task for generating musical passages in the audio domain using the drum loops from the Freesound Loop Dataset, which are publicly redistributable, and benchmarks the performance of three recent deep generative adversarial network models that the authors customize to generate loops: StyleGAN, StyleGAN2, and UNAGAN.


References
Toward Interpretable Music Tagging with Self-Attention
Compared to conventional approaches using fully convolutional or recurrent neural networks, the proposed self-attention based deep sequence model for music tagging is more interpretable while reporting competitive results.
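The interpretability claim rests on the attention weights: each output frame is a weighted average of all input frames, and those weights can be inspected directly. A minimal sketch, assuming single-head scaled dot-product attention with queries, keys, and values all equal to the input frames (a trained model would normally learn separate query/key/value projections). The `frame_importance` helper is a hypothetical, deliberately crude way to summarize the weights and is not taken from the paper.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]  # rows are time frames, columns are feature dims


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(frames: Matrix) -> Tuple[Matrix, Matrix]:
    """Single-head scaled dot-product self-attention with Q = K = V = frames.

    Returns (outputs, weights); weights[i][j] is how much frame i attends
    to frame j, and every row of weights sums to 1.
    """
    d = len(frames[0])
    weights = [
        softmax([sum(q * k for q, k in zip(fi, fj)) / math.sqrt(d) for fj in frames])
        for fi in frames
    ]
    outputs = [
        [sum(w * fj[c] for w, fj in zip(row, frames)) for c in range(d)]
        for row in weights
    ]
    return outputs, weights


def frame_importance(weights: Matrix) -> List[float]:
    """Average attention each frame receives across all query frames."""
    n = len(weights)
    return [sum(row[j] for row in weights) / n for j in range(n)]
```

Frames with similar content attend to each other more strongly, so peaks in the weight matrix point to the parts of the input that drove a prediction.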
Multi-Label Classification of Music into Emotions
In this paper, the automated detection of emotion in music is modeled as a multilabel classification task, where a piece of music may belong to more than one class. Four algorithms are evaluated and
Automatic Tagging Using Deep Convolutional Neural Networks
The experiments show that mel-spectrogram is an effective time-frequency representation for automatic tagging and that more complex models benefit from more training data.
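A mel-spectrogram is obtained by pooling a short-time power spectrum through triangular filters spaced evenly on the mel scale, usually followed by log compression. The sketch below uses the HTK-style mel formula, m = 2595 log10(1 + f/700); other variants exist (e.g. the Slaney formula), and the FFT size, sample rate, and number of bands used in the cited experiments are not stated here, so the values in the usage are arbitrary assumptions.

```python
import math
from typing import List, Optional

Matrix = List[List[float]]


def hz_to_mel(f: float) -> float:
    """HTK-style mel scale."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels: int, n_fft: int, sr: int,
                   fmin: float = 0.0, fmax: Optional[float] = None) -> Matrix:
    """Triangular filters (n_mels x (n_fft // 2 + 1)) with mel-spaced centers."""
    fmax = sr / 2 if fmax is None else fmax
    n_bins = n_fft // 2 + 1
    bin_freqs = [k * sr / n_fft for k in range(n_bins)]
    lo, hi = hz_to_mel(fmin), hz_to_mel(fmax)
    # n_mels + 2 equally spaced mel points: the outer two are band edges,
    # the inner n_mels are filter centers.
    mel_pts = [lo + i * (hi - lo) / (n_mels + 1) for i in range(n_mels + 2)]
    hz_pts = [mel_to_hz(m) for m in mel_pts]
    bank: Matrix = []
    for i in range(n_mels):
        left, center, right = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        row = []
        for f in bin_freqs:
            if left <= f <= center:
                row.append((f - left) / (center - left))    # rising edge
            elif center < f <= right:
                row.append((right - f) / (right - center))  # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def log_mel(bank: Matrix, power_spectrum: List[float], eps: float = 1e-10) -> List[float]:
    """One log-mel frame: filter-weighted sums of power, then log."""
    return [math.log(sum(w * p for w, p in zip(row, power_spectrum)) + eps)
            for row in bank]
```

For example, `mel_filterbank(n_mels=10, n_fft=512, sr=16000)` returns 10 filters over 257 frequency bins; applying `log_mel` to every STFT frame yields the mel-spectrogram.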
Sample-Level CNN Architectures for Music Auto-Tagging Using Raw Waveforms
This paper improves the 1-D CNN architecture for music auto-tagging by adopting building blocks from state-of-the-art image classification models (ResNets and SENets) and adding multi-level feature aggregation, and it compares different combinations of these modules when building CNN architectures.
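The SENet building block mentioned above reweights feature channels using global context. A minimal sketch of a squeeze-and-excitation step on a 1-D feature map (channels x time), assuming biases are omitted and the two weight matrices are given rather than learned; the residual (ResNet-style) shortcut and the exact block layout used in the paper are not reproduced here.

```python
import math
from typing import List

Matrix = List[List[float]]  # feature map indexed as [channel][time]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def squeeze_excite(x: Matrix, w1: Matrix, w2: Matrix) -> Matrix:
    """Squeeze-and-excitation on a C x T feature map.

    w1 has shape (C / r) x C (bottleneck), w2 has shape C x (C / r);
    r is the reduction ratio. Biases are omitted for brevity.
    """
    # Squeeze: global average pooling over time, one value per channel.
    z = [sum(ch) / len(ch) for ch in x]
    # Excitation: bottleneck layer with ReLU, then a layer with sigmoid,
    # giving one gate in (0, 1) per channel.
    h = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    gates = [sigmoid(sum(w * v for w, v in zip(row, h))) for row in w2]
    # Scale: multiply every time step of each channel by its gate.
    return [[g * v for v in ch] for g, ch in zip(gates, x)]
```

The output keeps the input's shape, which is why the block can be inserted after any convolutional layer; only the relative emphasis between channels changes.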
Deep Salience Representations for F0 Estimation in Polyphonic Music
A fully convolutional neural network that learns salience representations for estimating fundamental frequencies is described; trained on a large, semi-automatically generated f0 dataset, it is shown to achieve state-of-the-art performance on several multi-f0 and melody datasets.
Timbre analysis of music audio signals with convolutional neural networks
One of the main goals of this work is to design efficient CNN architectures, which reduces the risk of overfitting because the number of CNN parameters is minimized.
Data-Driven Harmonic Filters for Audio Representation Learning
Experimental results show that a simple convolutional neural network back-end with the proposed front-end outperforms state-of-the-art baseline methods in automatic music tagging, keyword spotting, and sound event tagging tasks.
End-to-end Learning for Music Audio Tagging at Scale
This work studies how waveform-based models outperform spectrogram-based ones in large-scale data scenarios, using training datasets of variable size; the results suggest that music domain assumptions are relevant when not enough training data are available.
Neural Loop Combiner: Neural Network Models for Assessing the Compatibility of Loops
This work extracts loops from existing music to obtain positive examples of compatible loops, proposes and compares various strategies for choosing negative examples, and investigates two types of model architecture for estimating loop compatibility: one based on a Siamese network and one on a pure convolutional neural network.
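In a Siamese setup, both loops pass through the same encoder (shared weights) and compatibility is judged from how close the two embeddings are. A minimal sketch, assuming a Euclidean distance and a margin-based contrastive loss; the summary does not state which loss the paper uses, and `toy_encoder` is a hypothetical stand-in for a CNN branch.

```python
import math
from typing import Callable, List, Sequence

Embedding = List[float]


def toy_encoder(features: Sequence[float]) -> Embedding:
    """Hypothetical stand-in for a CNN branch: summarizes a loop's
    per-frame features by their mean and maximum."""
    return [sum(features) / len(features), max(features)]


def euclidean(a: Embedding, b: Embedding) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pair_distance(encoder: Callable[[Sequence[float]], Embedding],
                  loop_a: Sequence[float], loop_b: Sequence[float]) -> float:
    """Siamese scoring: the SAME encoder embeds both loops, so the
    distance is symmetric in its two inputs."""
    return euclidean(encoder(loop_a), encoder(loop_b))


def contrastive_loss(distance: float, compatible: bool, margin: float = 1.0) -> float:
    """Pulls compatible (positive) pairs together and pushes incompatible
    (negative) pairs apart until they are at least `margin` away."""
    if compatible:
        return distance ** 2
    return max(0.0, margin - distance) ** 2
```

Negative pairs that are already farther apart than the margin contribute no loss, which is one reason the strategy for choosing negative examples affects what the model learns.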
Designing efficient architectures for modeling temporal features with convolutional neural networks
  • Jordi Pons, X. Serra
  • 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
A novel design strategy is proposed that might promote more expressive and intuitive deep learning architectures by efficiently exploiting the representational capacity of the first layer, using different filter shapes adapted to fit musical concepts.
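The idea of filter shapes adapted to musical concepts can be illustrated on a spectrogram stored as [frequency][time]: a filter that spans many time frames but one frequency bin responds to temporal patterns, while one that spans many frequency bins but one frame responds to spectral (timbral) shape. A minimal sketch, assuming plain valid 2-D cross-correlation and averaging ("box") kernels chosen only for illustration; the specific shapes and sizes used by Pons and Serra are not given here.

```python
from typing import List

Matrix = List[List[float]]  # spectrogram indexed as [frequency][time]


def conv2d_valid(spec: Matrix, kernel: Matrix) -> Matrix:
    """Valid 2-D cross-correlation (no padding, stride 1)."""
    kf, kt = len(kernel), len(kernel[0])
    n_freq, n_time = len(spec), len(spec[0])
    return [
        [
            sum(kernel[a][b] * spec[i + a][j + b]
                for a in range(kf) for b in range(kt))
            for j in range(n_time - kt + 1)
        ]
        for i in range(n_freq - kf + 1)
    ]


def box_kernel(n_freq: int, n_time: int) -> Matrix:
    """Averaging kernel of the given shape (illustrative weights only)."""
    return [[1.0 / (n_freq * n_time)] * n_time for _ in range(n_freq)]


def n_params(kernel: Matrix) -> int:
    """Number of weights in one filter (bias ignored)."""
    return len(kernel) * len(kernel[0])
```

A 1 x 3 temporal filter or a 3 x 1 spectral filter has 3 weights, compared with 9 for a 3 x 3 square filter, which connects this design to the parameter-efficiency goal stated for the timbre analysis work above.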