• Corpus ID: 215825159

Transfer Learning for Music Classification and Regression Tasks

@inproceedings{Choi2017TransferLF,
  title={Transfer Learning for Music Classification and Regression Tasks},
  author={Keunwoo Choi and Gy{\"o}rgy Fazekas and Mark B. Sandler and Kyunghyun Cho},
  booktitle={International Society for Music Information Retrieval Conference},
  year={2017}
}
In this paper, we present a transfer learning approach for music classification and regression tasks. We propose a pre-trained convnet feature: a feature vector formed by concatenating the activations of feature maps from multiple layers of a trained convolutional network. We show how this convnet feature can serve as a general-purpose music representation. In the experiments, a convnet is trained for music tagging and then transferred to other music-related classification and regression tasks… 
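The concatenated multi-layer feature described above can be sketched as follows. This is a minimal illustration, not the paper's exact configuration: the pooling operation (global average), layer count, and array shapes are assumptions made for the example.

```python
import numpy as np

def convnet_feature(layer_activations):
    """Concatenate globally pooled activations from multiple conv layers.

    layer_activations: list of arrays shaped (channels, freq, time),
    standing in for intermediate feature maps of a trained tagging
    convnet. Each map is reduced to a per-channel scalar by global
    average pooling, then all layers are concatenated into one vector.
    """
    pooled = [a.mean(axis=(1, 2)) for a in layer_activations]  # (channels,) each
    return np.concatenate(pooled)

# Hypothetical activations from a 3-layer network, 32 channels per layer
acts = [np.random.rand(32, 16, 64) for _ in range(3)]
feat = convnet_feature(acts)
print(feat.shape)  # (96,)
```

The resulting fixed-length vector can then be fed to a simple classifier or regressor for the target task, which is the transfer setting the paper evaluates.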


Knowledge Transfer from Neural Networks for Speech Music Classification

This work investigates ways of mitigating the scarcity of suitable training data by applying transfer learning techniques to neural network architectures for several classification tasks from the field of Music Information Retrieval.

Representation Learning Using Artist labels for Audio Classification Tasks

In this work, we use a deep convolutional neural network (DCNN) trained with a public dataset, the Million Song Dataset, as a feature extractor. The network was trained on audio mel-spectrograms.

Transfer Learning for Music Classification and Regression Tasks Using Artist Tags

The experiment results show that the features learned using artist tags under the context of transfer learning are able to be effectively applied in music genre classification and music emotion recognition tasks.

Representation Learning of Music Using Artist Labels

This paper presents a feature learning approach that utilizes artist labels attached in every single music track as an objective meta data and trains a deep convolutional neural network to classify audio tracks into a large number of artists.


This work proposes a novel approach for music genre and style recognition using an ensemble of a convolutional neural network, a convolutional long short-term memory network, and a transfer learning model, which outperforms each individual model and achieves new state-of-the-art results.

Randomly Weighted CNNs for (Music) Audio Classification

  • Jordi Pons, X. Serra
  • Computer Science
    ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2019
This work uses features extracted from the embeddings of deep architectures as input to a classifier – with the goal to compare classification accuracies when using different randomly weighted architectures.

MusicBERT - learning multi-modal representations for music and text

This work proposes building on a common framework of Transformer-based encoders for both text and music modalities using supervised and unsupervised methods for pre-training and finetuning to create a new class of models that are able to perform advanced tasks that span both NLP and music.

Random Projections of Mel-Spectrograms as Low-Level Features for Automatic Music Genre Classification

Tests in five different well-known, publicly available datasets show that random projections of Mel-spectrograms leads to results comparable to learned features and outperforms features obtained via transfer learning in a shallow learning scenario.
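A minimal sketch of the random-projection idea summarized above: a flattened Mel-spectrogram is mapped to a low-dimensional feature vector by a fixed Gaussian random matrix. The output dimensionality, scaling, and flattening choice are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def random_projection_features(mel_spec, out_dim=128, seed=0):
    """Project a flattened Mel-spectrogram into a low-dimensional space.

    Uses a fixed Gaussian random matrix, so the 'feature extractor'
    requires no training at all; the same seed yields the same
    projection for every input, as a learned extractor would.
    """
    rng = np.random.default_rng(seed)
    x = mel_spec.ravel()
    W = rng.standard_normal((out_dim, x.size)) / np.sqrt(out_dim)
    return W @ x

mel = np.random.rand(96, 256)  # hypothetical 96-band, 256-frame spectrogram
feat = random_projection_features(mel)
print(feat.shape)  # (128,)
```

These projected vectors would then be fed to a shallow classifier, which is the comparison scenario the summary refers to.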

Audio-Based Music Classification with DenseNet And Data Augmentation

This is the first work to apply Densely Connected Convolutional Networks (DenseNet) to music audio tagging, which has been demonstrated to perform better than Residual neural network (ResNet) and the proposed combination of strong representation of DenseNet and data augmentation can be adapted to other audio processing tasks.

Music Genre Classification Using Transfer Learning

  • B. Liang, Minwei Gu
  • Computer Science
    2020 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)
  • 2020
A transfer learning approach for audio-based classification of 11 western music genres, including Rock, Pop, Rap, Country, Folk, Metal, Jazz, Blues, R&B, Electronic Music and Classical Music, which achieves 0.9799 ROC-AUC and 0.8938 PR-AUC on a private dataset of 1100 songs.



Convolutional recurrent neural networks for music classification

It is found that CRNNs show strong performance relative to the number of parameters and training time, indicating the effectiveness of the hybrid structure in music feature extraction and feature summarisation.

Transfer Learning by Supervised Pre-training for Audio-based Music Classification

It is shown that features learned from MSD audio fragments in a supervised manner, using tag labels and user listening data, consistently outperform features learned in an unsupervised manner in this setting, provided that the learned feature extractor is of limited complexity.

Learning sparse dictionaries for music and speech classification

This approach not only removes the redundancy of using a separate classifier but also achieves complete discrimination of music and speech on the GTZAN music/speech dataset, using a restricted dictionary size with limited computation.

Explaining Deep Convolutional Neural Networks on Music Classification

It is shown that in the deep layers of a 5-layer CNN, the features are learnt to capture textures, the patterns of continuous distributions, rather than shapes of lines.

Deep Image Features in Music Information Retrieval

The CNNs were applied to a Music Information Retrieval (MIR), in particular to musical genre recognition, and results achieved were close to the state-of-the-art.

On the Robustness of Deep Convolutional Neural Networks for Music Classification

It is shown that networks can be effective despite relatively large error rates in groundtruth datasets, and it is subsequently shown that many commonly used input preprocessing techniques are redundant except magnitude compression.

Multi-Level and Multi-Scale Feature Aggregation Using Pretrained Convolutional Neural Networks for Music Auto-Tagging

The experiments show that using the combination of multi-level and multi-scale features is highly effective in music auto-tagging and the proposed method outperforms the previous state-of-the-art methods on the MagnaTagATune dataset and the Million Song Dataset.

Transfer Learning In Mir: Sharing Learned Latent Representations For Music Audio Classification And Similarity

The results show that shared representations can improve classification accuracy and it is shown how transfer learning can improve performance for music similarity.

The Effects of Noisy Labels on Deep Convolutional Neural Networks for Music Classification

It is shown that networks can be effective despite relatively large error rates in groundtruth datasets, while conjecturing that label noise can be the cause of varying tagwise performance differences.
