Multi-Label Music Genre Classification from Audio, Text and Images Using Deep Features

  title={Multi-Label Music Genre Classification from Audio, Text and Images Using Deep Features},
  author={Sergio Oramas and Oriol Nieto and Francesco Barbieri and Xavier Serra},
  booktitle={International Society for Music Information Retrieval Conference},
Music genres allow us to categorize musical items that share common characteristics. […] For every album we have collected the cover image, text reviews, and audio tracks. Additionally, we propose an approach for multi-label genre classification based on the combination of feature embeddings learned with state-of-the-art deep learning methodologies. Experiments show major differences between modalities, which not only introduce new baselines for multi-label genre classification, but also suggest that…
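
The combination of per-modality feature embeddings described in the abstract can be sketched as simple late fusion: normalize each modality's embedding, concatenate them, and train a one-vs-rest multi-label classifier on top. This is a minimal illustration, not the paper's exact pipeline; the embedding dimensions and data below are invented for the example.

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical pre-extracted embeddings for 8 albums (dimensions are
# illustrative, not from the paper): audio, text review, and cover image.
audio = rng.normal(size=(8, 16))
text = rng.normal(size=(8, 12))
image = rng.normal(size=(8, 10))

def l2_normalize(x):
    """Scale each row to unit L2 norm so no modality dominates the fusion."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

# Late fusion: normalize each modality, then concatenate feature-wise.
fused = np.concatenate([l2_normalize(m) for m in (audio, text, image)], axis=1)

# Multi-label targets: each album may belong to several genres at once
# (binary indicator matrix, one column per genre).
labels = np.array([
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [1, 1, 1, 0],
])

# One independent binary classifier per genre label.
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(fused, labels)
predictions = clf.predict(fused)  # shape (8, 4): one decision per album-genre pair
```

In practice the embeddings would come from pretrained deep networks (e.g., a CNN over spectrograms or cover art), but the fusion and multi-label stages look the same.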


Multimodal Deep Learning for Music Genre Classification

An approach to learn and combine multimodal data representations for music genre classification is proposed, and a proposed approach for dimensionality reduction of target labels yields major improvements in multi-label classification.
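
The label-dimensionality-reduction idea mentioned above can be illustrated (hypothetically; the data and component count below are invented) by projecting the binary genre-indicator matrix into a low-dimensional latent space, regressing onto that space instead of the raw labels, and decoding predictions back by thresholding:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical binary album-genre indicator matrix (rows: albums, cols: genres).
Y = np.array([
    [1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0],
], dtype=float)

# Compress the label space to 2 latent dimensions; a model would be trained
# to predict these latent targets rather than every genre column directly.
pca = PCA(n_components=2).fit(Y)
Z = pca.transform(Y)

# Decode latent predictions back to multi-label output by thresholding.
Y_hat = (pca.inverse_transform(Z) >= 0.5).astype(int)
```

The latent targets `Z` are dense and low-dimensional, which is where the reported improvements for sparse multi-label settings would come from; PCA here stands in for whatever reduction the cited work actually uses.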

A Music Genre Classification Method Based on Deep Learning

  • Qingzhe He
  • Computer Science
    Mathematical Problems in Engineering
  • 2022
This research investigates recurrent neural networks (RNNs) with attention over the distinctive sequence of input MIDI segments, and validates the method for music classification in combination with the accuracy of equal-length segment categorization.

A Novel Approach to Music Genre Classification using Natural Language Processing and Spark

This paper applies highly sophisticated techniques from the text classification domain, such as Hierarchical Attention Networks, to classify tracks of different genres.

Music Genre Classification using Deep learning - A review

Different music genre classification techniques were studied and implemented using CNNs, and shown to be effective at identifying trends and patterns in vast collections of data.

The AcousticBrainz Genre Dataset: Multi-Source, Multi-Level, Multi-Label, and Large-Scale

The AcousticBrainz Genre Dataset allows researchers to explore how the same music pieces are annotated differently by different communities following their own genre taxonomies, and how this could be addressed by genre recognition systems.

Representation Learning vs. Handcrafted Features for Music Genre Classification

A comprehensive set of experiments performs music genre classification using learned and handcrafted features, as well as their fusion, confirming the power of non-handcrafted (learned) features for audio classification tasks.

A multimodal approach for multi-label movie genre classification

This is the most comprehensive study developed in terms of the diversity of multimedia information sources used for movie genre classification, and it corroborates the existence of complementarity among classifiers trained on different sources of information in this field of application.

Machine learning for music genre: multifaceted review and experimentation with audioset

The main goal is to give the reader an overview of the history and the current state of the art, exploring the techniques and datasets used to date, and identifying current challenges such as the ambiguity of genre definitions or the introduction of human-centric approaches.

The AcousticBrainz Genre Dataset : Music Genre Recognition with Annotations from Multiple Sources

This paper introduces the AcousticBrainz Genre Dataset, a large-scale collection of hierarchical multi-label genre annotations from different metadata sources. It allows researchers to explore how the same music pieces are annotated differently by different communities following their own genre taxonomies.

Enhancing multi-label music genre classification through ensemble techniques

This work proposes a set of ensemble techniques, which are specific to the task of multi-label genre classification, and investigates some existing ensemble techniques from machine learning.

Integration of Text and Audio Features for Genre Classification in Music Information Retrieval

The nature of the text and audio feature sets that describe the same audio tracks is explained, and the use of textual data on top of low-level audio features for music genre classification is proposed.

Audio-based Music Classification with a Pretrained Convolutional Network

A convolutional network is built and trained to perform artist recognition, genre recognition, and key detection, and the convolutional approach is found to improve accuracy for the genre recognition and artist recognition tasks.

Tag Integrated Multi-Label Music Style Classification with Hypergraph

This paper proposes a multi-label music style classification approach, called Hypergraph-integrated Support Vector Machine (HiSVM), which can integrate both music content and music tags for automatic music style classification.

A Deep Multimodal Approach for Cold-start Music Recommendation

This work addresses the so-called cold-start problem by combining text and audio information with user feedback data using deep network architectures and suggests that both splitting the recommendation problem between feature levels, and merging feature embeddings in a multimodal approach improve the accuracy of the recommendations.

End-to-end learning for music audio

  • S. Dieleman, B. Schrauwen
  • Computer Science
    2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2014
Although convolutional neural networks do not outperform a spectrogram-based approach, the networks are able to autonomously discover frequency decompositions from raw audio, as well as phase- and translation-invariant feature representations.

Rhyme and Style Features for Musical Genre Classification by Song Lyrics

This paper presents a novel set of features developed for the textual analysis of song lyrics, combines them with and compares them to classical bag-of-words indexing approaches, and presents results for musical genre classification on a test collection to demonstrate the analysis.

Multimodal Music Mood Classification Using Audio and Lyrics

It is demonstrated that lyrics and audio information are complementary, and can be combined to improve a classification system, and integrating this in a multimodal system allows an improvement in the overall performance.

Deep content-based music recommendation

This paper proposes to use a latent factor model for recommendation, and predict the latent factors from music audio when they cannot be obtained from usage data, and shows that recent advances in deep learning translate very well to the music recommendation setting, with deep convolutional neural networks significantly outperforming the traditional approach.

What is this song about anyway?: Automatic classification of subject using user interpretations and lyrics

The results show that user-generated interpretations are significantly more useful than lyrics as classification features (p < 0.05) and support the possibility of exploiting various existing sources for subject metadata enrichment in music digital libraries.