Pop Music Highlighter: Marking the Emotion Keypoints

Yu-Siang Huang, Szu-Yu Chou, Yi-Hsuan Yang
The goal of music highlight extraction, or thumbnailing, is to extract a short consecutive segment of a piece of music that is somehow representative of the whole piece. First, methodology-wise we experiment with a new architecture that does not need any recurrent layers, making the training process faster. Moreover, we compare a late-fusion variant and an early-fusion variant to study which one better exploits the attention mechanism. Second, we conduct and report an extensive set of…
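The late- vs. early-fusion distinction above can be sketched as attention pooling over chunk embeddings. This is a minimal numpy illustration, not the paper's model: the embeddings `X`, scoring vector `w_att`, and sigmoid scorer are all hypothetical stand-ins.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_chunks, dim = 8, 16
X = rng.standard_normal((n_chunks, dim))  # chunk-level audio embeddings (hypothetical)
w_att = rng.standard_normal(dim)          # attention scoring vector (hypothetical)
w_out = rng.standard_normal(dim)          # emotion scorer weights (hypothetical)

alpha = softmax(X @ w_att)                # one attention weight per chunk, sums to 1

# Early fusion: attend over the features first, then predict once.
early_pred = sigmoid((alpha @ X) @ w_out)

# Late fusion: predict per chunk, then attend over the predictions.
late_pred = alpha @ sigmoid(X @ w_out)

# Either way, the attention weights themselves mark the highlight:
# the chunk with the largest weight is the candidate emotion keypoint.
highlight = int(np.argmax(alpha))
```

With a nonlinear scorer the two variants generally give different song-level predictions, which is what makes the comparison between them meaningful.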


Supervised Chorus Detection for Popular Music Using Convolutional Neural Network and Multi-Task Learning

This paper proposes to use a convolutional neural network with a multi-task learning objective, which simultaneously fits two temporal activation curves: one indicating "chorusness" as a function of time, and the other indicating the location of the chorus boundaries.
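A multi-task objective of this shape, one loss per temporal activation curve summed with a weight, can be sketched in a few lines. Everything below (frame count, toy targets, the weight `lam`) is illustrative, not taken from the paper.

```python
import numpy as np

def bce(pred, target, eps=1e-7):
    """Frame-wise binary cross-entropy over a temporal activation curve."""
    p = np.clip(pred, eps, 1 - eps)
    return -np.mean(target * np.log(p) + (1 - target) * np.log(1 - p))

T = 100  # number of frames
# Hypothetical network outputs: two activation curves over time.
chorusness_pred = np.full(T, 0.5)
boundary_pred = np.full(T, 0.5)

# Toy targets: the chorus spans frames 40-69, with boundaries at its edges.
chorusness_true = np.zeros(T); chorusness_true[40:70] = 1.0
boundary_true = np.zeros(T); boundary_true[[40, 69]] = 1.0

# Multi-task loss: weighted sum of the two per-curve losses.
lam = 0.5  # boundary-loss weight (assumed hyperparameter)
loss = bce(chorusness_pred, chorusness_true) + lam * bce(boundary_pred, boundary_true)
```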

Deep Learning Based EDM Subgenre Classification using Mel-Spectrogram and Tempogram Features

The state-of-the-art music auto-tagging model "short-chunk CNN+Resnet" is extended to EDM subgenre classification, with the addition of two mid-level tempo-related feature representations, called the Fourier tempogram and autocorrelation tempogram, and two fusion strategies are explored to aggregate the two types of tempograms.
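Of the two tempogram types mentioned, the autocorrelation variant is the simpler to sketch: slide a window over an onset-strength envelope and autocorrelate each window, so a peak at lag L indicates events recurring every L envelope frames. The function below is a hand-rolled numpy illustration (window length and hop are arbitrary), not the paper's feature extractor.

```python
import numpy as np

def autocorr_tempogram(onset_env, win_len=64, hop=16):
    """Columns are local autocorrelations of the onset envelope:
    a peak at lag L means events recur every L envelope frames."""
    cols = []
    for start in range(0, len(onset_env) - win_len + 1, hop):
        w = onset_env[start:start + win_len]
        w = w - w.mean()
        ac = np.correlate(w, w, mode="full")[win_len - 1:]
        cols.append(ac / (ac[0] + 1e-9))  # normalise so lag 0 == 1
    return np.stack(cols, axis=1)         # shape: (lags, time)

# Toy onset envelope: an impulse every 8 frames (a steady "beat").
env = np.zeros(256)
env[::8] = 1.0
tg = autocorr_tempogram(env)
period = int(np.argmax(tg[1:, 0])) + 1    # strongest non-zero lag -> 8
```

The Fourier tempogram replaces the autocorrelation step with a short-time Fourier transform of the same envelope; the paper fuses both representations.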

Deepchorus: A Hybrid Model of Multi-Scale Convolution And Self-Attention for Chorus Detection

An end-to-end chorus detection model, DeepChorus, is proposed that reduces the engineering effort and the need for prior knowledge, and outperforms existing state-of-the-art methods in most cases.

Research on Musical Sentiment Classification Model Based on Joint Representation Structure

A neural network model based on a joint representation structure, which uses low-level descriptors and spectrograms to combine hand-crafted features with a convolutional recurrent neural network, can improve the classification accuracy of music emotion compared with traditional single models.

Semantic Tagging of Singing Voices in Popular Music Recordings

This article builds a music tag dataset dedicated to singing voice, defines a vocabulary that describes timbre and singing styles of K-pop vocalists and collects human annotations for individual tracks, and conducts statistical analysis to understand the global and temporal characteristics of the tag words.

A Minimal Template for Interactive Web-based Demonstrations of Musical Machine Learning

This paper presents a template that is specifically designed to demonstrate symbolic musical machine learning models on the web, and shows that the built-in interactivity and real-time audio rendering of the browser make the demonstrations easier to understand and to play with.

Editorial: Introducing the Transactions of the International Society for Music Information Retrieval

The Transactions of the International Society for Music Information Retrieval (TISMIR) publishes novel scientific research in the field of music information retrieval (MIR), an interdisciplinary field.

Multi-Modal Chorus Recognition for Improving Song Search

A multi-modal chorus recognition model is proposed that considers diverse features of songs by utilizing both the lyrics and the tune information, and helps to improve the accuracy of the downstream song search task.

Emotion Recognition of Violin Playing Based on Big Data Analysis Technologies

  • Liangjun Zou
  • Computer Science
    Journal of environmental and public health
  • 2022
An emotion recognition method for dynamic violin performances based on LSTM is proposed; it selects acoustic features, classifies the acoustic signals contained in the performances, greatly reduces training time, and achieves prediction accuracy higher than existing methods.

Music thumbnailing via neural attention modeling of music emotion

This short paper introduces a so-called attention layer with long short-term memory cells to a deep convolutional neural network for music emotion classification, and investigates whether a representative part selected by some automatic mechanism for emotion recognition corresponds to the chorus section of music.

Automatic Music Highlight Extraction using Convolutional Recurrent Attention Networks

Experimental results show that the proposed method for extracting highlights using high-level features from convolutional recurrent attention networks (CRAN) outperforms three baseline methods in both quantitative and qualitative evaluations.

Novel Audio Features for Music Emotion Recognition

This work advances the state of the art in music emotion recognition by proposing novel emotionally relevant audio features related to musical texture and expressive techniques; analysis of feature relevance and of the results uncovered interesting relations.

Event Localization in Music Auto-tagging

This paper proposes a convolutional neural network architecture that is able to make accurate frame-level predictions of tags in unseen music clips by using only clip-level annotations in the training phase, and presents qualitative analyses showing the model can indeed learn certain characteristics of music tags.
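Getting frame-level predictions from clip-level labels amounts to pooling frame activations into one clip score for training; a common choice (assumed here, not necessarily the paper's) is max pooling. A minimal numpy sketch with hypothetical features and weights:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
frames, tags, dim = 50, 4, 8
H = rng.standard_normal((frames, dim))  # frame-level features (hypothetical)
W = rng.standard_normal((dim, tags))    # per-tag classifier weights (hypothetical)

frame_pred = sigmoid(H @ W)             # frame-level tag activations
clip_pred = frame_pred.max(axis=0)      # pooled clip-level prediction

# Training compares clip_pred against clip-level labels only; at test
# time frame_pred localises where each tag is active within the clip.
```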

Toward Automatic Music Audio Summary Generation from Signal Analysis

This paper deals with the automatic generation of music audio summaries from signal analysis without the use of any other information by considering the audio signal as a succession of “states” corresponding to the structure of a piece of music.
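One classical way to realise a "most representative state" summary, shown here only as a rough illustration and not as the paper's method, is to pick the window whose frames are on average most similar to the rest of the piece, using a cosine self-similarity matrix:

```python
import numpy as np

def pick_summary(features, win=10):
    """Choose the window whose frames are on average most similar
    to the whole piece -- a crude 'most representative state'.
    features: (n_frames, n_dims) array of frame-level descriptors."""
    F = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-9)
    ssm = F @ F.T                        # cosine self-similarity matrix
    scores = ssm.mean(axis=1)            # how typical each frame is
    window_scores = np.convolve(scores, np.ones(win) / win, mode="valid")
    start = int(np.argmax(window_scores))
    return start, start + win
```

On a piece whose features repeat, the chosen window lands in the most frequently recurring section, which is often, though not always, the chorus.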

An Analysis of Chorus Features in Popular Song

This computational analysis compiles a list of robust and interpretable features and models their influence on the ‘chorusness’ of a collection of song sections from the Billboard dataset and shows that timbre and timbre variety are more strongly related to chorus qualities than harmony and absolute pitch height.

Deep content-based music recommendation

This paper proposes to use a latent factor model for recommendation and to predict the latent factors from music audio when they cannot be obtained from usage data; it shows that recent advances in deep learning translate very well to the music recommendation setting, with deep convolutional neural networks significantly outperforming the traditional approach.

A Tutorial on Deep Learning for Music Information Retrieval

The basic principles and prominent works in deep learning for MIR are laid out and the network structures that have been successful in MIR problems are outlined to facilitate the selection of building blocks for the problems at hand.

Enhance popular music emotion regression by importing structure information

Music structure information is incorporated into music emotion regression, and experimental results show that it improves regression performance.

End-to-end learning for music audio

  • S. Dieleman, B. Schrauwen
  • Computer Science
    2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2014
Although the convolutional neural networks do not outperform a spectrogram-based approach, the networks are able to autonomously discover frequency decompositions from raw audio, as well as phase- and translation-invariant feature representations.