Music-Video Emotion Analysis Using Late Fusion of Multimodal

@article{Pandeya2019MusicVideoEA,
  title={Music-Video Emotion Analysis Using Late Fusion of Multimodal},
  author={Yagya Raj Pandeya and Joonwhoan Lee},
  journal={DEStech Transactions on Computer Science and Engineering},
  year={2019}
}
Music-video emotion is a high-level semantic of human internal feeling conveyed through sung lyrics, musical instrument performance, and visual expression. Online and offline music videos are a rich source for analyzing emotion with modern machine learning technologies. In this research we build a music-video emotion dataset and extract music and video features from pre-trained neural networks. Two pre-trained audio and video networks are first fine-tuned and then used to extract the low-level and…
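The late fusion named in the title typically operates at the decision level: each modality's network scores the emotion classes independently, and the scores are combined afterward. Below is a minimal sketch in PyTorch, assuming a weighted average of softmax probabilities from two fine-tuned unimodal classifiers; the weights, class count, and random tensors are illustrative assumptions, not the paper's configuration.

import torch
import torch.nn.functional as F

def late_fusion(audio_logits, video_logits, w_audio=0.5, w_video=0.5):
    # Decision-level fusion: weighted average of per-modality probabilities.
    p_audio = F.softmax(audio_logits, dim=-1)
    p_video = F.softmax(video_logits, dim=-1)
    return w_audio * p_audio + w_video * p_video

# Toy example: six emotion classes (hypothetical), batch of one clip.
audio_logits = torch.randn(1, 6)  # scores from a fine-tuned audio network
video_logits = torch.randn(1, 6)  # scores from a fine-tuned video network
fused = late_fusion(audio_logits, video_logits)
predicted_class = fused.argmax(dim=-1)

A weighted average is only one fusion rule; products of probabilities or a small learned classifier over concatenated scores are common alternatives, and the paper's exact rule may differ.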

Citations

Music Emotion Classification with Deep Neural Nets
TLDR
The proposed filter- and channel-separable convolution network reduces neural network complexity and improves the evaluation results, and the statistical and visual analysis shows improvement from the new convolution method and multi-channel audio.
Affective Computing for Large-scale Heterogeneous Multimedia Data
TLDR
This article comprehensively surveys state-of-the-art AC technologies for large-scale heterogeneous multimedia data of different types, focusing on both handcrafted-feature-based methods and deep learning methods.
Multimodal Fusion: A Review, Taxonomy, Open Challenges, Research Roadmap and Future Directions
TLDR
The present work incorporates neutrosophic logic and its applications in computer vision, including multimodal data fusion and information systems, to provide impetus to existing research in the field.

References

Transfer Learning for Music Classification and Regression Tasks Using Artist Tags
TLDR
The experimental results show that features learned using artist tags in a transfer-learning setting can be effectively applied to music genre classification and music emotion recognition tasks.
Audio-visual emotion recognition using deep transfer learning and multiple temporal models
TLDR
This paper presents the techniques used in the contribution to the Emotion Recognition in the Wild 2017 video-based sub-challenge to classify the six basic emotions and neutral.
Video Affective Content Analysis: A Survey of State-of-the-Art Methods
TLDR
A general framework for video affective content analysis is proposed, which includes video content, emotional descriptors, and users' spontaneous nonverbal responses, as well as the relationships between the three.
Synchronous prediction of arousal and valence using LSTM network for affective video content analysis
  Ligang Zhang and Jiulong Zhang. 2017 13th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD), 2017.
TLDR
An approach to predict the arousal and valence dimensions synchronously using the Long Short-Term Memory (LSTM) network is presented, providing some of the earliest preliminary evidence of the benefit of considering correlations between affective dimensions for accurate affective video content analysis (AVCA).
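A minimal sketch of such synchronous prediction in PyTorch follows, assuming per-timestep clip features as input; the feature dimension, hidden size, and sequence length are illustrative assumptions, not the configuration reported by Zhang and Zhang.

import torch
import torch.nn as nn

class ArousalValenceLSTM(nn.Module):
    def __init__(self, feat_dim=128, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)  # joint output: [arousal, valence]

    def forward(self, x):           # x: (batch, time, feat_dim)
        out, _ = self.lstm(x)       # per-timestep hidden states
        return self.head(out)       # (batch, time, 2), predicted together

model = ArousalValenceLSTM()
clip_features = torch.randn(4, 30, 128)  # 4 clips, 30 timesteps each
arousal_valence = model(clip_features)

Predicting both dimensions from one recurrent state is what lets the model exploit arousal-valence correlations, as the summary above notes.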
DEAP: A Database for Emotion Analysis; Using Physiological Signals
TLDR
A multimodal data set for the analysis of human affective states is presented, and a novel method for stimulus selection is proposed using retrieval by affective tags from the last.fm website, video highlight detection, and an online assessment tool.
An improved valence-arousal emotion space for video affective content representation and recognition
TLDR
The experimental results show that the improved V-A emotion space can be used as a solution to represent and recognize video affective content.
Large-Scale Video Classification with Convolutional Neural Networks
TLDR
This work studies multiple approaches for extending the connectivity of a CNN in the time domain to take advantage of local spatio-temporal information, and suggests a multiresolution, foveated architecture as a promising way of speeding up training.
Learning Spatiotemporal Features with 3D Convolutional Networks
TLDR
The learned features, namely C3D (Convolutional 3D), with a simple linear classifier outperform state-of-the-art methods on 4 different benchmarks and are comparable with current best methods on the other 2 benchmarks.
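A minimal sketch of 3D-convolutional feature extraction over a short clip in PyTorch follows; the tiny two-layer network is an illustrative stand-in, not the much deeper published C3D architecture.

import torch
import torch.nn as nn

extractor = nn.Sequential(
    nn.Conv3d(3, 16, kernel_size=3, padding=1),  # kernels span time and space
    nn.ReLU(),
    nn.MaxPool3d(kernel_size=(1, 2, 2)),         # pool space, preserve time
    nn.Conv3d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool3d(1),                     # global spatiotemporal pool
)

clip = torch.randn(1, 3, 16, 112, 112)  # 16 RGB frames at 112x112
features = extractor(clip).flatten(1)   # (1, 32) clip-level descriptor

A fixed-length clip descriptor of this kind is what gets fed to the simple linear classifier mentioned above, and is the kind of pre-trained video feature the music-video work extracts before fusion.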
The Million Song Dataset
TLDR
The Million Song Dataset, a freely available collection of audio features and metadata for a million contemporary popular music tracks, is introduced; positive results on year prediction are shown, and the future development of the dataset is discussed.
Self-report captures 27 distinct categories of emotion bridged by continuous gradients
TLDR
A conceptual framework to analyze reported emotional states elicited by 2,185 emotionally evocative short videos is introduced, examining the richest array of reported emotional experiences studied to date and the extent to which reported experiences of emotion are structured by discrete and dimensional geometries.