Exploiting evidential theory in the fusion of textual, audio, and visual modalities for affective music video retrieval

@inproceedings{Nemati2017ExploitingET,
  title={Exploiting evidential theory in the fusion of textual, audio, and visual modalities for affective music video retrieval},
  author={Shahla Nemati and Ahmad Reza Naghsh-Nilchi},
  booktitle={2017 3rd International Conference on Pattern Recognition and Image Analysis (IPRIA)},
  year={2017},
  pages={222--228}
}
  • Shahla Nemati, Ahmad Reza Naghsh-Nilchi
  • Published 19 April 2017
  • Computer Science
  • 2017 3rd International Conference on Pattern Recognition and Image Analysis (IPRIA)
Developing techniques to retrieve video content with regard to its impact on viewers' emotions is the main goal of affective video retrieval systems. Existing systems mainly apply a multimodal approach that fuses information from different modalities to specify the affect category. In this paper, the effect of exploiting two types of textual information to enrich the audio-visual content of music videos is evaluated: subtitles or songs' lyrics, and texts obtained from viewers' comments in…
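The fusion the title refers to is based on the Dempster–Shafer theory of evidence. As a minimal illustrative sketch (not the paper's implementation; the masses and emotion labels below are hypothetical), Dempster's rule for combining the beliefs of two modality classifiers can be written as:

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two Dempster-Shafer mass functions with Dempster's rule.

    Each mass function is a dict mapping frozenset focal elements
    (subsets of the frame of discernment) to masses summing to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb  # product mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: sources are incompatible")
    # Normalise by the non-conflicting mass
    return {s: m / (1.0 - conflict) for s, m in combined.items()}

# Toy masses from two modality classifiers over the frame {joy, sadness}:
audio = {frozenset({"joy"}): 0.7, frozenset({"joy", "sadness"}): 0.3}
text = {frozenset({"joy"}): 0.6, frozenset({"sadness"}): 0.4}
fused = dempster_combine(audio, text)
```

Because the two sources partially agree, the combined mass concentrates on {joy}; the normalisation step redistributes the conflicting mass (here 0.28) over the compatible hypotheses.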

OWA Operators for the Fusion of Social Networks’ Comments with Audio-Visual Content
  • Shahla Nemati
  • Computer Science
    2019 5th International Conference on Web Research (ICWR)
  • 2019
A new decision-level fusion approach based on Ordered Weighted Averaging (OWA) operators is proposed and it is shown that the proposed OWA-based method outperforms other methods in different fusion settings.
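An OWA operator sorts its inputs in descending order and then applies position-based (rather than source-based) weights, so one operator family spans the whole range from max to min. A minimal sketch with toy scores, not the cited paper's configuration:

```python
def owa(values, weights):
    """Ordered Weighted Averaging: weight positions in the sorted
    (descending) input, not the original sources; weights sum to 1."""
    if len(values) != len(weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must match values in length and sum to 1")
    return sum(w * v for w, v in zip(weights, sorted(values, reverse=True)))

scores = [0.2, 0.9, 0.5]        # hypothetical per-modality scores
owa(scores, [1.0, 0.0, 0.0])    # -> 0.9 (acts as max)
owa(scores, [0.0, 0.0, 1.0])    # -> 0.2 (acts as min)
owa(scores, [1/3, 1/3, 1/3])    # arithmetic mean
```

Choosing the weight vector thus tunes how optimistic or pessimistic the fusion is without changing the operator itself.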
A Hybrid Latent Space Data Fusion Method for Multimodal Emotion Recognition
A hybrid multimodal data fusion method is proposed in which the audio and visual modalities are fused using a latent space linear map and then, their projected features into the cross-modal space are fused with the textual modality using a Dempster-Shafer theory-based evidential fusion method.
Canonical Correlation Analysis for Data Fusion in Multimodal Emotion Recognition
  • Shahla Nemati
  • Computer Science
    2018 9th International Symposium on Telecommunications (IST)
  • 2018
A hybrid method is proposed in the current study which first applies feature-level canonical correlation analysis (CCA) to audio and visual modalities and then combines the outputs with users’ comments using a decision-level fusion.
Predicting the Helpfulness Score of Product Reviews Using an Evidential Score Fusion Method
Combining emotion-related features, VAD (valence–arousal–dominance) features, and text-related features improves the accuracy of predicting the helpfulness of reviews; an improved Dempster–Shafer score fusion algorithm is presented.
Affective computing in the context of music therapy: a systematic review
A systematic review of the literature on affective computing in the context of music therapy, assessing AI methods for automatic emotion recognition applied to Human-Machine Musical Interfaces (HMMI).
Uninorm operators for sentence-level score aggregation in sentiment analysis
A new sentence-level aggregation mechanism based on uninorm operators is proposed for aggregating sentence-level sentiment into an overall document-level opinion. Implementation results show that the proposed method achieves higher performance in polarity detection, while the Dempster-Shafer method slightly outperforms the proposed method in the score prediction task.
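The specific uninorm used there is not spelled out in this summary; a common representative is the cross-ratio uninorm with neutral element 0.5, which reinforces agreement above or below the neutral point. A minimal sketch:

```python
def cross_ratio_uninorm(x, y):
    """Cross-ratio uninorm on [0, 1] with neutral element e = 0.5.

    Inputs above 0.5 reinforce each other upward, inputs below 0.5
    downward; U(0.5, y) == y. The conflicting corners (0, 1) and
    (1, 0) are mapped to 0 by convention.
    """
    num = x * y
    den = num + (1.0 - x) * (1.0 - y)
    return 0.0 if den == 0.0 else num / den

cross_ratio_uninorm(0.5, 0.7)   # -> 0.7 (0.5 is neutral)
cross_ratio_uninorm(0.8, 0.8)   # ~0.94: two positives reinforce
cross_ratio_uninorm(0.2, 0.2)   # ~0.06: two negatives reinforce
```

Unlike a plain average, the operator behaves conjunctively below the neutral element and disjunctively above it, which is what makes uninorms attractive for aggregating sentence scores that mostly agree.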
An improved evidence-based aggregation method for sentiment analysis
A new method is proposed for score aggregation that employs both the most probable and the second most probable classes to predict the final score; it is applied to review datasets from TripAdvisor and CitySearch that have been used in previous studies.
Improving Sentiment Polarity Detection Through Target Identification
Comparing the proposed method with a state-of-the-art lexicon-based method shows that identifying the main targets of reviews can improve system performance by about 17% in accuracy and 12% in F1-measure.
The effect of aggregation methods on sentiment classification in Persian reviews
The results on four Persian review data sets show that the review-level aggregation can improve rating classification, although this approach does not have a positive impact on polarity classification.
An Evidential Model for Environmental Risk Assessment in Projects Using Dempster–Shafer Theory of Evidence
An evidential model for project environmental risk assessment is proposed based on the Dempster–Shafer theory, which is capable of taking into account the uncertainties and has a high potential for project risk assessment under an uncertain environment.

References

Showing 1-10 of 21 references
Incorporating social media comments in affective video retrieval
A new method for incorporating social media comments with the audio-visual contents of videos is proposed and for the combination stage a decision-level fusion method based on the Dempster–Shafer theory of evidence is presented.
Multimedia content analysis for emotional characterization of music video clips
Using the proposed methodology, a relatively high performance (up to 90%) of affect recognition is obtained and several fusion techniques are used to combine the information extracted from the audio and video contents of music video clips.
Affective Visualization and Retrieval for Music Video
A novel integrated system (i.MV) is proposed for personalized MV affective analysis, visualization, and retrieval; affective visualization is shown to be more suitable for affective-information-based MV retrieval than the commonly used affective state representation strategies.
Determination of emotional content of video clips by low-level audiovisual features
Describes work on determining affective models for evaluating video clips using audiovisual low-level features; arousal was the best-detected dimension, followed by dominance and pleasure.
Hybrid video emotional tagging using users’ EEG and video content
The proposed fusion methods outperform the conventional emotional tagging methods that use either video or EEG features alone in both valence and arousal spaces and narrow down the semantic gap between the low-level video features and the users’ high-level emotional tags with the help of EEG features.
A three-level framework for affective content analysis and its case studies
A three-level affective content analysis framework is proposed, introducing a mid-level representation of dialog, audio emotional events, and textual concepts to infer high-level affective content.
Towards an intelligent framework for multimodal affective data analysis
Method and apparatus for summarizing a music video using content analysis
A music video (507) is segmented in the multimedia stream (505) by evaluating a plurality of content features of the multimedia stream, including keywords obtained from the transcript of at least one music video…
A Multimodal Database for Affect Recognition and Implicit Tagging
Results show the potential uses of the recorded modalities and the significance of the emotion elicitation protocol; single-modality and modality-fusion results are reported for both emotion recognition and implicit tagging experiments.