YouTube Movie Reviews: Sentiment Analysis in an Audio-Visual Context

Martin Wöllmer, Felix Weninger, Tobias Knaup, Björn Schuller, Congkai Sun, Kenji Sagae, Louis-Philippe Morency. IEEE Intelligent Systems.
This work focuses on automatically analyzing a speaker's sentiment in online videos containing movie reviews. In addition to textual information, the approach incorporates audio features typically used in speech-based emotion recognition, as well as video features that encode valuable valence information conveyed by the speaker. Experimental results indicate that training on written movie reviews is a promising alternative to exclusively using (spoken) in-domain data for building a system…


MOSI: Multimodal Corpus of Sentiment Intensity and Subjectivity Analysis in Online Opinion Videos
This paper introduces to the scientific community the first opinion-level annotated corpus for sentiment and subjectivity analysis in online videos, the Multimodal Opinion-level Sentiment Intensity (MOSI) dataset, which is rigorously annotated with labels for subjectivity and sentiment intensity, per-frame and per-opinion annotated visual features, and per-millisecond annotated audio features.
Importance Evaluation of Movie Aspects: Aspect-Based Sentiment Analysis
This investigation showed that reviews of the plot have the most significant impact, slightly higher than reviews of the actors, suggesting that a good plot is a proxy for a movie's success.
Survey on Sentiment Analysis from Affective Multimodal Content
This paper presents an overview of different techniques and approaches to sentiment analysis for the text, audio, and visual modalities, and to emotion recognition from audio and visual content.
Context-Dependent Sentiment Analysis in User-Generated Videos
An LSTM-based model is proposed that enables utterances to capture contextual information from their surroundings in the same video, aiding the classification process; it shows a 5–10% performance improvement over the state of the art and high robustness in generalization.
Multimodal sentimental analysis for social media applications: A comprehensive review
This work presents a survey of recent developments in analyzing multimodal sentiment (involving text, audio, and video/image) in human–machine interaction, and of the challenges involved in analyzing these modalities.
Benchmarking Multimodal Sentiment Analysis
We propose a framework for multimodal sentiment analysis and emotion recognition using convolutional neural network-based feature extraction from the text and visual modalities. We obtain a performance…
Multimodal Sentiment Analysis: A Comparison Study
This paper focuses on multimodal sentiment analysis across text, audio, and video, giving a complete picture of the field and its available datasets, with brief details for each type; in addition, recent research trends in multimodal sentiment analysis and its related fields are explored.
Opinion Dynamics Modeling for Movie Review Transcripts Classification with Hidden Conditional Random Fields
The main goal is to detect a movie reviewer's opinion using hidden conditional random fields, adapting a word embedding model learned from general written text to spoken movie reviews in order to model the dynamics of the opinion.
Enhanced Video Analytics for Sentiment Analysis Based on Fusing Textual, Auditory and Visual Information
The proposed approach of combining different modalities leads to more accurate prediction of the speaker's sentiment, with above 94% accuracy; the effectiveness of various combinations of modalities is verified using multi-level fusion (feature, score, and decision).
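The decision-level variant of the multi-level fusion mentioned above can be illustrated with a minimal sketch. This is an assumption-laden example, not the paper's implementation: the per-modality scores and weights are invented for illustration.

```python
# Sketch of decision-level (late) fusion: each modality classifier outputs
# a sentiment score in [-1, 1], and the fused decision is the sign of a
# weighted average. Scores and weights below are illustrative assumptions.

def late_fusion(scores, weights):
    """Fuse per-modality sentiment scores by weighted average and
    return the fused polarity label."""
    if len(scores) != len(weights):
        raise ValueError("one weight per modality score is required")
    fused = sum(s * w for s, w in zip(scores, weights)) / sum(weights)
    return "positive" if fused > 0 else "negative"

# Hypothetical text, audio, and visual classifier outputs for one utterance,
# with the text modality weighted most heavily.
print(late_fusion([0.7, -0.1, 0.4], [0.5, 0.25, 0.25]))  # prints "positive"
```

Score-level and feature-level fusion differ only in where the combination happens: before classification (concatenated features) or on raw classifier confidences rather than decisions.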


Towards multimodal sentiment analysis: harvesting opinions from the web
This paper addresses the task of multimodal sentiment analysis, and conducts proof-of-concept experiments that demonstrate that a joint model that integrates visual, audio, and textual features can be effectively used to identify sentiment in Web videos.
A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts
A novel machine-learning method is proposed that applies text-categorization techniques to just the subjective portions of the document, which greatly facilitates incorporation of cross-sentence contextual constraints.
MovieClouds: content-based overviews and exploratory browsing of movies
This paper presents and evaluates MovieClouds, an interactive web application designed to access, explore and visualize movies based on the information conveyed in the different tracks or perspectives of its content, with a special focus on the emotional dimensions expressed in the movies or felt by the viewers.
Sentic Computing for social media marketing
This work uses Sentic Computing, a multi-disciplinary approach to opinion mining and sentiment analysis, to semantically and affectively analyze text, encoding the results in a semantic-aware format according to different web ontologies so that this information is represented as an interconnected knowledge base.
A Survey of Affect Recognition Methods: Audio, Visual, and Spontaneous Expressions
Automated analysis of human affective behavior has attracted increasing attention from researchers in psychology, computer science, linguistics, neuroscience, and related disciplines. However, the…
Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification
This work extends to sentiment classification the recently-proposed structural correspondence learning (SCL) algorithm, reducing the relative error due to adaptation between domains by an average of 30% over the original SCL algorithm and 46% over a supervised baseline.
Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews
A simple unsupervised learning algorithm classifies a review as recommended (thumbs up) or not recommended (thumbs down) depending on whether the average semantic orientation of its phrases is positive.
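The decision rule described above can be sketched as follows. This is a hedged illustration of only the final classification step: the phrase orientation scores are hypothetical stand-ins for the PMI-based estimates the paper computes against reference words.

```python
# Sketch of the thumbs-up/thumbs-down decision rule: average the semantic
# orientation (SO) scores of a review's extracted phrases and classify by
# the sign of the average. The scores here are assumed values, not output
# of the paper's actual PMI-IR estimation.

def classify_review(phrase_orientations):
    """Return 'thumbs up' if the mean phrase SO is positive,
    else 'thumbs down'."""
    if not phrase_orientations:
        raise ValueError("review contains no scored phrases")
    avg_so = sum(phrase_orientations) / len(phrase_orientations)
    return "thumbs up" if avg_so > 0 else "thumbs down"

# Hypothetical SO scores for three phrases of one review.
print(classify_review([1.2, -0.3, 0.8]))  # average 0.57 > 0: "thumbs up"
```

The full method additionally extracts candidate phrases by part-of-speech patterns and estimates each phrase's orientation from co-occurrence statistics; only the averaging-and-sign step is shown here.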
Multilingual Subjectivity: Are More Languages Better?
This paper explores the integration of features originating from multiple languages into a machine learning approach to subjectivity analysis, and aims to show that this enriched feature set provides for more effective modeling for the source as well as the target languages.
openSMILE: The Munich Versatile and Fast Open-Source Audio Feature Extractor
The openSMILE feature extraction toolkit is introduced, which unites feature extraction algorithms from the speech processing and Music Information Retrieval communities and has a modular, component-based architecture that makes extensions via plug-ins easy.