Why We Watch the News: A Dataset for Exploring Sentiment in Broadcast Video News

@inproceedings{ellis2014whywewatch,
  title={Why We Watch the News: A Dataset for Exploring Sentiment in Broadcast Video News},
  author={Joseph G. Ellis and Brendan Jou and Shih-Fu Chang},
  booktitle={Proceedings of the 16th International Conference on Multimodal Interaction},
  year={2014},
}
We present a multimodal sentiment study performed on a novel collection of videos mined from broadcast and cable television news programs. To the best of our knowledge, this is the first dataset released for studying sentiment in the domain of broadcast video news. We describe our algorithm for processing news video and creating person-specific segments, yielding 929 sentence-length videos, which are annotated via Amazon Mechanical Turk. The spoken transcript and the video content…


Good News vs. Bad News: What are they talking about?
This paper developed a corpus of Ukrainian and Russian news, annotated each text with one of three categories (positive, negative, or neutral), and investigated which kinds of named entities are perceived as good or bad by readers and which of them caused annotation ambiguity.
Multimodal approach for tension levels estimation in news videos
A novel multimodal approach to estimate tension levels in news videos that combines audio and visual cues extracted from news participants, as well as textual cues obtained from the sentiment analysis of the speech transcriptions, demonstrating the high potential of this approach to be used by media analysts in several applications, especially, in the journalistic domain.
Watching What and How Politicians Discuss Various Topics: A Large-Scale Video Analytics UI
A large-scale dataset comprising videos of politicians speaking, organized by the topics they discuss, together with a user interface for exploring the dataset; it is unique in linking directly to actual footage of politicians speaking about specific topics, rather than to textual quotes only.
A survey of multimodal sentiment analysis
Multimodal Sentiment Analysis: A Comparison Study
This paper focuses on multimodal sentiment analysis across text, audio, and video, giving a complete picture of the field and its available datasets, providing brief details for each modality, and exploring recent research trends in multimodal sentiment analysis and related fields.
Multimodal Deep Learning Framework for Sentiment Analysis from Text-Image Web Data
A sentiment analysis framework that carefully fuses the salient visual cues and high attention textual cues is proposed, exploiting the interrelationships between multimodal web data.
Segmentation and Classification of Opinions with Recurrent Neural Networks
A recurrent neural network model with a bidirectional LSTM-RNN is proposed to perform joint segmentation and classification of opinions in text, and a novel method for training neural networks on segmentation tasks is introduced.
A Comprehensive Review of Visual-Textual Sentiment Analysis from Social Media Networks
An introduction to data fusion strategies and a summary of existing research on visual-textual sentiment analysis, covering its challenges and several important techniques investigated to reduce the dimensionality of high-dimensional features.
Attention-based Multi-modal Sentiment Analysis and Emotion Detection in Conversation using RNN
A recurrent neural network (RNN) based method is developed to capture the interlocutor state and the contextual state between utterances; the results demonstrate that the proposed model performs better than standard baselines.
Multi-level context extraction and attention-based contextual inter-modal fusion for multimodal sentiment analysis and emotion classification
This article presents a novel approach to extract context at multiple levels and to understand the importance of inter-modal utterances in sentiment and emotion classification; the approach outperforms standard baselines by over 3% in classification accuracy.


Towards multimodal sentiment analysis: harvesting opinions from the web
This paper addresses the task of multimodal sentiment analysis, and conducts proof-of-concept experiments that demonstrate that a joint model that integrates visual, audio, and textual features can be effectively used to identify sentiment in Web videos.
YouTube Movie Reviews: Sentiment Analysis in an Audio-Visual Context
Experimental results indicate that training on written movie reviews is a promising alternative to exclusively using (spoken) in-domain data for building a system that analyzes spoken movie review videos, and that language-independent audio-visual analysis can compete with linguistic analysis.
Structured exploration of who, what, when, and where in heterogeneous multimedia news sources
A fully automatic system is presented, spanning raw data gathering through navigation over heterogeneous news sources, able to extract and study topic trends in the news and to detect interesting peaks in coverage over the life of a topic.
Utterance-Level Multimodal Sentiment Analysis
It is shown that multimodal sentiment analysis can be effectively performed, and that the joint use of visual, acoustic, and linguistic modalities can lead to error rate reductions of up to 10.5% as compared to the best performing individual modality.
Predicting movie ratings from audience behaviors
We propose a method of representing audience behavior through facial and body motions from a single video stream, and use these features to predict the rating for feature-length movies.
Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
A Sentiment Treebank that includes fine-grained sentiment labels for 215,154 phrases in the parse trees of 11,855 sentences, presenting new challenges for sentiment compositionality, along with the Recursive Neural Tensor Network introduced to address them.
Taking the bite out of automated naming of characters in TV video
Opinion mining and sentiment analysis
This chapter introduces an idealised, end-to-end opinion analysis system and describes its components, including constructing opinion lexica, performing sentiment analysis, and producing opinion summaries.
AVEC 2011-The First International Audio/Visual Emotion Challenge
The Audio/Visual Emotion Challenge and Workshop (AVEC 2011) is the first competition event aimed at comparing multimedia processing and machine learning methods for automatic audio, visual, and audiovisual emotion analysis.
Predicting online media effectiveness based on smile responses gathered over the Internet
An automated method based on over 1,500 facial responses to media collected over the Internet shows the possibility for an ecologically valid, unobtrusive, evaluation of commercial “liking” and “desire to view again”, strong predictors of marketing success, based only on facial responses.