Automatic Understanding of Image and Video Advertisements

@inproceedings{Hussain2017AutomaticUO,
  title={Automatic Understanding of Image and Video Advertisements},
  author={Zaeem Hussain and Mingda Zhang and Xiaozhong Zhang and Keren Ye and Christopher Thomas and Zuha Agha and Nathan Ong and Adriana Kovashka},
  booktitle={2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2017},
  pages={1100--1110}
}
There is more to images than their objective physical content: for example, advertisements are created to persuade a viewer to take a certain action. We propose the novel problem of automatic advertisement understanding. To enable research on this problem, we create two datasets: an image dataset of 64,832 image ads, and a video dataset of 3,477 ads. Our data contains rich annotations encompassing the topic and sentiment of the ads, questions and answers describing what actions the viewer is… 


Interpreting the Rhetoric of Visual Advertisements
TLDR
This work proposes a suite of data and techniques that enable progress on understanding the messages that visual advertisements convey, and develops methods that use multimodal cues, i.e., both visuals and slogans, for both the image and video domains.
Content-based Effectiveness Prediction of Video Advertisements
TLDR
This paper proposes a multi-modal mixture based algorithm to predict the effectiveness of video ads automatically and exploits rich textual information often found with an advertisement as well as visual information to learn a finite mixture model.
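As a rough illustration of the finite mixture idea described above, the sketch below scores an ad by the log-likelihood of its concatenated textual and visual features under a spherical Gaussian mixture. This is a toy sketch, not the paper's model: the component means, weights, and feature dimensions are stand-in values.

```python
import numpy as np

def mixture_score(text_feat, visual_feat, means, weights, var=1.0):
    """Log-likelihood of a joint text+visual feature under a spherical GMM."""
    x = np.concatenate([text_feat, visual_feat])
    # Spherical Gaussian log-density for each mixture component.
    sq_dist = np.sum((means - x) ** 2, axis=1)
    d = x.size
    log_dens = -0.5 * (sq_dist / var + d * np.log(2 * np.pi * var))
    # Stable log-sum-exp over the weighted components.
    log_w = np.log(weights)
    m = np.max(log_w + log_dens)
    return m + np.log(np.sum(np.exp(log_w + log_dens - m)))

rng = np.random.default_rng(0)
means = rng.normal(size=(3, 6))        # 3 components over a 6-d joint feature
weights = np.array([0.5, 0.3, 0.2])
score = mixture_score(np.zeros(3), np.zeros(3), means, weights)
print(score)
```

In an effectiveness predictor, such per-mixture likelihoods would feed a downstream classifier or be learned per effectiveness level via EM.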
Automatic synthesis of advertising images according to a specified style
TLDR
The proposed data-driven method captures individual design attributes and the relationships between elements in advertising images, with the aim of automatically synthesizing input elements into an advertising image in a specified style; the resulting designs improved users' satisfaction by 7.1% compared to designs generated by nonprofessional students.
Symbolic VQA on Visual Advertisements with SymViSe Networks
TLDR
This work adapts a popular VQA architecture to implement a 2-stream ViSe (Visual Semantic) Network, which takes an image and candidate action-reason statement and predicts a probability that the two match; this helps demonstrate the difficulty of the task and serves as a baseline of existing techniques.
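The two-stream matching scheme described above can be sketched in a few lines: one stream embeds image features, the other embeds a candidate action-reason statement, and a sigmoid over their similarity gives a match probability. This is a minimal illustration, not the paper's SymViSe code; the projection weights and feature sizes here are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(42)
W_img = rng.normal(scale=0.1, size=(8, 16))   # image-stream projection
W_txt = rng.normal(scale=0.1, size=(8, 16))   # statement-stream projection

def match_probability(img_feat, txt_feat):
    """P(image and action-reason statement match) from two projected streams."""
    zi = np.tanh(W_img @ img_feat)            # image embedding
    zt = np.tanh(W_txt @ txt_feat)            # statement embedding
    return 1.0 / (1.0 + np.exp(-(zi @ zt)))   # sigmoid of dot-product similarity

p = match_probability(rng.normal(size=16), rng.normal(size=16))
print(float(p))
```

At inference time, the candidate statement with the highest probability for a given ad image would be selected.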
Look, Read and Feel: Benchmarking Ads Understanding with Multimodal Multitask Learning
TLDR
A novel deep multimodal multitask framework that integrates multiple modalities to achieve effective topic and sentiment prediction simultaneously for ads understanding is developed and achieved state-of-the-art performance for both prediction tasks.
Story Understanding in Video Advertisements
TLDR
This study first crowdsources climax annotations on 1,149 videos from the Video Ads Dataset, then uses both unsupervised and supervised methods to predict the climax, and builds a sentiment prediction model that outperforms the current state-of-the-art model for sentiment prediction in video ads by 25%.
ADVISE: Symbolism and External Knowledge for Decoding Advertisements
TLDR
This work formulates the ad understanding task as matching the ad image to human-generated statements that describe the action the ad prompts and the rationale it provides for taking that action, and proposes a method that outperforms the state of the art on this task.
Learning Subjective Attributes of Images from Auxiliary Sources
TLDR
This work proposes a probabilistic learning framework capable of transferring subjective information to the image-level labels based on a known aggregated distribution and uses this framework to rank images by subjective attributes from the domain knowledge of social media marketing and personality psychology.
Persuasive Faces: Generating Faces in Advertisements
TLDR
This paper proposes a conditional variational autoencoder which makes use of predicted semantic attributes and facial expressions as a supervisory signal when training and shows how this model can be used to produce visually distinct faces which appear to be from a fixed ad topic category.

References

SHOWING 1-10 OF 89 REFERENCES
ImageSense: Towards contextual image advertising
TLDR
A contextual advertising system driven by images is presented, which automatically associates relevant ads with an image rather than with the entire text of a Web page, and seamlessly inserts the ads into nonintrusive areas within each individual image.
Visual Persuasion: Inferring Communicative Intents of Images
TLDR
This study demonstrates that a systematic focus on visual persuasion opens up the field of computer vision to a new class of investigations around mediated images, intersecting with media analysis, psychology, and political communication.
Every Picture Tells a Story: Generating Sentences from Images
TLDR
A system that can compute a score linking an image to a sentence, which can be used to attach a descriptive sentence to a given image, or to obtain images that illustrate a given sentence.
Affective image classification using features inspired by psychology and art theory
TLDR
This work investigates and develops methods to extract and combine low-level features that represent the emotional content of an image, and uses these for image emotion classification.
Recognizing Image Style
TLDR
An approach to predicting the style of images, and a thorough evaluation of different image features for these tasks, finding that features learned in a multi-layer network generally perform best, even when trained with object class (not style) labels.
Visual appearance of display ads and its effect on click through rate
TLDR
This paper quantitatively studies the relationship between the visual appearance and performance of creatives, using large-scale data from the world's largest display ads exchange system, RightMedia, and designs a set of 43 visual features, some of which are novel and others inspired by related work.
Show and tell: A neural image caption generator
TLDR
This paper presents a generative model based on a deep recurrent architecture that combines recent advances in computer vision and machine translation and that can be used to generate natural sentences describing an image.
Predicting Ad Liking and Purchase Intent: Large-Scale Analysis of Facial Responses to Ads
TLDR
A large-scale analysis of facial responses to video content, measured over the Internet, and their relationship to marketing effectiveness demonstrates a reliable and generalizable system for predicting ad effectiveness automatically from facial responses, without needing to elicit self-report responses from viewers.
CAVVA: Computational Affective Video-in-Video Advertising
TLDR
It is demonstrated that CAVVA achieves a good balance between the seemingly conflicting goals of (a) minimizing the user disturbance caused by advertisement insertion while (b) enhancing the user engagement with the advertising content.
WhittleSearch: Interactive Image Search with Relative Attribute Feedback
TLDR
A novel mode of feedback for image search, where a user describes which properties of exemplar images should be adjusted in order to more closely match his/her mental model of the image sought, which outperforms traditional binary relevance feedback in terms of search speed and accuracy.
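The relative-attribute feedback described above can be illustrated with a toy sketch (not the original WhittleSearch implementation): given a reference image and feedback such as "more formal than this one," the search whittles the database down to images whose predicted attribute score exceeds the reference's. The attribute scores below are made-up values for illustration.

```python
import numpy as np

def whittle(db_scores, ref_idx, direction):
    """Return indices of images 'more'/'less' of an attribute than a reference."""
    ref = db_scores[ref_idx]
    if direction == "more":
        return np.where(db_scores > ref)[0]
    return np.where(db_scores < ref)[0]

scores = np.array([0.2, 0.9, 0.5, 0.7, 0.1])   # e.g. "formality" score per image
print(whittle(scores, ref_idx=2, direction="more"))   # images more formal than #2
```

Successive feedback statements would intersect such index sets, shrinking the candidate pool toward the user's mental target.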