More Cat than Cute?: Interpretable Prediction of Adjective-Noun Pairs

@article{Fernandez2017MoreCT,
  title={More Cat than Cute?: Interpretable Prediction of Adjective-Noun Pairs},
  author={Delia Fernandez and Alejandro Woodward and V{\'i}ctor Campos and Xavier Gir{\'o}-i-Nieto and Brendan Jou and Shih-Fu Chang},
  journal={Proceedings of the Workshop on Multimodal Understanding of Social, Affective and Subjective Attributes},
  year={2017}
}
  • Delia Fernandez, Alejandro Woodward, Víctor Campos, Xavier Giró-i-Nieto, Brendan Jou, Shih-Fu Chang
  • Published 21 August 2017
  • Computer Science
  • Proceedings of the Workshop on Multimodal Understanding of Social, Affective and Subjective Attributes
The increasing availability of affect-rich multimedia resources has bolstered interest in understanding sentiment and emotions in and from visual content. Adjective-noun pairs (ANPs) are a popular mid-level semantic construct for capturing affect via visually detectable concepts such as "cute dog" or "beautiful landscape". Current state-of-the-art methods approach ANP prediction by considering each of these compound concepts as individual tokens, ignoring the underlying relationships in ANPs… 
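
The decomposition the title hints at (scoring "cat" and "cute" separately rather than treating "cute cat" as one opaque class) can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the feature dimension, the label counts, and the additive way of combining adjective and noun logits are hypothetical, not the paper's actual architecture.

    import torch
    import torch.nn as nn

    class DecomposedANPScorer(nn.Module):
        """Scores every adjective-noun pair from shared image features by
        combining separate adjective and noun heads, instead of keeping
        one output unit per monolithic ANP token."""
        def __init__(self, feat_dim: int, n_adj: int, n_noun: int):
            super().__init__()
            self.adj_head = nn.Linear(feat_dim, n_adj)    # adjective logits
            self.noun_head = nn.Linear(feat_dim, n_noun)  # noun logits

        def forward(self, feats: torch.Tensor) -> torch.Tensor:
            a = self.adj_head(feats)   # (batch, n_adj)
            n = self.noun_head(feats)  # (batch, n_noun)
            # Hypothetical combination: score[b, i, j] = a[b, i] + n[b, j],
            # so each component stays interpretable on its own.
            return a.unsqueeze(2) + n.unsqueeze(1)

    feats = torch.randn(4, 512)  # stand-in for CNN image features
    scores = DecomposedANPScorer(512, n_adj=117, n_noun=167)(feats)
    print(scores.shape)  # torch.Size([4, 117, 167]): one score per ANP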

Citations

A Survey on Deep Learning in Image Polarity Detection: Balancing Generalization Performances and Computational Costs

This paper analyses the state-of-the-art literature on image polarity detection, identifies the most reliable CNN architectures, and gives practical hints on the advantages and disadvantages of the examined architectures in terms of both generalization and computational cost.

Computational Approaches to Subjective Interpretation of Multimedia Messages

A way of modeling interpretation is described that allows single or multiple interpretations, by both humans and computer models, to be analyzed within the same theoretical framework; novel machine learning models are developed to predict subjective interpretations of images or of tweets with images.

Visual Sentiment Analysis Using Deep Learning Models with Social Media Data

The fine-tuned DenseNet-121 model outperformed the VGG-19 and ResNet50V2 models in image sentiment prediction, improving accuracy by about 5% to 10% over previous attempts at visual sentiment analysis.

Can a Pretrained Language Model Make Sense with Pretrained Neural Extractors? An Application to Multimodal Classification (short paper)

It is demonstrated that BERT can utilize textual inputs from different neural extractors, in different formats, to obtain a performance improvement; the approach is applied to the hate detection task of the Facebook hateful memes dataset and reports decent performance.

References

Showing 1-10 of 31 references

Mapping Images to Sentiment Adjective Noun Pairs with Factorized Neural Nets

A novel factorized ANP CNN model learns separate representations for adjectives and nouns but optimizes classification performance over their product; it significantly outperforms independent ANP classifiers both on unseen ANPs and on retrieving images of novel ANPs.
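
One hedged reading of "optimizes classification performance over their product": each ANP classifier's weight vector is built from a learned adjective factor and a learned noun factor, so rare pairs borrow strength from every pair sharing a component. The sketch below is an illustrative guess at such a factorization (an elementwise product of factors), not the exact model from the paper.

    import torch
    import torch.nn as nn

    class ProductFactorizedANP(nn.Module):
        """Hypothetical factorized classifier: the weight vector for the
        ANP (adjective i, noun j) is the elementwise product of factor i
        and factor j, shared across all pairs they participate in."""
        def __init__(self, feat_dim: int, n_adj: int, n_noun: int):
            super().__init__()
            self.adj_factors = nn.Embedding(n_adj, feat_dim)
            self.noun_factors = nn.Embedding(n_noun, feat_dim)

        def forward(self, feats, adj_idx, noun_idx):
            # One weight row per example: factor(adjective) * factor(noun).
            w = self.adj_factors(adj_idx) * self.noun_factors(noun_idx)
            return (feats * w).sum(dim=-1)  # one ANP logit per example

    feats = torch.randn(4, 512)
    logits = ProductFactorizedANP(512, 117, 167)(
        feats, torch.tensor([3, 3, 8, 8]), torch.tensor([5, 9, 5, 9]))
    print(logits.shape)  # torch.Size([4])

Because the factors are tied across pairs, a never-observed combination still receives a score, which is what would enable the unseen-ANP generalization the summary mentions.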

Large-scale visual sentiment ontology and detectors using adjective noun pairs

This work presents a method built upon psychological theories and web mining to automatically construct a large-scale Visual Sentiment Ontology (VSO) consisting of more than 3,000 adjective-noun pairs (ANPs), and proposes SentiBank, a novel visual concept detector library that can be used to detect the presence of 1,200 ANPs in an image.

SentiBank: large-scale ontology and classifiers for detecting sentiment and emotions in visual content

A novel system combines sound structures from psychology with the folksonomy extracted from social multimedia to develop a large visual sentiment ontology consisting of 1,200 concepts, together with associated classifiers called SentiBank, believed to offer a powerful mid-level semantic representation enabling high-level sentiment analysis of social multimedia.

Visual Affect Around the World: A Large-scale Multilingual Visual Sentiment Ontology

It is shown how this pipeline can be applied to a social multimedia platform to create a large-scale multilingual visual sentiment concept ontology (MVSO), organized hierarchically into multilingual clusters of visually detectable nouns and subclusters of emotionally biased versions of these nouns.

Going Deeper for Multilingual Visual Sentiment Detection

Higher-accuracy models for detecting ANPs across six languages are detailed, using the same image pool and setting as the original release but a more modern architecture, GoogLeNet, which provides comparable or better performance at a reduced network parameter cost.

Beyond Object Recognition: Visual Sentiment Analysis with Deep Coupled Adjective and Noun Neural Networks

This paper proposes a novel visual sentiment analysis approach with deep coupled adjective and noun neural networks that outperforms the state of the art on the SentiBank dataset with a 10.2% accuracy gain and surpasses the previous best approach on the Twitter dataset by clear margins.

Robust Visual-Textual Sentiment Analysis: When Attention meets Tree-structured Recursive Neural Networks

Results show that the proposed framework, which integrates textual and visual information for robust sentiment analysis, outperforms existing state-of-the-art joint models.

Predicting Emotions in User-Generated Videos

Results of a comprehensive set of experiments indicate that combining multiple types of features, such as the joint use of audio and visual cues, is important, and that attribute features, such as those carrying sentiment-level semantics, are very effective.

DeepSentiBank: Visual Sentiment Concept Classification with Deep Convolutional Neural Networks

Performance evaluation shows that the newly trained deep CNN model, SentiBank 2.0 (also called DeepSentiBank), significantly improves both annotation accuracy and retrieval performance compared to its predecessors, which mainly use binary SVM classification models.

Can we understand van Gogh's mood?: learning to infer affects from images in social networks

A semi-supervised framework is proposed that formulates the problem of inferring affect from images in social networks as a factor graph model; the effectiveness of the method is demonstrated by automatically understanding van Gogh's mood from his artworks and by inferring trends in public affect around special events.