Graph Neural Networks for Image Understanding Based on Multiple Cues: Group Emotion Recognition and Event Recognition as Use Cases

@article{Guo2020GraphNN,
  title={Graph Neural Networks for Image Understanding Based on Multiple Cues: Group Emotion Recognition and Event Recognition as Use Cases},
  author={Xin Guo and Luisa F. Polan{\'i}a and Bin Zhu and Charles G. Boncelet and Kenneth E. Barner},
  journal={2020 IEEE Winter Conference on Applications of Computer Vision (WACV)},
  year={2020},
  pages={2910--2919}
}
  • Xin Guo, Luisa F. Polanía, Bin Zhu, Charles G. Boncelet, Kenneth E. Barner
  • Published 2020
  • Computer Science
  • 2020 IEEE Winter Conference on Applications of Computer Vision (WACV)
A graph neural network (GNN) for image understanding based on multiple cues is proposed in this paper. Compared to traditional feature and decision fusion approaches that neglect the fact that features can interact and exchange information, the proposed GNN is able to pass information among features extracted from different models. Two image understanding tasks, namely group-level emotion recognition (GER) and event recognition, which are highly semantic and require the interaction of several…
Regional Attention Networks with Context-aware Fusion for Group Emotion Recognition
TLDR
This work developed a regional attention mechanism to find important persons or objects, which play critical roles in the group emotion, and combine them based on importance; it also proposed a context-aware fusion mechanism that estimates weights from the image context to fuse different sources of information.
APSE: Attention-aware Polarity-Sensitive Embedding for Emotion-based Image Retrieval
TLDR
An attention-aware polarity-sensitive embedding (APSE) network is designed, with a hierarchical attention mechanism that automatically discovers and models the informative regions of interest; it outperforms state-of-the-art emotion-based image retrieval (EBIR) approaches by a large margin.
PETA: Photo Albums Event Recognition using Transformers Attention
TLDR
A tailor-made solution, combining the power of CNNs for image representation and transformers for album representation to perform global reasoning on image collection, offering a practical and efficient solution for photo albums event recognition.
Learning Furniture Compatibility with Graph Neural Networks
TLDR
This work proposes a graph neural network (GNN) approach to the problem of predicting the stylistic compatibility of a set of furniture items from images, and introduces a new dataset, called the Target Furniture Collections dataset, which contains over 6000 furniture items that have been hand-curated by stylists to make up 1632 compatible sets.
Machine learning for video event recognition
TLDR
This survey starts by providing the formal definitions of both scene and event, and the logical architecture for a generic event recognition system, and presents two taxonomies based on features and machine learning algorithms, respectively, which are used to describe the different approaches for the recognition of events within a video sequence.
Ontology-driven Event Type Classification in Images
TLDR
This paper uses a large number of real-world news events to create an ontology based on Wikidata comprising the majority of event types and introduces a novel large-scale dataset that was acquired through Web crawling.

References

SHOWING 1-10 OF 69 REFERENCES
A new deep-learning framework for group emotion recognition
In this paper, we target the Group-level emotion recognition sub-challenge of the fifth Emotion Recognition in the Wild (EmotiW 2017) Challenge, which is based on the Group Affect Database 2.0
Group-Level Emotion Recognition Using Hybrid Deep Models Based on Faces, Scenes, Skeletons and Visual Attentions
TLDR
This paper presents a hybrid deep learning network submitted to the 6th Emotion Recognition in the Wild (EmotiW 2018) Grand Challenge, in the category of group-level emotion recognition, and achieves first place in the challenge.
Group-Level Emotion Recognition using Deep Models with A Four-stream Hybrid Network
TLDR
A novel face-location aware global network is proposed, capturing the face location information in the form of an attention heatmap to better model the relationships between faces and scene in a global image.
Group-level emotion recognition using deep models on image scene, faces, and skeletons
TLDR
A hybrid network that incorporates global scene features, skeleton features of the group, and local facial features is developed and outperforms the baseline of 52.97% and 53.62% on the validation and testing sets, respectively.
Cascade Attention Networks For Group Emotion Recognition with Face, Body and Image Cues
TLDR
This paper presents an approach for the group-level emotion recognition sub-challenge of EmotiW 2018 and proposes a cascade attention network for the face cue in images to generate a global representation based on all faces.
LSTM for dynamic emotion and group emotion recognition in the wild
TLDR
This paper extracts acoustic features, LBPTOP, Dense SIFT and CNN-LSTM features to recognize the emotions of film characters, and uses a fusion network to combine all the extracted features at the decision level for the group-level emotion recognition sub-challenge.
Social Relationship Recognition Based on A Hybrid Deep Neural Network
TLDR
A hybrid deep network is proposed to predict the social relations between two human beings in an image using a VGG-FACE model previously trained for face recognition and fine-tuned on a social relation database as branches of a siamese-like network.
Group emotion recognition with individual facial emotion CNNs and global image based CNNs
TLDR
This paper presents the approach for group-level emotion recognition in the Emotion Recognition in the Wild Challenge 2017, based on two types of Convolutional Neural Networks, namely individual facial emotion CNNs and global image based CNNs.
Object-Scene Convolutional Neural Networks for event recognition in images
TLDR
This paper designs a new architecture, called Object-Scene Convolutional Neural Network (OS-CNN), which is decomposed into object net and scene net, which extract useful information for event understanding from the perspective of objects and scene context, respectively, and investigates different network architectures for OS-CNN design.
Happiness level prediction with sequential inputs via multiple regressions
TLDR
This paper presents the solution submitted to the Emotion Recognition in the Wild (EmotiW 2016) group-level happiness intensity prediction sub-challenge, which uses a convolutional neural network to extract discriminative face features, and a recurrent neural network to selectively memorize the important features to perform the group-level happiness prediction task.