Corpus ID: 235359248

Non-Volume Preserving-based Fusion to Group-Level Emotion Recognition on Crowd Videos

Kha Gia Quach, Ngan T. H. Le, Chi Nhan Duong, Ibsa Jalata, Kaushik Roy, Khoa Luu
Group-level emotion recognition (ER) is a growing research area, as the demand for assessing crowds of all sizes is increasing in both the security arena and social media. This work extends earlier ER investigations, which focused on group-level ER either in single images or within a video, by fully investigating group-level expression recognition on crowd videos. In this paper, we propose an effective deep feature-level fusion mechanism to model the spatial-temporal…



Detecting Coherent Groups in Crowd Scenes by Multiview Clustering
A new structural context descriptor is designed to characterize the structural properties of individuals in crowd scenes, and a novel framework is introduced for group detection, which is able to determine the group number automatically without any parameter or threshold to be tuned.
RetinaFace: Single-Shot Multi-Level Face Localisation in the Wild
A novel single-shot, multi-level face localisation method, named RetinaFace, which unifies face box prediction, 2D facial landmark localisation and 3D vertices regression under one common target: point regression on the image plane.
AffectNet: A Database for Facial Expression, Valence, and Arousal Computing in the Wild
AffectNet is by far the largest database of facial expression, valence, and arousal in the wild, enabling research in automated facial expression recognition under two different emotion models. Various evaluation metrics show that the deep neural network baselines outperform conventional machine learning methods and off-the-shelf facial expression recognition systems.
Alone vs In-a-group: A Multi-modal Framework for Automatic Affect Recognition
Recognition and analysis of human affect has been researched extensively within the field of computer science in the last two decades. However, most of the past research in automatic analysis of human…
Automatic Group Cohesiveness Detection With Multi-modal Features
An automatic group cohesiveness prediction method for the 7th Emotion Recognition in the Wild (EmotiW 2019) Grand Challenge in the category of Group-based Cohesion Prediction is introduced, including regression models that are separately trained on face features, skeleton features, and scene features.
Bi-modality Fusion for Emotion Recognition in the Wild
A bi-modality fusion method for video-based emotion recognition in the wild that takes advantage of the visual information from facial expression sequences and the speech information from audio.
Bootstrap Model Ensemble and Rank Loss for Engagement Intensity Regression
This paper presents the approach for the engagement intensity regression task of EmotiW 2019, and uses the classical bootstrap aggregation method for model ensembling, which randomly resamples the training data several times and then averages the model predictions.
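The bootstrap aggregation (bagging) scheme described in that abstract can be sketched in a few lines. This is a generic illustration, not the authors' actual pipeline: the `fit` callable and the query point `x` are stand-ins for whatever regressor and input the task uses.

```python
import random
from statistics import mean

def bootstrap_ensemble_predict(train, fit, x, n_models=5, seed=0):
    """Bagging: fit n_models regressors on bootstrap resamples of the
    training set and average their predictions at point x."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_models):
        # Resample with replacement to the original training-set size.
        sample = [rng.choice(train) for _ in train]
        model = fit(sample)          # fit returns a callable predictor
        preds.append(model(x))
    return mean(preds)
```

For example, with a trivial `fit` that ignores its sample and returns a constant predictor, the ensemble average is just that constant; with real regressors, the averaging reduces variance across resamples.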
EmotiW 2019: Automatic Emotion, Engagement and Cohesion Prediction Tasks
The EmotiW benchmarking platform provides researchers with an opportunity to evaluate their methods on affect-labelled data; the databases used, the experimental protocols and the baselines are discussed.
Exploring Regularizations with Face, Body and Image Cues for Group Cohesion Prediction
This paper designs two regularizations, namely a rank loss and an hourglass loss: the former enforces a margin between the prediction distances of distant and near categories, and the latter avoids the centralized predictions that result from using only an MSE loss.
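One plausible reading of such a rank loss is a hinge penalty whose required margin grows with the gap between the two samples' labels. This is a sketch under that assumption; the scaling factor `margin_per_gap` is an illustrative parameter, not a value from the paper.

```python
def margin_rank_loss(pred_a, pred_b, label_gap, margin_per_gap=0.5):
    """Hinge-style rank loss: predictions for two samples whose cohesion
    labels differ by label_gap should be at least label_gap * margin_per_gap
    apart; pairs predicted closer than that incur a linear penalty."""
    required = label_gap * margin_per_gap
    return max(0.0, required - abs(pred_a - pred_b))
```

Predictions that are already far enough apart contribute zero loss, so the regularizer only pushes on pairs whose predictions are too close relative to their label distance.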
Group-level Cohesion Prediction using Deep Learning Models with A Multi-stream Hybrid Network
A hybrid deep learning network for predicting group cohesion in images that exploits four types of visual cues (scene, skeleton, UV coordinates and face image) along with state-of-the-art convolutional neural networks (CNNs).