Training deep networks for facial expression recognition with crowd-sourced label distribution

Emad Barsoum, Cha Zhang, Cristian Canton-Ferrer, Zhengyou Zhang. Proceedings of the 18th ACM International Conference on Multimodal Interaction.
Crowd-sourcing has become a widely adopted scheme for collecting ground-truth labels. More specifically, we had 10 taggers label each input image, and we compare four different approaches to utilizing the multiple labels: majority voting, multi-label learning, probabilistic label drawing, and cross-entropy loss. We show that the traditional majority voting scheme does not perform as well as the last two approaches, which fully leverage the label distribution. An enhanced FER+ data set with multiple…
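The two distribution-aware schemes named in the abstract can be sketched as follows. This is a minimal NumPy sketch, not the paper's implementation; the function names are illustrative assumptions.

```python
import numpy as np

def soft_cross_entropy(pred_logits, label_dist):
    """Cross-entropy between the crowd-sourced label distribution and the
    model's predicted distribution (the fourth scheme in the abstract)."""
    # Numerically stable log-softmax over the class axis.
    logits = pred_logits - pred_logits.max(axis=-1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return -(label_dist * log_probs).sum(axis=-1).mean()

def draw_label(label_dist, rng):
    """Probabilistic label drawing (the third scheme): sample one hard label
    from the tagger distribution instead of taking the majority vote."""
    return rng.choice(len(label_dist), p=label_dist)
```

A label distribution concentrated on one class reduces both schemes to ordinary hard-label training, which is why majority voting is the degenerate case of this family.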


Regularizing the loss layer of CNNs for facial expression recognition using crowdsourced labels

A novel approach to using crowdsourced label distributions to improve the generalization performance of convolutional neural networks for FER, via a label disturbance method in which training examples are randomly assigned incorrect labels drawn from the combined label probability distribution.
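The disturbance step described above can be sketched as follows; this is a hedged NumPy sketch, with the replacement probability `alpha` a hypothetical knob rather than a value from the paper.

```python
import numpy as np

def disturb_from_distribution(labels, label_dist, alpha, rng):
    """With probability alpha per sample, replace the training label by one
    sampled from the combined crowd-sourced label probability distribution."""
    labels = np.asarray(labels).copy()
    mask = rng.random(labels.shape) < alpha          # which samples to disturb
    labels[mask] = rng.choice(len(label_dist), size=mask.sum(), p=label_dist)
    return labels
```

Drawing replacements from the crowd distribution, rather than uniformly, concentrates the injected noise on classes that human taggers actually confuse.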

Mitigating Label-Noise for Facial Expression Recognition in the Wild

A simple but effective Label-noise Robust Network (LRN) that explores inter-class correlations to mitigate the ambiguity that often arises between morphologically similar classes and to suppress the heteroscedastic uncertainty caused by inter-class label noise.

Boosting Facial Expression Recognition by A Semi-Supervised Progressive Teacher

This paper proposes a semi-supervised learning algorithm named Progressive Teacher (PT) to exploit reliable FER datasets as well as large-scale unlabeled expression images for effective training, achieving state-of-the-art performance on the widely used RAF-DB and FERPlus databases.

Learn From All: Erasing Attention Consistency for Noisy Label Facial Expression Recognition

This paper proposes a novel Erasing Attention Consistency (EAC) method that automatically suppresses noisy samples during training; EAC significantly outperforms state-of-the-art noisy-label FER methods and generalizes well to other tasks with a large number of classes, such as CIFAR-100 and Tiny-ImageNet.

Learning Discriminative Representation For Facial Expression Recognition From Uncertainties

Novel Rayleigh and weighted-softmax losses are introduced to extract discriminative representations, and a per-sample weight, based on the distance to the class center, is introduced to measure the uncertainty of a given sample.

Handling ambiguous annotations for facial expression recognition in the wild

This work proposes a simple and effective single-network FER framework that is robust to noisy annotations; an image qualifies as clean if the Jensen-Shannon divergence between its ground-truth distribution and the predicted distribution for its weakly augmented version is smaller than a threshold.
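The clean-sample criterion above can be sketched directly; this is a minimal NumPy sketch under stated assumptions, and the threshold value is a hypothetical placeholder, not one taken from the paper.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions.
    Symmetric, and bounded above by log(2) for disjoint distributions."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    m = 0.5 * (p + q)
    kl = lambda a, b: (a * np.log(a / b)).sum()
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def is_clean(gt_dist, pred_dist, threshold=0.1):
    """Flag a sample as clean when JS(ground truth, prediction) is below
    the threshold (0.1 here is an illustrative assumption)."""
    return js_divergence(gt_dist, pred_dist) < threshold
```

Because JS divergence is symmetric and bounded, it gives a more stable selection signal than raw KL divergence when the prediction places near-zero mass on the annotated class.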

Annealed Label Transfer for Face Expression Recognition

A method for recognizing facial expressions using information from a pair of domains, one with labelled data and one with unlabelled data, which departs from the traditional semi-supervised framework towards a transfer learning approach.

Dynamic Adaptive Threshold based Learning for Noisy Annotations Robust Facial Expression Recognition

A dynamic FER learning framework (DNFER) in which clean samples are selected during training based on a dynamic class-specific threshold; unlike other methods, it is independent of the noise rate and does not need any clean data.

Adaptive Deep Metric Learning for Identity-Aware Facial Expression Recognition

Better FER performance can be achieved by combining the deep metric loss and the softmax loss in a unified framework with two fully connected branches via joint optimization, which reduces the computational burden of deep metric learning and alleviates the difficulty of threshold validation and anchor selection.

Image based Static Facial Expression Recognition with Multiple Deep Network Learning

This work reports the proposed image-based static facial expression recognition method for the Emotion Recognition in the Wild Challenge (EmotiW) 2015, and presents two schemes for learning the ensemble weights of the network responses: minimizing the log-likelihood loss and minimizing the hinge loss.

Facial Expression Recognition via a Boosted Deep Belief Network

A novel Boosted Deep Belief Network (BDBN) performs three training stages iteratively in a unified loopy framework; experiments showed that the BDBN framework yields dramatic improvements in facial expression analysis.

Deeply Learning Deformable Facial Action Parts Model for Dynamic Expression Analysis

This paper proposes to adapt 3D Convolutional Neural Networks (3D CNN) with deformable action parts constraints, which can detect specific facial action parts under the structured spatial constraints, and obtain the discriminative part-based representation simultaneously.

Combining modality specific deep neural networks for emotion recognition in video

In this paper we present the techniques used for the University of Montréal's team submissions to the 2013 Emotion Recognition in the Wild Challenge. The challenge is to classify the emotions…

DisturbLabel: Regularizing CNN on the Loss Layer

An extremely simple algorithm that randomly replaces a portion of the labels with incorrect values in each iteration, preventing the network from over-fitting by implicitly averaging over exponentially many networks trained with different label sets.
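The replacement step described above can be sketched in a few lines; this is a minimal NumPy sketch, with `alpha` (the fraction of labels disturbed) an illustrative parameter name rather than the paper's notation.

```python
import numpy as np

def disturb_labels(labels, num_classes, alpha, rng):
    """DisturbLabel-style regularization: with probability alpha per sample,
    replace the ground-truth label with one drawn uniformly at random from
    all classes (possibly including the correct one)."""
    labels = np.asarray(labels).copy()
    mask = rng.random(labels.shape) < alpha          # which labels to disturb
    labels[mask] = rng.integers(0, num_classes, size=mask.sum())
    return labels
```

Calling this once per iteration means each epoch effectively trains on a slightly different label set, which is the source of the implicit model averaging.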

Emotion Distribution Recognition from Facial Expressions

Experimental results show that the proposed emotion distribution learning (EDL) method can effectively deal with the emotion distribution recognition problem and performs remarkably better than state-of-the-art multi-label learning methods.

Very Deep Convolutional Networks for Large-Scale Image Recognition

This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.

Deep Residual Learning for Image Recognition

This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.

CREMA-D: Crowd-Sourced Emotional Multimodal Actors Dataset

An audio-visual dataset uniquely suited for the study of multi-modal emotion expression and perception, consisting of facial and vocal emotional expressions in sentences spoken in a range of basic emotional states; it can be used to probe other questions concerning the audio-visual perception of emotion.

Deep Learning using Linear Support Vector Machines

The results show that simply replacing softmax with linear L2-SVMs gives significant gains on popular deep learning datasets: MNIST, CIFAR-10, and the ICML 2013 Representation Learning Workshop's facial expression recognition challenge.
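The L2-SVM objective used in place of softmax can be sketched as a squared hinge loss on the top-layer scores in a one-vs-rest form; this is a hedged NumPy sketch of the loss alone, not the paper's full training setup.

```python
import numpy as np

def l2_svm_loss(scores, y, margin=1.0):
    """Squared hinge (L2-SVM) loss on class scores, a drop-in replacement
    for softmax cross-entropy. scores: (n, k) array; y: (n,) int labels."""
    n, k = scores.shape
    # One-vs-rest targets: +1 for the true class, -1 elsewhere.
    targets = -np.ones((n, k))
    targets[np.arange(n), y] = 1.0
    # Squared hinge: zero once every score clears the margin.
    hinge = np.maximum(0.0, margin - targets * scores)
    return (hinge ** 2).sum(axis=1).mean()
```

Unlike cross-entropy, this loss becomes exactly zero once all scores clear the margin, so confidently classified samples stop contributing gradient.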