A Face-to-Face Neural Conversation Model

@inproceedings{Chu2018AFN,
  title={A Face-to-Face Neural Conversation Model},
  author={Hang Chu and Daiqing Li and Sanja Fidler},
  booktitle={2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2018},
  pages={7113--7121}
}
  • Hang Chu, Daiqing Li, S. Fidler
  • Published 1 June 2018
  • Computer Science
  • 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
Neural networks have recently become good at engaging in dialog. [...] In particular, we introduce an RNN encoder-decoder that exploits the movement of facial muscles, as well as the verbal conversation. The decoder consists of two layers, where the lower layer aims at generating the verbal response and coarse facial expressions, while the second layer fills in the subtle gestures, making the generated output more smooth and natural. We train our neural network by having it "watch" 250 movies. We…

A Realistic Face-to-Face Conversation System Based on Deep Neural Networks
TLDR
Experimental results show that the conversation system can generate natural facial reactions and realistic facial images; the neural networks are trained and evaluated on data from ESPN shows.
To React or not to React: End-to-End Visual Pose Forecasting for Personalized Avatar during Dyadic Conversations
TLDR
A neural architecture named Dyadic Residual-Attention Model (DRAM), which integrates intrapersonal (monadic) and interpersonal (dyadic) dynamics using selective attention to generate sequences of body pose conditioned on audio and body pose of the interlocutor and audio of the human operating the avatar.
Towards More Realistic Human-Robot Conversation: A Seq2Seq-based Body Gesture Interaction System
TLDR
A novel system that enables intelligent robots to exhibit realistic body gestures while communicating with humans; it is implemented to drive a virtual avatar as well as Pepper, a physical humanoid robot, to demonstrate in practice how the method improves conversational interaction abilities.
A survey on empathetic dialogue systems
Sparse Feature Representation Learning for Deep Face Gender Transfer
TLDR
The findings seem to corroborate a hypothesis about the independence between face recognizability and gender classifiability in the literature of psychology and stimulate more computational studies of different face perception attributes including race, age, attractiveness, and trustworthiness.
Didn't see that coming: a survey on non-verbal social human behavior forecasting
TLDR
This survey defines the behavior forecasting problem for multiple interactive agents in a generic way that aims at unifying the fields of social signals prediction and human motion forecasting, traditionally separated.
A new multi-feature fusion based convolutional neural network for facial expression recognition
TLDR
A lightweight network called Multi-feature Fusion Based Convolutional Neural Network (MFF-CNN) for image-based FER that improves the average recognition accuracy by 9.80% to 15.05%; joint tuning is employed to integrate the two branches and fuse features.
Animating an Autonomous 3D Talking Avatar
  • Dominik Borer, Dominik Lutz, M. Guay
  • Computer Science
    International Conferences Interfaces and Human Computer Interaction 2019; Game and Entertainment Technologies 2019; and Computer Graphics, Visualization, Computer Vision and Image Processing 2019
  • 2019
TLDR
This paper addresses the problem of annotating how and when motions can be played and composed together in real-time with a compact taxonomy of chit chat behaviors, and measures the time required to label actions of an embodiment using a simple interface, compared to the standard state machine interface in Unreal Engine, and finds that the approach is 7 times faster.

References

A Neural Conversational Model
TLDR
A simple approach to conversational modeling which uses the recently proposed sequence to sequence framework, and is able to extract knowledge from both a domain specific dataset, and from a large, noisy, and general domain dataset of movie subtitles.
Real-time prosody-driven synthesis of body language
TLDR
This work presents a method for automatically synthesizing body language animations directly from the participants' speech signals, without the need for additional input, suitable for animating characters from live human speech.
Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models
TLDR
The recently proposed hierarchical recurrent encoder-decoder neural network is extended to the dialogue domain, and it is demonstrated that this model is competitive with state-of-the-art neural language models and back-off n-gram models.
Deep Reinforcement Learning for Dialogue Generation
TLDR
This work simulates dialogues between two virtual agents, using policy gradient methods to reward sequences that display three useful conversational properties: informativity (non-repetitive turns), coherence, and ease of answering.
Gesture controllers
TLDR
The modularity of the proposed method allows customization of a character's gesture repertoire, animation of non-human characters, and the use of additional inputs such as speech recognition or direct user control.
A Diversity-Promoting Objective Function for Neural Conversation Models
TLDR
This work proposes using Maximum Mutual Information (MMI) as the objective function in neural models, and demonstrates that the proposed MMI models produce more diverse, interesting, and appropriate responses, yielding substantive gains in BLEU scores on two conversational datasets and in human evaluations.
A Persona-Based Neural Conversation Model
TLDR
This work presents persona-based models for handling the issue of speaker consistency in neural response generation that yield qualitative performance improvements in both perplexity and BLEU scores over baseline sequence-to-sequence models.
Sequence to Sequence Learning with Neural Networks
TLDR
This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure, and finds that reversing the order of the words in all source sentences improved the LSTM's performance markedly, because doing so introduced many short term dependencies between the source and the target sentence which made the optimization problem easier.
A Hierarchical Latent Variable Encoder-Decoder Model for Generating Dialogues
TLDR
A neural network-based generative architecture, with latent stochastic variables that span a variable number of time steps, that improves upon recently proposed models; the latent variables facilitate the generation of long outputs and maintain the context.
OpenFace: An open source facial behavior analysis toolkit
TLDR
OpenFace is the first open source tool capable of facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation and allows for easy integration with other applications and devices through a lightweight messaging system.