A Face-to-Face Neural Conversation Model
@inproceedings{Chu2018AFN, title={A Face-to-Face Neural Conversation Model}, author={Hang Chu and Daiqing Li and Sanja Fidler}, booktitle={2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition}, year={2018}, pages={7113--7121} }
Neural networks have recently become good at engaging in dialog. […] In particular, we introduce an RNN encoder-decoder that exploits the movement of facial muscles, as well as the verbal conversation. The decoder consists of two layers, where the lower layer aims at generating the verbal response and coarse facial expressions, while the second layer fills in the subtle gestures, making the generated output smoother and more natural. We train our neural network by having it "watch" 250 movies. We…
15 Citations
A Realistic Face-to-Face Conversation System Based on Deep Neural Networks
- Computer Science · 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW)
- 2019
Experimental results show that the conversation system can generate natural facial reactions and realistic facial images; the neural networks are trained and evaluated on data from ESPN shows.
Emotional conversation generation with heterogeneous graph neural network
- Computer Science · Artif. Intell.
- 2022
To React or not to React: End-to-End Visual Pose Forecasting for Personalized Avatar during Dyadic Conversations
- Psychology · ICMI
- 2019
A neural architecture named Dyadic Residual-Attention Model (DRAM), which integrates intrapersonal (monadic) and interpersonal (dyadic) dynamics using selective attention to generate sequences of body pose conditioned on audio and body pose of the interlocutor and audio of the human operating the avatar.
Towards More Realistic Human-Robot Conversation: A Seq2Seq-based Body Gesture Interaction System
- Computer Science · 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
- 2019
A novel system that enables intelligent robots to exhibit realistic body gestures while communicating with humans. It is implemented to drive a virtual avatar as well as Pepper, a physical humanoid robot, to demonstrate in practice how the method improves conversational interaction abilities.
Sparse Feature Representation Learning for Deep Face Gender Transfer
- Computer Science · 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)
- 2021
The findings seem to corroborate a hypothesis about the independence between face recognizability and gender classifiability in the literature of psychology and stimulate more computational studies of different face perception attributes including race, age, attractiveness, and trustworthiness.
Didn't see that coming: a survey on non-verbal social human behavior forecasting
- Computer Science · ArXiv
- 2022
This survey defines the behavior forecasting problem for multiple interactive agents in a generic way that aims at unifying the fields of social signals prediction and human motion forecasting, traditionally separated.
A new multi-feature fusion based convolutional neural network for facial expression recognition
- Computer Science · Appl. Intell.
- 2022
A lightweight network called Multi-feature Fusion Based Convolutional Neural Network (MFF-CNN) for image-based FER that improves the average recognition accuracy by 9.80% to 15.05%; joint tuning is employed to integrate the two branches and fuse features.
Animating an Autonomous 3D Talking Avatar
- Computer Science · International Conferences Interfaces and Human Computer Interaction 2019; Game and Entertainment Technologies 2019; and Computer Graphics, Visualization, Computer Vision and Image Processing 2019
- 2019
This paper addresses the problem of annotating how and when motions can be played and composed together in real-time with a compact taxonomy of chit chat behaviors, and measures the time required to label actions of an embodiment using a simple interface, compared to the standard state machine interface in Unreal Engine, and finds that the approach is 7 times faster.
References
SHOWING 1-10 OF 35 REFERENCES
A Neural Conversational Model
- Computer Science · ArXiv
- 2015
A simple approach to conversational modeling which uses the recently proposed sequence to sequence framework, and is able to extract knowledge from both a domain specific dataset, and from a large, noisy, and general domain dataset of movie subtitles.
Real-time prosody-driven synthesis of body language
- Computer Science · SIGGRAPH 2009
- 2009
This work presents a method for automatically synthesizing body language animations directly from the participants' speech signals, without the need for additional input, suitable for animating characters from live human speech.
Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models
- Computer Science · AAAI
- 2016
The recently proposed hierarchical recurrent encoder-decoder neural network is extended to the dialogue domain, and it is demonstrated that this model is competitive with state-of-the-art neural language models and back-off n-gram models.
Deep Reinforcement Learning for Dialogue Generation
- Computer Science · EMNLP
- 2016
This work simulates dialogues between two virtual agents, using policy gradient methods to reward sequences that display three useful conversational properties: informativity (non-repetitive turns), coherence, and ease of answering.
Gesture controllers
- Computer Science · SIGGRAPH 2010
- 2010
The modularity of the proposed method allows customization of a character's gesture repertoire, animation of non-human characters, and the use of additional inputs such as speech recognition or direct user control.
A Diversity-Promoting Objective Function for Neural Conversation Models
- Computer Science · HLT-NAACL
- 2016
This work proposes using Maximum Mutual Information (MMI) as the objective function in neural models, and demonstrates that the proposed MMI models produce more diverse, interesting, and appropriate responses, yielding substantive gains in BLEU scores on two conversational datasets and in human evaluations.
A Persona-Based Neural Conversation Model
- Psychology, Computer Science · ACL
- 2016
This work presents persona-based models for handling the issue of speaker consistency in neural response generation that yield qualitative performance improvements in both perplexity and BLEU scores over baseline sequence-to-sequence models.
Sequence to Sequence Learning with Neural Networks
- Computer Science · NIPS
- 2014
This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure, and finds that reversing the order of the words in all source sentences improved the LSTM's performance markedly, because doing so introduced many short term dependencies between the source and the target sentence which made the optimization problem easier.
A Hierarchical Latent Variable Encoder-Decoder Model for Generating Dialogues
- Computer Science · AAAI
- 2017
A neural network-based generative architecture with latent stochastic variables that span a variable number of time steps. It improves upon recently proposed models, and the latent variables facilitate the generation of long outputs and maintain the context.
OpenFace: An open source facial behavior analysis toolkit
- Computer Science · 2016 IEEE Winter Conference on Applications of Computer Vision (WACV)
- 2016
OpenFace is the first open source tool capable of facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation and allows for easy integration with other applications and devices through a lightweight messaging system.