A Cross-Domain Approach for Continuous Impression Recognition from Dyadic Audio-Visual-Physio Signals
@article{Li2022ACA,
  title   = {A Cross-Domain Approach for Continuous Impression Recognition from Dyadic Audio-Visual-Physio Signals},
  author  = {Yuanchao Li and Catherine Lai},
  journal = {ArXiv},
  year    = {2022},
  volume  = {abs/2203.13932}
}
The impression we make on others depends not only on what we say, but also, to a large extent, on how we say it. As a sub-branch of affective computing and social signal processing, impression recognition has proven critical in both human-human conversations and spoken dialogue systems. However, most research has studied impressions only from the signals expressed by the emitter, ignoring the response from the receiver. In this paper, we perform impression recognition using a proposed cross…
One Citation
Multimodal Dyadic Impression Recognition via Listener Adaptive Cross-Domain Fusion
- Computer Science · ArXiv
- 2022
This paper performs impression recognition using a proposed listener adaptive cross-domain architecture, which consists of a listener adaptation function to model the causality between speaker and listener behaviors and a cross-domain fusion function to strengthen their connection.
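The summary above suggests two components: an adaptation step that conditions listener features on speaker behavior, and a fusion step that ties the two streams together. Below is a minimal PyTorch sketch of that idea; every module, dimension, and the attention-based fusion are illustrative assumptions, not the published architecture.

```python
# Hypothetical sketch of listener-adaptive cross-domain fusion.
# All names and sizes are assumptions for illustration only.
import torch
import torch.nn as nn

class ListenerAdaptiveFusion(nn.Module):
    def __init__(self, dim: int = 128):
        super().__init__()
        # Listener adaptation: condition listener features on speaker features,
        # approximating the speaker -> listener causal direction.
        self.adapt = nn.Linear(2 * dim, dim)
        # Cross-domain fusion: attend from speaker to adapted listener features.
        self.fuse = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.head = nn.Linear(dim, 1)  # continuous impression score per frame

    def forward(self, speaker: torch.Tensor, listener: torch.Tensor):
        # speaker, listener: (batch, time, dim) per-frame features
        adapted = torch.tanh(self.adapt(torch.cat([speaker, listener], dim=-1)))
        fused, _ = self.fuse(speaker, adapted, adapted)
        return self.head(fused).squeeze(-1)  # (batch, time) impression trace

model = ListenerAdaptiveFusion()
s, l = torch.randn(2, 50, 128), torch.randn(2, 50, 128)
print(model(s, l).shape)  # torch.Size([2, 50])
```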
References
Speech Emotion Recognition from Variable-Length Inputs with Triplet Loss Function
- Computer Science · INTERSPEECH
- 2018
This work proposes a triplet framework based on a Long Short-Term Memory (LSTM) network for speech emotion recognition, learning a mapping from acoustic features to discriminative embeddings that then serve as the basis for classification with an SVM at test time.
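A short sketch of the triplet setup described above: an LSTM maps variable-length acoustic features to fixed-size embeddings, trained so same-emotion pairs sit closer than different-emotion pairs. Feature dimensions and the margin value are assumptions.

```python
# Triplet-loss training of an LSTM embedding for SER (illustrative sketch).
import torch
import torch.nn as nn

class TripletLSTMEncoder(nn.Module):
    def __init__(self, feat_dim: int = 40, embed_dim: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, embed_dim, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        _, (h_n, _) = self.lstm(x)   # final hidden state as utterance embedding
        return h_n[-1]               # (batch, embed_dim)

encoder = TripletLSTMEncoder()
loss_fn = nn.TripletMarginLoss(margin=1.0)

anchor   = encoder(torch.randn(8, 120, 40))  # variable-length inputs per batch
positive = encoder(torch.randn(8, 120, 40))  # same emotion as anchor
negative = encoder(torch.randn(8, 90, 40))   # different emotion
loss = loss_fn(anchor, positive, negative)
loss.backward()
# At test time the learned embeddings would be fed to an SVM classifier.
```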
Improved End-to-End Speech Emotion Recognition Using Self Attention Mechanism and Multitask Learning
- Computer Science · INTERSPEECH
- 2019
A speech emotion recognition (SER) method using end-to-end (E2E) multitask learning with self-attention is proposed to address several known issues; it outperforms state-of-the-art methods and improves overall accuracy.
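A minimal sketch of E2E multitask SER with self-attention: a shared encoder with self-attentive pooling feeds two heads, emotion plus an assumed auxiliary task. The auxiliary task choice and all layer sizes here are assumptions.

```python
# Multitask SER with self-attentive pooling (illustrative sketch).
import torch
import torch.nn as nn

class MultitaskSER(nn.Module):
    def __init__(self, feat_dim=40, hidden=128, n_emotions=4):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, hidden, batch_first=True)
        self.attn_score = nn.Linear(hidden, 1)   # self-attentive pooling weights
        self.emotion_head = nn.Linear(hidden, n_emotions)
        self.aux_head = nn.Linear(hidden, 2)     # hypothetical auxiliary task

    def forward(self, x):
        h, _ = self.encoder(x)                        # (B, T, H)
        w = torch.softmax(self.attn_score(h), dim=1)  # attention over time
        pooled = (w * h).sum(dim=1)                   # (B, H)
        return self.emotion_head(pooled), self.aux_head(pooled)

model = MultitaskSER()
emo_logits, aux_logits = model(torch.randn(4, 100, 40))
# Training would minimize a weighted sum of the two task losses.
```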
Fusing ASR Outputs in Joint Training for Speech Emotion Recognition
- Computer Science · ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2022
Experiments show that in joint ASR-SER training, incorporating both the ASR hidden states and the ASR text output using a hierarchical co-attention fusion approach improves SER performance the most.
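A rough sketch of co-attention fusion between ASR hidden states and text embeddings, in the spirit of the joint training described above; the exact hierarchy, pooling, and dimensions are assumptions.

```python
# Co-attention fusion of ASR hidden states and text embeddings (sketch).
import torch
import torch.nn as nn

class CoAttentionFusion(nn.Module):
    def __init__(self, dim=256, n_emotions=4):
        super().__init__()
        self.text_to_asr = nn.MultiheadAttention(dim, 4, batch_first=True)
        self.asr_to_text = nn.MultiheadAttention(dim, 4, batch_first=True)
        self.classifier = nn.Linear(2 * dim, n_emotions)

    def forward(self, asr_hidden, text_emb):
        # Each stream attends over the other, then both are pooled and fused.
        a, _ = self.text_to_asr(text_emb, asr_hidden, asr_hidden)
        t, _ = self.asr_to_text(asr_hidden, text_emb, text_emb)
        fused = torch.cat([a.mean(dim=1), t.mean(dim=1)], dim=-1)
        return self.classifier(fused)

logits = CoAttentionFusion()(torch.randn(2, 80, 256), torch.randn(2, 20, 256))
print(logits.shape)  # torch.Size([2, 4])
```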
Efficient Emotion Recognition from Speech Using Deep Learning on Spectrograms
- Computer Science · INTERSPEECH
- 2017
A new implementation of emotion recognition from paralinguistic information in speech, based on a deep neural network applied directly to spectrograms, achieves higher recognition accuracy than previously published results while also limiting latency.
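A minimal sketch of a CNN applied directly to spectrograms for emotion classification, as in the approach above; the layer sizes, input shape, and number of classes are assumptions.

```python
# CNN over single-channel (e.g. log-mel) spectrograms (illustrative sketch).
import torch
import torch.nn as nn

spectrogram_cnn = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 4),  # 4 emotion classes, an assumption
)

# Input: batch of spectrograms shaped (batch, channel, freq, time).
logits = spectrogram_cnn(torch.randn(8, 1, 128, 300))
print(logits.shape)  # torch.Size([8, 4])
```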
Audio-Oriented Multimodal Machine Comprehension via Dynamic Inter- and Intra-modality Attention
- Computer Science · AAAI
- 2021
A Dynamic Inter- and Intra-modality Attention (DIIA) model is proposed to effectively fuse the two modalities (audio and text) in Audio-Oriented Multimodal Machine Comprehension, making fair comparisons possible between the model and existing unimodal MC models.
Recognizing emotions in spoken dialogue with hierarchically fused acoustic and lexical features
- Computer Science · 2016 IEEE Spoken Language Technology Workshop (SLT)
- 2016
A hierarchical fusion strategy for multimodal emotion recognition is proposed; it incorporates global, more abstract features at higher levels of its knowledge-inspired structure and consistently outperforms both feature-level and decision-level fusion.
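A hedged sketch of the hierarchical idea above: frame-level acoustic features are encoded first, and more abstract, utterance-level lexical features are injected at a higher layer rather than concatenated at the input. The specific feature choices and sizes are assumptions.

```python
# Hierarchical acoustic-lexical fusion (illustrative sketch).
import torch
import torch.nn as nn

class HierarchicalFusion(nn.Module):
    def __init__(self, acoustic_dim=40, lexical_dim=300, hidden=128, n_classes=4):
        super().__init__()
        self.low = nn.LSTM(acoustic_dim, hidden, batch_first=True)  # low level: acoustics
        self.high = nn.Linear(hidden + lexical_dim, hidden)         # high level: + lexical
        self.out = nn.Linear(hidden, n_classes)

    def forward(self, acoustic, lexical):
        _, (h, _) = self.low(acoustic)  # (1, B, H) final acoustic state
        fused = torch.relu(self.high(torch.cat([h[-1], lexical], dim=-1)))
        return self.out(fused)

logits = HierarchicalFusion()(torch.randn(2, 200, 40), torch.randn(2, 300))
print(logits.shape)  # torch.Size([2, 4])
```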
Analyzing first impressions of warmth and competence from observable nonverbal cues in expert-novice interactions
- Psychology · ICMI
- 2017
The analysis of a corpus of dyadic expert-novice knowledge-sharing interactions investigates the relationship between observed nonverbal cues and the formation of first impressions of warmth and competence, and provides interesting insights about the role of rest poses.
A review of affective computing: From unimodal analysis to multimodal fusion
- Computer Science · Inf. Fusion
- 2017
Multitask Learning and Multistage Fusion for Dimensional Audiovisual Emotion Recognition
- Computer Science · ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2020
This paper proposes two methods for predicting emotional attributes from audio and visual data: multitask learning across the attributes, and a multistage fusion strategy that combines the final predictions from the individual modalities.
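A sketch of the two ideas above under stated assumptions: (1) multitask regression of dimensional attributes (arousal, valence, dominance) per modality, and (2) a second-stage fusion that combines the per-modality predictions into a final estimate. The feature dimensions are illustrative guesses.

```python
# Multitask regression plus multistage (late) fusion (illustrative sketch).
import torch
import torch.nn as nn

class ModalityRegressor(nn.Module):
    def __init__(self, in_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 3))  # arousal, valence, dominance

    def forward(self, x):
        return self.net(x)

audio_head = ModalityRegressor(in_dim=88)   # e.g. acoustic feature set (assumption)
video_head = ModalityRegressor(in_dim=136)  # e.g. facial landmarks (assumption)
stage2 = nn.Linear(6, 3)                    # fuse both modalities' predictions

a_pred = audio_head(torch.randn(4, 88))
v_pred = video_head(torch.randn(4, 136))
final = stage2(torch.cat([a_pred, v_pred], dim=-1))  # (4, 3) fused attributes
print(final.shape)
```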
The YouTube Lens: Crowdsourced Personality Impressions and Audiovisual Analysis of Vlogs
- Psychology · IEEE Transactions on Multimedia
- 2013
This work investigates the feasibility of crowdsourcing personality impressions from vlogging as a way to obtain judgments from the varied audience that consumes social-media video, and addresses the task of automatically predicting vloggers' personality impressions using nonverbal cues and machine learning techniques.