Supervised domain adaptation for emotion recognition from speech

@inproceedings{AbdelWahab2015SupervisedDA,
  title={Supervised domain adaptation for emotion recognition from speech},
  author={Mohammed Abdel-Wahab and Carlos Busso},
  booktitle={2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2015},
  pages={5058-5062}
}
  • Mohammed Abdel-Wahab, C. Busso
  • Published 19 April 2015
  • Computer Science
  • 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
One of the main barriers in the deployment of speech emotion recognition systems in real applications is the lack of generalization of the emotion classifiers. The recognition performance achieved in controlled recordings drops when the models are tested with different speakers, channels, environments and domain conditions. This paper explores supervised model adaptation, which can improve the performance of systems evaluated with mismatched training and testing conditions. We address the… 
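The abstract is truncated above, so the paper's exact adaptation scheme is not reproduced here. As a generic illustration of supervised adaptation with a small amount of labeled target data, the sketch below retrains a classifier on pooled source and target data while up-weighting the scarce target samples; the array names, the SVM choice, and the weighting strategy are assumptions for illustration, not the authors' method.

```python
import numpy as np
from sklearn.svm import SVC

def adapt_supervised(X_src, y_src, X_tgt, y_tgt, target_weight=5.0):
    """Retrain on pooled source + target data, up-weighting the scarce
    labeled target samples so they influence the decision boundary.
    All inputs are hypothetical: X_* are (n, d) feature matrices
    (e.g., utterance-level acoustic features), y_* are emotion labels."""
    X = np.vstack([X_src, X_tgt])
    y = np.concatenate([y_src, y_tgt])
    weights = np.concatenate([np.ones(len(y_src)),
                              np.full(len(y_tgt), target_weight)])
    clf = SVC(kernel="rbf", C=1.0)
    clf.fit(X, y, sample_weight=weights)
    return clf
```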

Citations

Universum Autoencoder-Based Domain Adaptation for Speech Emotion Recognition

TLDR
This letter proposes a novel unsupervised domain adaptation model, called Universum autoencoders, to improve the performance of the systems evaluated in mismatched training and test conditions and demonstrates the effectiveness of the proposed method when compared to other domain adaptation methods.

Incremental adaptation using active learning for acoustic emotion recognition

TLDR
This paper demonstrates that the performance of a speech emotion recognition system can be increased by incrementally adapting the models with carefully selected samples obtained through active learning, and proposes a novel, fast-converging iterative incremental adaptation algorithm that uses only correctly classified samples at each iteration.
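As a rough sketch of this kind of loop (not the cited paper's exact algorithm), the code below repeatedly queries low-confidence samples for labels and then adapts the model using only those the current model classifies correctly; the margin-sampling criterion and the SGDClassifier are assumptions.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

def incremental_adapt(clf, X_pool, y_pool, batch=50, n_rounds=10):
    """clf: an SGDClassifier already fitted on source data (a binary task
    is assumed for the margin computation). X_pool/y_pool: a hypothetical
    pool whose labels become available once active learning selects them."""
    classes = np.unique(y_pool)
    for _ in range(n_rounds):
        # Active learning: query the samples closest to the decision boundary.
        margins = np.abs(clf.decision_function(X_pool))
        idx = np.argsort(margins)[:batch]
        X_sel, y_sel = X_pool[idx], y_pool[idx]
        # Incremental adaptation using only correctly classified samples.
        correct = clf.predict(X_sel) == y_sel
        if correct.any():
            clf.partial_fit(X_sel[correct], y_sel[correct], classes=classes)
    return clf
```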

Unsupervised Personalization of an Emotion Recognition System: The Unique Properties of the Externalization of Valence in Speech

  • K. Sridhar, C. Busso
  • Computer Science, Psychology
    IEEE Transactions on Affective Computing
  • 2022
TLDR
An unsupervised approach addresses the prediction of valence from speech by searching the training set for speakers with acoustic patterns similar to those of the test speaker, leading to relative improvements as high as 13.52%.

Domain Adversarial for Acoustic Emotion Recognition

TLDR
It is shown that exploiting unlabeled data consistently leads to better emotion recognition performance across all emotional dimensions, and the effect of adversarial training on the feature representation across the proposed deep learning architecture is visualized.
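Domain-adversarial training is commonly implemented with a gradient reversal layer between a shared feature encoder and a domain classifier. The minimal PyTorch sketch below follows that standard DANN-style construction, assumed here for illustration rather than taken from the cited paper.

```python
import torch
from torch.autograd import Function

class GradReverse(Function):
    """Identity in the forward pass; reverses (and scales) gradients in
    the backward pass, so the encoder learns domain-invariant features
    while the domain classifier tries to tell domains apart."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output.neg() * ctx.lambd, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)

# Hypothetical usage with separate encoder and task heads:
#   features = encoder(x)
#   emotion_logits = emotion_head(features)
#   domain_logits = domain_head(grad_reverse(features, lambd=0.5))
```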

Cross-corpus speech emotion recognition using transfer semi-supervised discriminant analysis

TLDR
A novel transfer learning approach, transfer semi-supervised discriminant analysis (TSDA), is presented for cross-corpus speech emotion recognition; it jointly optimizes semi-supervised discriminant analysis (SDA) and a distribution similarity measure.

Active Learning for Speech Emotion Recognition Using Deep Neural Network

  • Mohammed Abdelwahab, C. Busso
  • Computer Science
    2019 8th International Conference on Affective Computing and Intelligent Interaction (ACII)
  • 2019
TLDR
This study explores practical solutions for training DNNs for speech emotion recognition with limited resources using active learning (AL), assuming that unlabeled data from a new domain are available and that there are resources to annotate a limited number of recordings with emotional labels.

Application of Emotion Recognition and Modification for Emotional Telugu Speech Recognition

TLDR
The importance of an emotion recognition block at the front end, together with adaptation of the ASR models to emotive speech, was studied; the adapted emotive models yielded better performance than the existing neutral speech models.

Automatic voice emotion recognition of child-parent conversations in natural settings

TLDR
This work builds models using the minimalistic and extended acoustic feature sets extracted with OpenSMILE and small and large sets of annotated utterances, and analyzes the prevalence of the neutral class, finding that the larger the combined sets, the better the training outcomes.

Domain adaptation for speech emotion recognition by sharing priors between related source and target classes

TLDR
A domain adaptation method called Sharing Priors between Related Source and Target classes (SPRST), based on a two-layer neural network, is proposed; it significantly improves performance when only a small number of labeled target instances are available.
...

References

SHOWING 1-10 OF 29 REFERENCES

Iterative Feature Normalization Scheme for Automatic Emotion Detection from Speech

TLDR
The iterative feature normalization (IFN) framework is presented: an unsupervised front end designed for emotion detection that aims to reduce acoustic differences in neutral speech across speakers while preserving the inter-emotional variability in expressive speech.

Speech Emotion Recognition using an Enhanced Co-Training Algorithm

TLDR
An enhanced co-training algorithm utilizes a large amount of unlabeled speech utterances to build a semi-supervised learning system that achieves performance comparable to the co-training prototype while reducing the classification noise produced by labeling errors during semi-supervised learning.

A personalized emotion recognition system using an unsupervised feature adaptation scheme

  • Tauhidur Rahman, C. Busso
  • Computer Science
    2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2012
TLDR
An unsupervised feature adaptation scheme that aims to reduce the mismatch between the acoustic features used to train the system and the acoustic features extracted from the unknown target speaker, using the recently proposed iterative feature normalization (IFN) framework.

Sparse Autoencoder-Based Feature Transfer Learning for Speech Emotion Recognition

TLDR
A sparse autoencoder method for feature transfer learning in speech emotion recognition, which learns a common emotion-specific mapping rule from a small set of labelled data in the target domain, improving performance relative to learning from each source domain independently.

Speaker Normalisation for Speech-Based Emotion Detection

TLDR
This paper compares the performance of a system that uses feature warping to one that does not, and proposes speaker-specific feature warping as a means of normalising acoustic features to overcome the problem of speaker dependency.

Towards More Reality in the Recognition of Emotional Speech

TLDR
The major aspects of emotion recognition are addressed in view of potential applications in the field, to benchmark today's emotion recognition systems and bridge the gap between commercial interest and current performances: acted vs. spontaneous speech, realistic emotions, noise and microphone conditions, and speaker independence.

Unsupervised learning in cross-corpus acoustic emotion recognition

TLDR
It is shown that adding unlabeled emotional speech to agglomerated multi-corpus training sets can enhance recognition performance even in a challenging cross-corpus setting, and that the expected gain from adding unlabeled data is, on average, approximately half of that achieved by adding manually labeled data in leave-one-corpus-out validation.

Iterative feature normalization for emotional speech detection

TLDR
This paper introduces a feature normalization scheme that implements these ideas by iteratively detecting neutral speech and normalizing the features; as the approximation error of the normalization parameters is reduced, the accuracy of the emotion detection system increases.
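As a rough illustration of this iterative loop, the sketch below assumes a pre-trained neutral-vs-emotional classifier (here a hypothetical `neutral_clf` with label 0 for neutral) and uses simple z-normalization in place of the paper's exact parameter estimation.

```python
import numpy as np

NEUTRAL = 0  # assumed label convention for the neutral class

def iterative_feature_normalization(feats, neutral_clf, n_iters=10):
    """feats: (n_utterances, n_features) for one speaker.
    neutral_clf: hypothetical pre-trained neutral-vs-emotional classifier.
    Repeat: detect neutral utterances, estimate normalization parameters
    from them only, re-normalize everything, and stop when stable."""
    normed = feats.copy()
    for _ in range(n_iters):
        is_neutral = neutral_clf.predict(normed) == NEUTRAL
        if not is_neutral.any():
            break  # nothing detected as neutral; keep the current estimate
        mu = feats[is_neutral].mean(axis=0)
        sigma = feats[is_neutral].std(axis=0) + 1e-8
        new_normed = (feats - mu) / sigma
        if np.allclose(new_normed, normed):
            break  # detection stabilized; parameters have converged
        normed = new_normed
    return normed
```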

Automatic Classification of Expressiveness in Speech: A Multi-corpus Study

TLDR
The results show that AIBO and SBA are competitive on the four databases considered, although the AIBO approach works better with long utterances whereas the SBA seems to be better suited for classification of short utterances.

Building a naturalistic emotional speech corpus by retrieving expressive behaviors from existing speech corpora

TLDR
This study uses the IEMOCAP and SEMAINE databases to build emotion detection systems and uses them to identify emotional behaviors in the FISHER database, a large conversational speech corpus recorded over the phone.