Automatic Analysis of the Emotional Content of Speech in Daylong Child-Centered Recordings from a Neonatal Intensive Care Unit

@article{Vaaras2021AutomaticAO,
  title={Automatic Analysis of the Emotional Content of Speech in Daylong Child-Centered Recordings from a Neonatal Intensive Care Unit},
  author={Einari Vaaras and Sari Ahlqvist-Bj{\"o}rkroth and Konstantinos Drossos and Okko Johannes R{\"a}s{\"a}nen},
  journal={ArXiv},
  year={2021},
  volume={abs/2106.09539}
}
Researchers have recently started to study how the emotional speech heard by young infants can affect their developmental outcomes. As a part of this research, hundreds of hours of daylong recordings from preterm infants’ audio environments were collected from two hospitals in Finland and Estonia in the context of the so-called APPLE study. In order to analyze the emotional content of speech in such a massive dataset, an automatic speech emotion recognition (SER) system is required. However… 

State-of-the-art violence detection techniques in video surveillance security systems: a systematic review

This systematic review provides a comprehensive assessment of the video violence detection problems described in state-of-the-art research, presents public datasets for testing the performance of video-based violence detection methods, and compares their results.

Analysis of Self-Supervised Learning and Dimensionality Reduction Methods in Clustering-Based Active Learning for Speech Emotion Recognition

CPC and multiple dimensionality reduction methods are combined in search of functioning practices for clustering-based AL and it is observed that compressing data dimensionality does not harm AL performance substantially, and that 2-D feature representations achieved similar AL performance as higher-dimensional representations when the number of annotations is not very low.

References

LSSED: A Large-Scale Dataset and Benchmark for Speech Emotion Recognition

A challenging large-scale English speech emotion dataset, which has data collected from 820 subjects to simulate real-world distribution, and some pre-trained models based on LSSED, which can not only promote the development of speech emotion recognition, but can also be transferred to related downstream tasks such as mental health analysis where data is extremely difficult to collect.

Active Learning for Speech Emotion Recognition Using Deep Neural Network

  • Mohammed Abdelwahab, C. Busso
  • Computer Science
    2019 8th International Conference on Affective Computing and Intelligent Interaction (ACII)
  • 2019
This study explores practical solutions to train DNNs for speech emotion recognition with limited resources by using active learning (AL), assuming that data without emotional labels from a new domain are available and one has resources to select a limited number of recordings to be annotated with emotional labels.

A database of German emotional speech

A database of emotional speech that was evaluated in a perception test regarding the recognisability of emotions and their naturalness and can be accessed by the public via the internet.

The Automatic Recognition of Emotions in Speech

The subject area of this chapter is not emotions in some narrow sense but in a wider sense encompassing emotion-related states such as moods, attitudes, or interpersonal stances as well.

A thorough evaluation of the Language Environment Analysis (LENA) system.

Whether LENA® results are accurate enough for a given research, educational, or clinical application depends largely on the specifics at hand, and a set of recommendations is concluded to help researchers make this determination for their goals.

Universum Autoencoder-Based Domain Adaptation for Speech Emotion Recognition

This letter proposes a novel unsupervised domain adaptation model, called Universum autoencoders, to improve the performance of the systems evaluated in mismatched training and test conditions and demonstrates the effectiveness of the proposed method when compared to other domain adaptation methods.

Unsupervised learning in cross-corpus acoustic emotion recognition

It is shown that adding unlabeled emotional speech to agglomerated multi-corpus training sets can enhance recognition performance even in a challenging cross-corpus setting, and that the expected gain by adding unlabeled data on average is approximately half the one achieved by additional manually labeled data in leave-one-corpus-out validation.

The Geneva Minimalistic Acoustic Parameter Set (GeMAPS) for Voice Research and Affective Computing

A basic standard acoustic parameter set for various areas of automatic voice analysis, such as paralinguistic or clinical speech analysis, is proposed and intended to provide a common baseline for evaluation of future research and eliminate differences caused by varying parameter sets or even different implementations of the same parameters.

Active Learning for Speech Emotion Recognition Using Conditional Random Fields

  • Ziping Zhao, Xirong Ma
  • Computer Science
    2013 14th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing
  • 2013
Experiments show that for most of the cases considered, active selection strategies when recognizing speech emotion are as good as or exceed the performance of random data selection.

Unsupervised Adversarial Domain Adaptation for Cross-Lingual Speech Emotion Recognition

The proposed GAN-based model for multilingual SER is designed in such a way that language-invariant representations can be learned without requiring target-language data labels, and it can significantly improve the baseline cross-lingual SER performance.