The ACM Multimedia 2022 Computational Paralinguistics Challenge: Vocalisations, Stuttering, Activity, & Mosquitoes

@article{Schuller2022TheAM,
  title={The ACM Multimedia 2022 Computational Paralinguistics Challenge: Vocalisations, Stuttering, Activity, \& Mosquitoes},
  author={Bj{\"o}rn Schuller and Anton Batliner and Shahin Amiriparian and Christian Bergler and Maurice Gerczuk and Natalie Holz and Pauline Larrouy-Maestri and S.P. Bayerl and Korbinian Riedhammer and Adria Mallol-Ragolta and Maria Pateraki and Harry Coppock and Ivan Kiskin and Marianne E. Sinka and Stephen J. Roberts},
  journal={ArXiv},
  year={2022},
  volume={abs/2205.06799}
}
The ACM Multimedia 2022 Computational Paralinguistics Challenge addresses four different problems for the first time in a research competition under well-defined conditions: the Vocalisations and Stuttering Sub-Challenges pose classification tasks on human non-verbal vocalisations and on speech; the Activity Sub-Challenge targets human activity recognition beyond audio, from smartwatch sensor data; and the Mosquitoes Sub-Challenge requires mosquitoes to be detected. We describe the Sub…
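
ComParE challenges traditionally ship a common baseline in which the 6,373 ComParE acoustic functionals, extracted with openSMILE, are classified by a linear support vector machine. The following is a minimal sketch of that style of pipeline for an audio Sub-Challenge, using the openSMILE Python wrapper and scikit-learn; the file train.csv, its column layout, and the value of C are illustrative assumptions, not the official baseline code.

# Sketch of a ComParE-style baseline: openSMILE functionals + linear SVM.
# Assumes the openSMILE Python wrapper (pip install opensmile) and scikit-learn;
# train.csv with columns "filename" and "label" is a hypothetical stand-in for
# the official partition files.
import opensmile
import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# ComParE 2016 functionals: 6,373 statistics over low-level acoustic descriptors.
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.ComParE_2016,
    feature_level=opensmile.FeatureLevel.Functionals,
)

meta = pd.read_csv("train.csv")  # hypothetical: one audio file and label per row
X = pd.concat([smile.process_file(f) for f in meta["filename"]])
y = meta["label"].values

# Linear SVM on standardised features; C is an assumed value, in practice
# it is tuned on the development partition.
clf = make_pipeline(StandardScaler(), LinearSVC(C=1e-4))
clf.fit(X, y)

In the actual challenge baselines, the SVM complexity C is optimised on the development set and the final model is retrained on train plus development before scoring the blind test partition.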
