Corpus ID: 236493746

Significance of Speaker Embeddings and Temporal Context for Depression Detection

Sri Harsha Dumpala, Sebastian Rodriguez, Sheri Rempel, Rudolf Uher, Sageev Oore
Depression detection from speech has attracted a lot of attention in recent years. However, the significance of speaker-specific information in depression detection has not yet been explored. In this work, we analyze the significance of speaker embeddings for the task of depression detection from speech. Experimental results show that the speaker embeddings provide important cues to achieve state-of-the-art performance in depression detection. We also show that combining conventional OpenSMILE…

Related Papers

Estimating Severity of Depression From Acoustic Features and Embeddings of Natural Speech
Analyzes spectral-based and excitation source-based features extracted from speech, along with the significance of sentiment and emotion classification for estimating depression severity, in audio recordings of narratives by individuals diagnosed with major depressive disorder.
Exploiting Vocal Tract Coordination Using Dilated CNNS For Depression Detection In Naturalistic Environments
Proposes a novel way to extract full vocal tract coordination (FVTC) features using convolutional neural networks (CNNs), overcoming earlier shortcomings.
DepAudioNet: An Efficient Deep Model for Audio based Depression Classification
Proposes a deep model, DepAudioNet, that encodes depression-related characteristics of the vocal channel, combining a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) to deliver a more comprehensive audio representation.
Investigation of Speech Landmark Patterns for Depression Detection
Evaluations of both landmark duration features and landmark n-gram features on the DAIC-WOZ and SH2 datasets show that they are highly effective, either alone or fused, relative to existing approaches.
Automated speech-based screening of depression using deep convolutional neural networks
Proposes a novel approach to automated depression detection in speech using a convolutional neural network (CNN) and multipart interactive training, achieving a promising baseline accuracy of 77%.
An Investigation of Depressed Speech Detection: Features and Normalization
Remaining questions, including how speech segments should be selected, which features provide good discrimination, and what benefits feature normalization might bring given the speaker-specific nature of mental disorders, are addressed empirically using classifier configurations employed in emotion recognition from speech.
AVEC 2016: Depression, Mood, and Emotion Recognition Workshop and Challenge
Presents the challenge guidelines, the common data used, and the performance of the baseline system on the two tasks, in order to establish to what extent fusion of the approaches is possible and beneficial.
Spotting the Traces of Depression in Read Speech: An Approach Based on Computational Paralinguistics and Social Signal Processing
The results show that features expected to capture such differences reduce the error rate of a baseline classifier by more than 50% and appear to be in line with the findings of neuroscience about brain-level differences between depressed and non-depressed individuals.
Optimizing Speech-Input Length for Speaker-Independent Depression Classification
Analyzes results for speaker-independent depression classification using a corpus of over 1400 hours of speech from a human-machine health screening application, and examines performance as a function of response input length for two NLP systems that differ in overall performance.
Influence of acoustic low-level descriptors in the detection of clinical depression in adolescents
Reports the influence of acoustic low-level descriptors (LLDs) on classification accuracy in speech analysis of a clinical dataset, by adding prosodic and spectral LLDs to two baseline features: Mel-frequency cepstral coefficients and the Teager energy critical-band based autocorrelation envelope.