Corpus ID: 236493746

Significance of Speaker Embeddings and Temporal Context for Depression Detection

@article{Dumpala2021SignificanceOS,
  title={Significance of Speaker Embeddings and Temporal Context for Depression Detection},
  author={Sri Harsha Dumpala and Sebastian Rodriguez and Sheri Rempel and Rudolf Uher and Sageev Oore},
  journal={arXiv preprint arXiv:2107.13969},
  year={2021}
}
Depression detection from speech has attracted a lot of attention in recent years. However, the significance of speaker-specific information in depression detection has not yet been explored. In this work, we analyze the significance of speaker embeddings for the task of depression detection from speech. Experimental results show that the speaker embeddings provide important cues to achieve state-of-the-art performance in depression detection. We also show that combining conventional OpenSMILE…
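The abstract's closing point, combining OpenSMILE features with speaker embeddings, amounts to feature-level fusion of two utterance-level feature streams. The sketch below is a minimal illustration with random placeholder arrays: the dimensions 192 and 88 are illustrative stand-ins for typical x-vector and eGeMAPS functional sizes, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for real per-utterance features:
#  - spk_emb: speaker embeddings (e.g. x-vectors from a pretrained encoder)
#  - osm_fun: OpenSMILE functionals (e.g. eGeMAPS statistics)
n_utts = 200
spk_emb = rng.normal(size=(n_utts, 192))
osm_fun = rng.normal(size=(n_utts, 88))

def znorm(x):
    """Z-normalize each feature column so that neither stream
    dominates the fused vector purely through scale differences."""
    return (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-8)

# Feature-level (early) fusion: normalize each stream, then concatenate.
# The fused vectors would feed a downstream depression classifier.
fused = np.concatenate([znorm(spk_emb), znorm(osm_fun)], axis=1)
assert fused.shape == (n_utts, 192 + 88)
```

In practice the two streams would come from a pretrained speaker-embedding extractor and an OpenSMILE feature pipeline rather than random arrays; the normalization-then-concatenation step shown here is a common baseline for combining them.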
