Effectiveness of Voice Quality Features in Detecting Depression

@inproceedings{Afshan2018EffectivenessOV,
  title={Effectiveness of Voice Quality Features in Detecting Depression},
  author={Amber Afshan and Jinxi Guo and Soo Jin Park and Vijay Ravi and Jonathan Flint and Abeer Alwan},
  booktitle={INTERSPEECH},
  year={2018}
}
Automatic assessment of depression from speech signals is affected by variabilities in acoustic content and speakers. Key Method: to capture various aspects of speech signals, voice quality features were used in addition to conventional cepstral features. The features (F0, F1, F2, F3, H1-H2, H2-H4, H4-H2k, A1, A2, A3, and CPP) were inspired by a psychoacoustic model of voice quality [1].
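To illustrate two of the listed features, here is a minimal sketch of F0 estimation (autocorrelation) and H1-H2 (amplitude difference in dB between the first two harmonics) on a synthetic vowel-like frame. This is an assumption-laden toy using NumPy only, not the paper's pipeline: real voice quality toolkits use robust pitch tracking and formant-corrected harmonic amplitudes.

```python
import numpy as np

SR = 16000       # sample rate (Hz), assumed
F0_TRUE = 120.0  # fundamental of the synthetic "voice"

# Synthetic vowel-like frame: harmonic series with decaying amplitudes,
# so H1 (first harmonic) is stronger than H2 (second harmonic).
t = np.arange(int(0.05 * SR)) / SR
frame = sum((0.8 ** k) * np.sin(2 * np.pi * (k + 1) * F0_TRUE * t)
            for k in range(5))

# --- F0 via autocorrelation: strongest peak within a plausible pitch range ---
ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
min_lag = int(SR / 500)  # ignore pitches above 500 Hz
max_lag = int(SR / 60)   # ignore pitches below 60 Hz
lag = min_lag + np.argmax(ac[min_lag:max_lag])
f0_est = SR / lag

# --- H1-H2: dB difference between 1st and 2nd harmonic amplitudes ---
spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
freqs = np.fft.rfftfreq(len(frame), 1 / SR)

def harmonic_amp(h):
    # amplitude of the spectral peak nearest h * f0
    idx = int(np.argmin(np.abs(freqs - h * f0_est)))
    return spec[max(idx - 2, 0):idx + 3].max()

h1_h2 = 20 * np.log10(harmonic_amp(1) / harmonic_amp(2))
print(f"F0 ~ {f0_est:.1f} Hz, H1-H2 ~ {h1_h2:.2f} dB")
```

Since each harmonic amplitude here decays by a factor of 0.8, H1-H2 comes out near 20·log10(1/0.8) ≈ 1.9 dB; in real speech this difference is a common proxy for breathy versus tense phonation.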


A Generalizable Speech Emotion Recognition Model Reveals Depression and Remission
TLDR
A generalizable speech emotion recognition model can effectively reveal changes in speaker depressive states before and after remission in patients with MDD.
Depression-level assessment from multi-lingual conversational speech data using acoustic and text features
TLDR
The proposed multi-lingual method selected better features than the baseline algorithms, significantly improving depression assessment accuracy; a novel algorithm fusing the text- and speech-based classifiers further boosted performance.
Re-examining the robustness of voice features in predicting depression: Compared with baseline of confounders
TLDR
It is demonstrated that voice features are effective in predicting depression, indicating that more sophisticated models based on voice features can be built to help in clinical diagnosis.
Fraug: A Frame Rate Based Data Augmentation Method for Depression Detection from Speech Signals
TLDR
The proposed method outperformed commonly used data-augmentation methods such as noise augmentation, VTLP, Speed, and Pitch Perturbation and all improvements were statistically significant.
Learning Voice Source Related Information for Depression Detection
TLDR
Modelling low-pass filtered speech signals, linear prediction residual signals, homomorphically filtered voice source signals, and zero-frequency filtered signals to learn voice-source-related information for depression detection leads to systems that outperform state-of-the-art low-level-descriptor-based systems and deep learning systems modelling vocal tract information.
Voice Quality and Between-Frame Entropy for Sleepiness Estimation
TLDR
This study addresses the ComParE 2019 Continuous Sleepiness task of estimating the degree of sleepiness from voice data by proposing a voice quality feature set; between-frame entropy is proposed as an instantaneous measure of speaking rate.
Analysis and classification of phonation types in speech and singing voice
Unsupervised Instance Discriminative Learning for Depression Detection from Speech Signals
TLDR
A modified Instance Discriminative Learning (IDL) method is proposed, an unsupervised pre-training technique, to extract augment-invariant and instance-spread-out embeddings, along with a novel sampling strategy, Pseudo Instance-based Sampling (PIS), based on clustering algorithms, to enhance the spread-out characteristics of the embeddings.
Depression Severity Detection Using Read Speech with a Divide-and-Conquer Approach
  • Namhee Kwon, Samuel Kim
  • 2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC)
  • 2021
TLDR
A divide-and-conquer approach detects depression severity using speech features grouped by their attributes, i.e., acoustic, prosodic, and language features, which are then fused in a modeling stage with fully connected deep neural networks.
...
...

References

Showing 1-10 of 44 references
Detecting depression: A comparison between spontaneous and read speech
TLDR
It is found that spontaneous speech has more variability, which increases the recognition rate of depression, and jitter, shimmer, energy and loudness feature groups are robust in characterising both read and spontaneous depressive speech.
Comparing objective feature statistics of speech for classifying clinical depression
TLDR
This study provides a comparison of the major categories of speech analysis in the application of identifying and clustering feature statistics from a control group and a patient group suffering from a clinical diagnosis of depression.
A study of acoustic features for depression detection
TLDR
A depression estimation approach was used in which the audio data is segmented and projected into a total variability subspace; the projected data were then used to estimate the depression level via support vector regression.
Automatic modelling of depressed speech: relevant features and relevance of gender
TLDR
A small group of acoustic features modelling prosody and spectrum that have proven successful in modelling sleepy speech is employed, enriched with voice quality features, for modelling depressed speech within a regression approach.
Elicitation Design for Acoustic Depression Classification: An Investigation of Articulation Effort, Linguistic Complexity, and Word Affect
TLDR
Interestingly, experiment results demonstrate that by selecting speech with higher articulation effort, linguistic complexity, or word-based arousal/valence, improvements in acoustic speech-based feature depression classification performance can be achieved, serving as a guide for future elicitation design.
Modeling spectral variability for the classification of depressed speech
TLDR
Investigating the hypothesis that important depression-related information can be captured within the covariance structure of a Gaussian Mixture Model of recorded speech, significant negative correlations found between a speaker's average weighted variance (a GMM-based indicator of speaker variability) and their level of depression support this hypothesis.
Variability compensation in small data: Oversampled extraction of i-vectors for the classification of depressed speech
Variations in the acoustic space due to changes in speaker mental state are potentially overshadowed by variability due to speaker identity and phonetic content. Using the Audio/Visual Emotion
Detecting Depression using Vocal, Facial and Semantic Communication Cues
Major depressive disorder (MDD) is known to result in neurophysiological and neurocognitive changes that affect control of motor, linguistic, and cognitive functions. MDD's impact on these processes
Investigating voice quality as a speaker-independent indicator of depression and PTSD
TLDR
This work investigates voice quality characteristics, in particular on a breathy to tense dimension, as an indicator for psychological distress within semi-structured virtual human interviews, and investigates the capability of automatic algorithms to classify psychologically distressed speech in speaker-independent experiments.
...
...