Non-linear frequency warping using constant-Q transformation for speech emotion recognition

Authors: Premjeet Singh, Goutam Saha, Md. Sahidullah
Published in: 2021 International Conference on Computer Communication and Informatics (ICCCI)
In this work, we explore the constant-Q transform (CQT) for speech emotion recognition (SER). CQT-based time-frequency analysis provides variable spectro-temporal resolution, with higher frequency resolution at lower frequencies. Since the lower-frequency regions of a speech signal contain more emotion-related information than the higher-frequency regions, the increased low-frequency resolution of the CQT makes it more promising for SER than the standard short-time Fourier transform (STFT). We present a…
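To make the resolution argument concrete, the following minimal Python sketch (not the authors' implementation) uses the textbook constant-Q definitions to compute the center frequencies, bandwidths, and window lengths of a constant-Q filterbank, and compares them with the fixed bin width of an STFT. It does not compute a CQT of any signal. The function names `cqt_bins` and `stft_resolution` and all parameter values (50 Hz minimum frequency, 24 bins per octave, 168 bins, 16 kHz sampling rate, 512-point FFT) are illustrative assumptions, not settings reported in the paper.

```python
import math


def cqt_bins(f_min, bins_per_octave, n_bins, sample_rate):
    """Return (Q, center freqs, bandwidths, window lengths) of a constant-Q filterbank.

    Textbook constant-Q definitions (assumed here, not taken from the paper):
      f_k = f_min * 2**(k / b)       geometric spacing of center frequencies
      Q   = 1 / (2**(1/b) - 1)       the same quality factor for every bin
      bandwidth_k = f_k / Q          bandwidth grows with frequency
      N_k = Q * sample_rate / f_k    window length (samples) shrinks with frequency
    """
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    freqs = [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]
    bandwidths = [f / q for f in freqs]
    windows = [q * sample_rate / f for f in freqs]
    return q, freqs, bandwidths, windows


def stft_resolution(sample_rate, n_fft):
    """STFT bins are uniformly spaced, so every bin is sample_rate / n_fft Hz wide."""
    return sample_rate / n_fft


if __name__ == "__main__":
    fs = 16000  # hypothetical 16 kHz speech sampling rate
    q, freqs, bws, wins = cqt_bins(f_min=50.0, bins_per_octave=24, n_bins=168, sample_rate=fs)
    print(f"Q = {q:.2f}, STFT bin width = {stft_resolution(fs, 512):.2f} Hz")
    for k in (0, 48, 96, 167):
        print(f"f = {freqs[k]:7.1f} Hz  bandwidth = {bws[k]:6.2f} Hz  window = {wins[k]:8.0f} samples")
```

With these assumed settings, the CQT bins near 50 Hz are far narrower than the 31.25 Hz STFT bins, while bins above roughly 1 kHz are wider. The price is a much longer analysis window at low frequencies, which is the variable spectro-temporal trade-off described above.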


Deep scattering network for speech emotion recognition
This paper investigates layer-wise scattering coefficients to analyse the importance of time-shift- and deformation-stable scalogram and modulation spectrum coefficients for SER, and observes that layer-wise coefficients taken independently also perform better than MFCCs.


Amplitude-Frequency Analysis of Emotional Speech Using Transfer Learning and Classification of Spectrogram Images
This study used an indirect approach to provide insights into the amplitude-frequency characteristics of different emotions in order to support the development of future, more efficiently differentiating SER methods.
Speech emotion recognition with deep convolutional neural networks
A new architecture is introduced that extracts mel-frequency cepstral coefficients, chromagram, mel-scale spectrogram, Tonnetz representation, and spectral contrast features from sound files and uses them as inputs to a one-dimensional convolutional neural network for identifying emotions, using samples from the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) and Berlin (EMO-DB) datasets.
Application of speaker- and language identification state-of-the-art techniques for emotion recognition
This paper describes the efforts of transferring feature extraction and statistical modeling techniques from the fields of speaker and language identification to the related field of emotion recognition, and shows how to apply Gaussian mixture modeling techniques on top of the extracted features.
Formant position based weighted spectral features for emotion recognition
This paper proposes novel spectrally weighted mel-frequency cepstral coefficient (WMFCC) features for emotion recognition from speech, and evaluates the proposed WMFCC features together with the standard spectral and prosody features using HMM-based classifiers on the spontaneous FAU Aibo emotional speech corpus.
Synthetic speech detection using fundamental frequency variation and spectral features
This paper proposes a new approach to detect synthetic speech using score-level fusion of front-end features, namely constant Q cepstral coefficients (CQCCs), the all-pole group delay function (APGDF), and fundamental frequency variation (FFV); the fused system outperforms all existing baseline features for both known and unknown attacks.
A comparative study of traditional and newly proposed features for recognition of speech under stress
The results show that, although the fast Fourier transform (FFT) is more immune to noise, the linear prediction power spectrum is more immune than the FFT to stress, as well as to a combination of noisy and stressful conditions.
Towards a standard set of acoustic features for the processing of emotion in speech.
Researchers concerned with the automatic recognition of human emotion in speech have proposed a considerable variety of segmental and supra-segmental acoustic descriptors. These range from prosodic…
Multiscale Amplitude Feature and Significance of Enhanced Vocal Tract Information for Emotion Classification
A novel multiscale amplitude feature is proposed using multiresolution analysis (MRA), the significance of the vocal tract for emotion classification from the speech signal is investigated, and the proposed feature is shown to outperform the other features.
New approach in quantification of emotional intensity from the speech signal: emotional temperature
This new approach provides comparable performance with lower complexity than other approaches for real-time applications, making it an appealing alternative that may assist in the future development of automatic speech emotion recognition systems with continuous tracking.
The Geneva Minimalistic Acoustic Parameter Set (GeMAPS) for Voice Research and Affective Computing
A basic standard acoustic parameter set for various areas of automatic voice analysis, such as paralinguistic or clinical speech analysis, is proposed and intended to provide a common baseline for evaluation of future research and eliminate differences caused by varying parameter sets or even different implementations of the same parameters.