Non-linear frequency warping using constant-Q transformation for speech emotion recognition

@inproceedings{Singh2021NonlinearFW,
  title={Non-linear frequency warping using constant-Q transformation for speech emotion recognition},
  author={Premjeet Singh and Goutam Saha and Md. Sahidullah},
  booktitle={2021 International Conference on Computer Communication and Informatics (ICCCI)},
  year={2021},
  pages={1-6}
}
In this work, we explore the constant-Q transform (CQT) for speech emotion recognition (SER). The CQT-based time-frequency analysis provides variable spectro-temporal resolution, with higher frequency resolution at lower frequencies. Since the lower-frequency regions of a speech signal contain more emotion-related information than the higher-frequency regions, the increased low-frequency resolution of the CQT makes it more promising for SER than the standard short-time Fourier transform (STFT). We present a…
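The property behind this claim is that CQT bin centre frequencies are geometrically spaced, f_k = f_min · 2^(k/b) for b bins per octave, so the quality factor Q = f_k/Δf_k stays constant and low frequencies receive much finer frequency resolution than under the STFT's linear bin spacing. Below is a minimal sketch of this contrast using librosa; the sampling rate, test signal, and transform parameters are illustrative assumptions, not the paper's configuration.

```python
import librosa

# Illustrative parameters (assumptions, not the paper's exact setup).
sr = 16000                                   # assumed speech sampling rate
y = librosa.tone(220, sr=sr, duration=2.0)   # stand-in for a speech signal

# STFT: linearly spaced bins, the same resolution of sr/n_fft Hz everywhere.
n_fft = 512
stft = librosa.stft(y, n_fft=n_fft)
print("STFT bin spacing: %.1f Hz at all frequencies" % (sr / n_fft))

# CQT: geometrically spaced bins, f_k = fmin * 2**(k / bins_per_octave),
# so low frequencies are sampled far more densely than high ones.
fmin, bins_per_octave, n_bins = 65.4, 24, 144   # 6 octaves up to ~4.2 kHz
cqt = librosa.cqt(y, sr=sr, fmin=fmin,
                  n_bins=n_bins, bins_per_octave=bins_per_octave)
freqs = librosa.cqt_frequencies(n_bins=n_bins, fmin=fmin,
                                bins_per_octave=bins_per_octave)
print("CQT bin spacing: %.1f Hz near %.0f Hz, %.1f Hz near %.0f Hz"
      % (freqs[1] - freqs[0], freqs[0],
         freqs[-1] - freqs[-2], freqs[-1]))
```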

Citations

Deep scattering network for speech emotion recognition

TLDR
This paper introduces the scattering transform for speech emotion recognition (SER) and investigates layer-wise scattering coefficients to analyse the importance of time-shift- and deformation-stable scalogram and modulation spectrum coefficients for SER.
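As a rough illustration of what layer-wise scattering coefficients look like in practice, the sketch below uses the kymatio library to compute a scattering transform of a one-dimensional signal and split its output by order; the frame length and the J and Q parameters are illustrative assumptions, not the cited paper's configuration.

```python
import numpy as np
from kymatio.numpy import Scattering1D

# Illustrative parameters (assumptions, not the cited paper's setup).
T = 2 ** 14          # number of samples in the input frame
J = 6                # maximum scale: invariance over 2**J samples
Q = 8                # first-order wavelets per octave

x = np.random.randn(T)            # stand-in for a speech frame

scattering = Scattering1D(J=J, shape=T, Q=Q)
Sx = scattering(x)                # shape: (n_channels, T / 2**J)

# meta()['order'] labels each output channel with its scattering order,
# which lets the zeroth-, first-, and second-order layers be inspected
# separately.
meta = scattering.meta()
for order in (0, 1, 2):
    idx = np.where(meta['order'] == order)[0]
    print("order %d: %d coefficient channels" % (order, len(idx)))
```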

References

Showing 1-10 of 37 references

Amplitude-Frequency Analysis of Emotional Speech Using Transfer Learning and Classification of Spectrogram Images

TLDR
This study used an indirect approach to provide insights into the amplitude-frequency characteristics of different emotions, in order to support the development of future SER methods that differentiate emotions more effectively.

Speech emotion recognition with deep convolutional neural networks

Formant position based weighted spectral features for emotion recognition

Synthetic speech detection using fundamental frequency variation and spectral features

A comparative study of traditional and newly proposed features for recognition of speech under stress

TLDR
The results show that, while the fast Fourier transform (FFT) is more immune to noise, the linear prediction power spectrum is more immune than the FFT to stress as well as to a combination of noisy and stressful conditions.

Towards a standard set of acoustic features for the processing of emotion in speech.

Researchers concerned with the automatic recognition of human emotion in speech have proposed a considerable variety of segmental and supra-segmental acoustic descriptors. These range from prosodic…

Multiscale Amplitude Feature and Significance of Enhanced Vocal Tract Information for Emotion Classification

TLDR
A novel multiscale amplitude feature based on multiresolution analysis (MRA) is proposed, and the significance of enhanced vocal tract information for emotion classification from the speech signal is investigated; the proposed feature outperforms the other features.

Speech Emotion Recognition Using Deep Convolutional Neural Network and Discriminant Temporal Pyramid Matching

TLDR
This paper explores how to utilize a DCNN to bridge the affective gap in speech signals, and finds that a DCNN model pretrained for image applications performs reasonably well in affective speech feature extraction.