Speech Emotion Recognition Using Spectrogram & Phoneme Embedding
@inproceedings{Yenigalla2018SpeechER, title={Speech Emotion Recognition Using Spectrogram \& Phoneme Embedding}, author={Promod Yenigalla and Abhay Kumar and Suraj Tripathi and Chirag Singh and Sibsambhu Kar and Jithendra Vepa}, booktitle={INTERSPEECH}, year={2018} }
This paper proposes a speech emotion recognition method based on phoneme sequences and spectrograms. Both retain the emotional content of speech, which is lost if the speech is converted into text. We performed various experiments with different kinds of deep neural networks with phonemes and spectrograms as inputs. Three of those network architectures are presented here; they achieved better accuracy than the state-of-the-art methods on benchmark…
75 Citations
A Hybrid Technique using CNN+LSTM for Speech Emotion Recognition
- Computer Science
- International Journal of Engineering and Advanced Technology
- 2020
This paper uses spectrograms as inputs to a hybrid deep convolutional LSTM for speech emotion recognition; the proposed model obtained an accuracy of 94.26%.
3D CNN-Based Speech Emotion Recognition Using K-Means Clustering and Spectrograms
- Computer Science
- Entropy
- 2019
An emotion recognition system based on analysis of speech signals that is superior to the state-of-the-art methods reported in the literature is proposed.
Speech Emotion Recognition using Convolution Neural Networks and Deep Stride Convolutional Neural Networks
- Computer Science
- 2020 6th International Conference on Wireless and Telematics (ICWT)
- 2020
A recently developed convolutional neural network architecture, the Deep Stride Convolutional Neural Network (DSCNN), is modified to use fewer convolutional layers, increasing computational speed while maintaining accuracy.
Speech emotion recognition based on formant characteristics feature extraction and phoneme type convergence
- Computer Science
- Inf. Sci.
- 2021
Focal Loss based Residual Convolutional Neural Network for Speech Emotion Recognition
- Computer Science
- ArXiv
- 2019
A Residual Convolutional Neural Network based on speech features and trained under Focal Loss to recognize emotion in speech is proposed, preventing the model from being overwhelmed by easily classifiable examples.
Fine-grained Dynamical Speech Emotion Analysis Utilizing Networks Customized for Acoustic Data
- Computer Science
- 2020 IEEE International Conference on Advances in Electrical Engineering and Computer Applications (AEECA)
- 2020
A new method for fine-grained dynamical speech emotion analysis using neural networks customized for acoustic data is proposed, and the emotion time unit (ETU) is introduced to model the dynamic change of speech emotion and improve utterance-level recognition accuracy.
Emotion recognition from speech using spectrograms and shallow neural networks
- Computer Science
- MoMM
- 2020
A speech emotion recognition (SER) system is proposed that combines the pattern-learning power of DL models with the ability to work on small databases.
DNN-based Emotion Recognition Based on Bottleneck Acoustic Features and Lexical Features
- Computer Science
- ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2019
Experimental results on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) multi-modal dataset showed 75.5% in unweighted accuracy recall, which outperformed the best results reported previously in the multimodal emotion recognition using acoustic and lexical features.
Speech Emotion Recognition Using Scalogram Based Deep Structure
- Computer Science
- 2020
This work has proposed an SER method based on a concatenated Convolutional Neural Network and a Recurrent Neural Network that combines the strengths of both networks to learn long-term temporal relationships of the learned features.
Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation
- Computer Science
- ICMI
- 2021
Experimental results indicate that the transfer learning and spectrogram augmentation approaches improve the SER performance, and when combined achieve state-of-the-art results.
References
Towards real-time Speech Emotion Recognition using deep neural networks
- Computer Science
- 2015 9th International Conference on Signal Processing and Communication Systems (ICSPCS)
- 2015
A deep neural network (DNN) that recognizes emotions from one-second frames of raw speech spectrograms is presented and investigated; this is achievable thanks to a deep hierarchical architecture, data augmentation, and sensible regularization.
Efficient Emotion Recognition from Speech Using Deep Learning on Spectrograms
- Computer Science
- INTERSPEECH
- 2017
A new implementation of emotion recognition from the para-lingual information in the speech, based on a deep neural network, applied directly to spectrograms, achieves higher recognition accuracy compared to previously published results, while also limiting the latency.
An experimental study of speech emotion recognition based on deep convolutional neural networks
- Computer Science
- 2015 International Conference on Affective Computing and Intelligent Interaction (ACII)
- 2015
Preliminary experiments show the proposed emotion recognition system based on DCNNs achieves about 40% classification accuracy and outperforms the SVM based classification using the hand-crafted acoustic features.
Speech emotion recognition using deep neural network and extreme learning machine
- Computer Science
- INTERSPEECH
- 2014
The experimental results demonstrate that the proposed approach effectively learns emotional information from low-level features and leads to a 20% relative accuracy improvement compared to state-of-the-art approaches.
Emotion recognition in spontaneous speech using GMMs
- Computer Science
- INTERSPEECH
- 2006
The results indicate that using Gaussian mixture models on the frame level is a feasible technique for emotion classification, and combining the three classifiers significantly improves performance.
Emotion Recognition From Speech With Recurrent Neural Networks
- Computer Science
- ArXiv
- 2017
The paper shows the effectiveness of a deep recurrent neural network trained on a sequence of acoustic features calculated over small speech intervals; a probabilistic CTC loss function makes it possible to handle long utterances containing both emotional and neutral parts.
High-level feature representation using recurrent neural network for speech emotion recognition
- Computer Science
- INTERSPEECH
- 2015
This paper presents a speech emotion recognition system using a recurrent neural network (RNN) model trained by an efficient learning algorithm. The proposed system takes into account the long-range…
Multi-level Speech Emotion Recognition Based on HMM and ANN
- Computer Science
- 2009 WRI World Congress on Computer Science and Information Engineering
- 2009
Comparison between isolated HMMs and hybrid of HMMs/ANN proves that the approach introduced is more effective, and the average recognition rate of five emotion states has reached 81.7%.
Emotion Recognition of Affective Speech Based on Multiple Classifiers Using Acoustic-Prosodic Information and Semantic Labels
- Computer Science
- IEEE Transactions on Affective Computing
- 2011
This work presents an approach to emotion recognition of affective speech based on multiple classifiers using acoustic-prosodic information (AP) and semantic labels (SLs) and reveals that the recognition accuracy of the proposed approach can be further improved to 85.79 percent.
GMM Supervector Based SVM with Spectral Features for Speech Emotion Recognition
- Computer Science
- 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07
- 2007
Experimental results on an emotional speech database demonstrate that the GMM supervector based SVM outperforms standard GMM on speech emotion recognition.