A Multi-Feature Fusion Speech Emotion Recognition Method Based on Frequency Band Division and Improved Residual Network

@article{Guo2023AMF,
  title={A Multi-Feature Fusion Speech Emotion Recognition Method Based on Frequency Band Division and Improved Residual Network},
  author={Yi Guo and Yongping Zhou and Xuejun Xiong and Xin Jiang and Hanbing Tian and Qianxue Zhang},
  journal={IEEE Access},
  year={2023},
  volume={11},
  pages={86013-86024},
  url={https://api.semanticscholar.org/CorpusID:256409688}
}
The experimental results show that the proposed multi-feature fusion method classifies the different emotions more accurately, reaching 98% accuracy on the ESD dataset, nearly 6 percentage points higher than the other methods compared.
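
As a concrete illustration of the frequency-band-division idea above, here is a minimal PyTorch sketch (not the paper's exact architecture; the class name BandFusionSER, the band count, and all layer sizes are assumptions): the log-Mel spectrogram is split into frequency sub-bands along the mel axis, each band is encoded by its own small CNN, and the band features are fused for emotion classification.

import torch
import torch.nn as nn

class BandFusionSER(nn.Module):
    def __init__(self, n_mels=128, n_bands=4, n_classes=5):
        super().__init__()
        assert n_mels % n_bands == 0
        self.n_bands = n_bands
        # One small CNN encoder per frequency band (hypothetical sizes).
        self.encoders = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            ) for _ in range(n_bands)
        ])
        self.classifier = nn.Linear(16 * n_bands, n_classes)

    def forward(self, spec):                              # spec: (B, 1, n_mels, T)
        bands = torch.chunk(spec, self.n_bands, dim=2)    # split along mel axis
        feats = [enc(b) for enc, b in zip(self.encoders, bands)]
        return self.classifier(torch.cat(feats, dim=1))   # fuse band features

logits = BandFusionSER()(torch.randn(8, 1, 128, 200))     # -> shape (8, 5)
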
4 Citations

Classification of Speech Emotion State Based on Feature Map Fusion of TCN and Pretrained CNN Model From Korean Speech Emotion Data

The experimental results showed that the proposed speech emotion recognition model performed well in comparison with previous work on most datasets.
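
A hedged sketch of the fusion idea named above (not the paper's exact model; TCNBlock, FusionSER, and all sizes are illustrative): pooled features from a dilated temporal convolutional network over MFCC frames are concatenated with features from a pretrained image CNN applied to the spectrogram.

import torch
import torch.nn as nn
from torchvision.models import resnet18

class TCNBlock(nn.Module):
    """One dilated, causal 1-D conv block, the basic TCN unit."""
    def __init__(self, ch, dilation):
        super().__init__()
        self.pad = (3 - 1) * dilation              # left-pad for causality
        self.conv = nn.Conv1d(ch, ch, 3, dilation=dilation)
    def forward(self, x):
        return torch.relu(self.conv(nn.functional.pad(x, (self.pad, 0)))) + x

class FusionSER(nn.Module):
    def __init__(self, n_mfcc=40, n_classes=7):
        super().__init__()
        self.tcn = nn.Sequential(TCNBlock(n_mfcc, 1), TCNBlock(n_mfcc, 2))
        cnn = resnet18(weights="IMAGENET1K_V1")     # pretrained backbone
        cnn.fc = nn.Identity()                      # keep the 512-d features
        self.cnn = cnn
        self.head = nn.Linear(n_mfcc + 512, n_classes)
    def forward(self, mfcc, spec_img):              # (B, 40, T), (B, 3, H, W)
        t = self.tcn(mfcc).mean(dim=2)              # pool TCN features over time
        c = self.cnn(spec_img)                      # pretrained CNN feature map
        return self.head(torch.cat([t, c], dim=1))  # feature-level fusion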

Graph-based multi-Feature fusion method for speech emotion recognition

This paper proposes a novel graph-based fusion method that explicitly models the relationship between every pair of speech features, and demonstrates a 13% improvement over alternative fusion techniques, including a one-dimensional edge-based feature fusion approach.
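
An illustrative sketch of pairwise graph fusion (the details are assumptions, not the paper's design): each speech feature set becomes a node, a dense adjacency is learned from every pair of nodes, and one message-passing step mixes information between all feature pairs.

import torch
import torch.nn as nn

class GraphFeatureFusion(nn.Module):
    def __init__(self, dim=64, n_classes=7):
        super().__init__()
        self.score = nn.Linear(2 * dim, 1)     # edge weight from each node pair
        self.update = nn.Linear(dim, dim)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, nodes):                  # nodes: (B, N, dim), N feature sets
        B, N, D = nodes.shape
        pairs = torch.cat([nodes.unsqueeze(2).expand(B, N, N, D),
                           nodes.unsqueeze(1).expand(B, N, N, D)], dim=-1)
        adj = torch.softmax(self.score(pairs).squeeze(-1), dim=-1)  # (B, N, N)
        fused = torch.relu(self.update(adj @ nodes))  # one message-passing step
        return self.head(fused.mean(dim=1))           # pool nodes, classify

logits = GraphFeatureFusion()(torch.randn(4, 5, 64))  # 5 feature nodes per clip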

AI in Emergency Responses: Distress & Lie Detection Using Audio & EEG

Combining the EEG, facial expression, and audio modalities offers a holistic approach to detecting falsification; the system is built to be useful across different languages and in numerous contexts, so as to better support the decision-making processes involved.
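
A minimal late-fusion sketch of the multimodal setup described above (the encoder sizes and the class name MultimodalDetector are assumptions): each modality gets its own encoder, and the embeddings are concatenated for a joint decision.

import torch
import torch.nn as nn

class MultimodalDetector(nn.Module):
    def __init__(self, eeg_dim=64, face_dim=128, audio_dim=40, n_classes=2):
        super().__init__()
        self.eeg = nn.Sequential(nn.Linear(eeg_dim, 32), nn.ReLU())
        self.face = nn.Sequential(nn.Linear(face_dim, 32), nn.ReLU())
        self.audio = nn.Sequential(nn.Linear(audio_dim, 32), nn.ReLU())
        self.head = nn.Linear(96, n_classes)    # joint distress/lie decision

    def forward(self, eeg, face, audio):
        z = torch.cat([self.eeg(eeg), self.face(face), self.audio(audio)], dim=1)
        return self.head(z)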

A Speech Emotion Recognition Method Based on Improved Residual Network

This paper improves spectrogram-based emotion recognition by adding channel attention and spatial attention mechanisms to ResNet34, which lets the network focus on extracting local key information, reduces noise, and raises the emotion recognition rate.
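
A sketch of the channel plus spatial attention idea (CBAM-style; the exact placement inside ResNet34 is an assumption) applied to an intermediate feature map:

import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, ch, r=8):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(ch, ch // r), nn.ReLU(),
                                 nn.Linear(ch // r, ch))
    def forward(self, x):                        # x: (B, C, H, W)
        w = self.mlp(x.mean(dim=(2, 3)))         # squeeze the spatial dims
        return x * torch.sigmoid(w)[:, :, None, None]

class SpatialAttention(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, 7, padding=3)
    def forward(self, x):
        s = torch.cat([x.mean(1, keepdim=True),
                       x.max(1, keepdim=True).values], dim=1)
        return x * torch.sigmoid(self.conv(s))   # weight emotion-salient regions

x = torch.randn(2, 64, 32, 32)
y = SpatialAttention()(ChannelAttention(64)(x))  # attended residual features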

Head Fusion: Improving the Accuracy and Robustness of Speech Emotion Recognition on the IEMOCAP and RAVDESS Dataset

This work implemented an attention-based convolutional neural network (ACNN) model, conducted experiments on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) dataset, and proposed a method called Head Fusion, based on the multi-head attention mechanism, to improve the accuracy of SER.
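
A hedged sketch of multi-head attention pooling over frame features; "head fusion" is approximated here by averaging the heads' attention maps before use, which may differ from the paper's exact fusion rule.

import torch
import torch.nn as nn

class HeadFusionPool(nn.Module):
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.q = nn.Parameter(torch.randn(heads, dim))  # one query per head
        self.proj = nn.Linear(dim, dim)

    def forward(self, frames):                   # frames: (B, T, dim)
        k = self.proj(frames)                    # (B, T, dim)
        att = torch.softmax(k @ self.q.t() / k.shape[-1] ** 0.5, dim=1)  # (B,T,H)
        fused = att.mean(dim=2, keepdim=True)    # fuse heads into one map
        return (frames * fused).sum(dim=1)       # (B, dim) utterance embedding

emb = HeadFusionPool()(torch.randn(8, 100, 128))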

Speech Emotion Recognition based on Interactive Convolutional Neural Network

An interactive convolutional neural network (ICNN) is proposed, in which the input feature map is factorized into different frequency scales for interactive convolution, improving the accuracy of SER tasks and effectively reducing redundant information in the feature map.
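
A rough sketch of the interactive-convolution idea in an octave-convolution style (the exact factorization is an assumption): the feature map is split into high- and low-frequency branches that exchange information through cross paths.

import torch
import torch.nn as nn
import torch.nn.functional as F

class InteractiveConv(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.hh = nn.Conv2d(ch, ch, 3, padding=1)  # high -> high
        self.ll = nn.Conv2d(ch, ch, 3, padding=1)  # low  -> low
        self.hl = nn.Conv2d(ch, ch, 3, padding=1)  # high -> low (interaction)
        self.lh = nn.Conv2d(ch, ch, 3, padding=1)  # low  -> high (interaction)

    def forward(self, high, low):      # low is at half spatial resolution
        up = F.interpolate(low, size=high.shape[2:])
        down = F.avg_pool2d(high, 2)
        return self.hh(high) + self.lh(up), self.ll(low) + self.hl(down)

h, l = InteractiveConv(16)(torch.randn(2, 16, 64, 64), torch.randn(2, 16, 32, 32))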

3-D Convolutional Recurrent Neural Networks With Attention Model for Speech Emotion Recognition

A three-dimensional attention-based convolutional recurrent neural network is proposed to learn discriminative features for SER, where the Mel-spectrogram with deltas and delta-deltas is used as input.
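
A sketch of the 3-D input described above, assuming librosa is available: the log-Mel spectrogram is stacked with its deltas and delta-deltas as three channels (the sine wave is just a stand-in for real speech).

import numpy as np
import librosa

sr = 16000
y = np.sin(2 * np.pi * 440 * np.arange(sr) / sr).astype(np.float32)  # stand-in audio
mel = librosa.power_to_db(
    librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64))
delta = librosa.feature.delta(mel)                 # first temporal derivative
delta2 = librosa.feature.delta(mel, order=2)       # second temporal derivative
x = np.stack([mel, delta, delta2])                 # (3, 64, T) CRNN input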

Speech Emotion Recognition with Global-Aware Fusion on Multi-Scale Feature Representation

A novel GLobal-Aware Multi-scale (GLAM) neural network is proposed to learn a multi-scale feature representation with a global-aware fusion module that attends to emotional information globally, and experiments demonstrate the superiority of the proposed model with improvements of 2.5% to 4.5%.
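
A loose sketch of multi-scale feature learning with a global-aware fusion step (a simplification, not GLAM's exact design): parallel branches with different receptive fields are weighted by a gate computed from global statistics.

import torch
import torch.nn as nn

class MultiScaleGlobalFusion(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        # Parallel branches with different receptive fields (multi-scale).
        self.branches = nn.ModuleList(
            [nn.Conv2d(1, ch, k, padding=k // 2) for k in (3, 5, 7)])
        self.gate = nn.Linear(3 * ch, 3)         # global context -> branch weights

    def forward(self, spec):                     # spec: (B, 1, F, T)
        feats = [torch.relu(b(spec)) for b in self.branches]
        ctx = torch.cat([f.mean(dim=(2, 3)) for f in feats], dim=1)  # global stats
        w = torch.softmax(self.gate(ctx), dim=1)                     # (B, 3)
        stacked = torch.stack(feats, dim=1)                          # (B,3,C,F,T)
        return (w[:, :, None, None, None] * stacked).sum(dim=1)      # fused map

out = MultiScaleGlobalFusion()(torch.randn(2, 1, 64, 100))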

High-level feature representation using recurrent neural network for speech emotion recognition

This paper presents a speech emotion recognition system using a recurrent neural network (RNN) model trained by an efficient learning algorithm. The proposed system takes into account the long-range…
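
A minimal sketch of an RNN-based SER pipeline of this kind (layer sizes are assumptions): a bidirectional LSTM summarizes frame-level acoustic features into an utterance-level representation.

import torch
import torch.nn as nn

class RNNEmotion(nn.Module):
    def __init__(self, n_feats=40, hidden=128, n_classes=7):
        super().__init__()
        self.rnn = nn.LSTM(n_feats, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, frames):                  # frames: (B, T, n_feats)
        out, _ = self.rnn(frames)               # long-range temporal context
        return self.head(out.mean(dim=1))       # pool frames, classify

logits = RNNEmotion()(torch.randn(8, 300, 40))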

LIGHT-SERNET: A Lightweight Fully Convolutional Neural Network for Speech Emotion Recognition

This paper proposes an efficient and lightweight fully convolutional neural network for speech emotion recognition in systems with limited hardware resources, achieving higher performance on the IEMOCAP and EMO-DB datasets.
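
A sketch of a lightweight fully convolutional classifier in this spirit (channel sizes are assumptions, not LIGHT-SERNET's exact configuration): there are no dense layers beyond a final 1x1 projection, so the parameter count stays small.

import torch
import torch.nn as nn

class TinyFCN(nn.Module):
    def __init__(self, n_classes=7):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.Conv2d(32, n_classes, 1),         # 1x1 conv replaces the FC head
            nn.AdaptiveAvgPool2d(1),             # global average pooling
        )
    def forward(self, spec):                     # spec: (B, 1, F, T)
        return self.body(spec).flatten(1)        # (B, n_classes)

print(sum(p.numel() for p in TinyFCN().parameters()))  # small parameter count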

Learning Salient Features for Speech Emotion Recognition Using Convolutional Neural Networks

This paper proposes to learn affect-salient features for SER using convolutional neural networks (CNN), and shows that this approach leads to stable and robust recognition performance in complex scenes and outperforms several well-established SER features.
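
Not the paper's method, just a generic way to inspect which time-frequency regions a trained CNN treats as affect-salient: take the gradient of the predicted class score with respect to the input spectrogram (the toy model here is an assumption).

import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 7))
spec = torch.randn(1, 1, 64, 100, requires_grad=True)
score = model(spec).max()               # score of the top predicted emotion
score.backward()
saliency = spec.grad.abs().squeeze()    # (64, 100) affect-salience map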

A Lightweight Model Based on Separable Convolution for Speech Emotion Recognition

This paper proposes a lightweight architecture with only a few parameters, based on separable convolution and inverted residuals, which can enhance the extraction of emotion-salient information.
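
A sketch of the two building blocks named above: a depthwise-separable convolution inside a MobileNetV2-style inverted residual (the expansion factor and channel counts are assumptions).

import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    def __init__(self, ch, expand=4):
        super().__init__()
        mid = ch * expand
        self.block = nn.Sequential(
            nn.Conv2d(ch, mid, 1), nn.ReLU6(),              # expand
            nn.Conv2d(mid, mid, 3, padding=1, groups=mid),  # depthwise conv
            nn.ReLU6(),
            nn.Conv2d(mid, ch, 1),                          # project (linear)
        )
    def forward(self, x):
        return x + self.block(x)    # residual over the narrow representation

y = InvertedResidual(16)(torch.randn(2, 16, 32, 32))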