Robustness to noise for speech emotion classification using CNNs and attention mechanisms

@article{Wijayasingha2020RobustnessTN,
  title={Robustness to noise for speech emotion classification using CNNs and attention mechanisms},
  author={Lahiru N. S. Wijayasingha and John A. Stankovic},
  journal={Smart Health},
  year={2020},
  pages={100165}
}
Robust Speech Emotion Recognition for Sindhi Language based on Deep Convolutional Neural Network
TLDR
This paper proposes a robust SER approach focused on improving performance for low-resource languages such as Sindhi; it is the first SER work on the Sindhi language to use data augmentation (DA) and deep learning techniques.
Emotion Recognition Robust to Indoor Environmental Distortions and Non-targeted Emotions Using Out-Of-Distribution Detection
TLDR
This work significantly improves realistic considerations for emotion detection by assessing different situations more comprehensively and combining a CNN with out-of-distribution detection; it increases the situations in which emotions can be effectively detected and outperforms a state-of-the-art baseline.
A Proposal for Multimodal Emotion Recognition Using Aural Transformers and Action Units on RAVDESS Dataset
TLDR
Results demonstrated that these modalities carry relevant information for detecting users' emotional state and that their combination improves overall system performance.
Automated emotion recognition: Current trends and future perspectives.
Multimodal Emotion Recognition on RAVDESS Dataset Using Transfer Learning
TLDR
A multimodal emotion recognition system that relies on speech and facial information; the results revealed that these modalities carry relevant information for detecting users' emotional state and that their combination improves system performance.

References

Showing 1–10 of 45 references
Speech Emotion Recognition under White Noise
TLDR
The experimental results show that speech enhancement algorithms consistently improve the performance of the emotion recognition system under various SNRs, and that positive emotions are more likely to be misclassified as negative emotions in a white-noise environment.
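Robustness studies like this one evaluate recognizers at controlled signal-to-noise ratios. A minimal sketch of mixing white Gaussian noise into a clean signal at a target SNR (the function name and parameters are illustrative, not taken from the paper):

```python
import numpy as np

def add_white_noise(signal, snr_db, rng=None):
    """Mix white Gaussian noise into `signal` at a target SNR (in dB)."""
    rng = rng or np.random.default_rng(0)
    noise = rng.standard_normal(signal.shape)
    signal_power = np.mean(signal ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale the noise so that 10*log10(signal_power / noise_power) == snr_db
    scale = np.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return signal + scale * noise
```

Sweeping `snr_db` (e.g. 20 dB down to 0 dB) and re-evaluating the classifier at each level reproduces the kind of degradation curve these experiments report.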
Distant Emotion Recognition
TLDR
A novel solution for distant emotion recognition that addresses the key challenges by identifying and removing features that are significantly distorted by distance, introducing a novel feature modeling technique called Emo2vec together with an overlapping-speech filtering technique, and using an LSTM classifier to capture the temporal dynamics of the speech states found in emotions.
Attention Based Fully Convolutional Network for Speech Emotion Recognition
TLDR
A novel attention-based fully convolutional network is presented that handles variable-length speech without requiring segmentation, so critical information is not lost; it outperformed state-of-the-art methods on the IEMOCAP corpus.
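The attention mechanism described here collapses a variable-length sequence of frame features into a fixed-size utterance vector. A minimal numpy sketch of softmax attention pooling (the weight vector `w` stands in for the network's learned attention parameters, which the snippet does not specify):

```python
import numpy as np

def attention_pool(frame_feats, w):
    """Collapse variable-length frame features (T, D) into one
    utterance vector (D,) via softmax attention over time."""
    scores = frame_feats @ w                 # (T,) relevance score per frame
    alpha = np.exp(scores - scores.max())    # numerically stable softmax
    alpha /= alpha.sum()
    return alpha @ frame_feats               # attention-weighted average
```

With `w = 0` every frame gets equal weight and the result reduces to mean pooling; a trained `w` lets the network emphasize emotionally salient frames regardless of utterance length.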
Efficient Emotion Recognition from Speech Using Deep Learning on Spectrograms
TLDR
A new implementation of emotion recognition from the paralingual information in speech, based on a deep neural network applied directly to spectrograms, which achieves higher recognition accuracy than previously published results while also limiting latency.
Real Time Distant Speech Emotion Recognition in Indoor Environments
TLDR
A novel combination of distorted-feature elimination, classifier optimization, several signal-cleaning techniques, and classifiers trained with synthetic reverberation obtained from a room impulse response generator, improving performance in a variety of rooms with various source-to-microphone distances.
Audio-visual emotion fusion (AVEF): A deep efficient weighted approach
Evaluating deep learning architectures for Speech Emotion Recognition
Speech emotion recognition using convolutional and Recurrent Neural Networks
TLDR
The main goal of the work is to propose an SER method based on concatenated CNNs and RNNs without any traditional hand-crafted features; it was verified to achieve better accuracy than conventional classification methods.
Deep convolutional recurrent neural network with attention mechanism for robust speech emotion recognition
TLDR
A deep convolutional recurrent neural network for speech emotion recognition based on log-Mel filterbank energies is presented, in which the convolutional layers perform discriminative feature learning and a convolutional attention mechanism is proposed to learn the utterance structure relevant to the task.
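Log-Mel filterbank energies, the input representation this reference builds on, can be computed from scratch with numpy. A rough sketch, assuming 16 kHz audio and conventional framing parameters (all values here are illustrative defaults, not the paper's settings):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def log_mel_energies(signal, sr=16000, n_fft=512, hop=256, n_mels=40):
    # Frame the signal and apply a Hann window
    frames = [signal[s:s + n_fft] * np.hanning(n_fft)
              for s in range(0, len(signal) - n_fft + 1, hop)]
    spec = np.abs(np.fft.rfft(np.array(frames), axis=1)) ** 2  # power spectrum

    # Triangular filters spaced evenly on the mel scale from 0 Hz to sr/2
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fb[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)  # rising slope
        fb[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)  # falling slope

    return np.log(spec @ fb.T + 1e-10)  # (frames, n_mels)
```

The resulting (time, mel-band) matrix is the 2-D input the convolutional layers operate on.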
Speech Emotion Recognition by Combining Amplitude and Phase Information Using Convolutional Neural Network
Previous studies of speech emotion recognition apply convolutional neural networks (CNNs) directly to the amplitude spectrogram to extract features; the CNN is combined with a bidirectional long short-term memory …