Emotional speaker identification using a novel capsule nets model

  • Ali Bou Nassif, Ismail Shahin, Ashraf Elnagar, Divya P. Velayudhan, Adi Alhudhaif, Kemal Polat
  • Computer Science
    Expert Systems with Applications

A Multi-Lingual Speech Recognition-Based Framework to Human-Drone Interaction

Real-time tests have shown that the approach is very promising as an alternative form of human–drone interaction while offering the benefit of control simplicity.

MeWEHV: Mel and Wave Embeddings for Human Voice Tasks

A pipeline to create a new model, Mel and Wave Embeddings for Human Voice Tasks (MeWEHV), that generates robust embeddings for speech processing and significantly increases the performance of state-of-the-art models on all tested datasets, at a low additional computational cost.

Speech Emotion Recognition Using Capsule Networks

  • Xixin Wu, Songxiang Liu, H. Meng
  • Computer Science
    ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2019
This paper presents a novel architecture based on the capsule networks (CapsNets) for SER that can take into account the spatial relationship of speech features in spectrograms, and provide an effective pooling method for obtaining utterance global features.

Speaker Verification using Convolutional Neural Networks

In this paper, a novel convolutional neural network architecture is developed for speaker verification that simultaneously captures speaker information and discards non-speaker information.

Speaker Identification in Different Emotional States in Arabic and English

This paper proposes a speaker recognition system corresponding to three states, namely emotional, neutral, and with no consideration for a speaker’s state, for two languages: Arabic and English.

Emotional Speaker Recognition based on Machine and Deep Learning

  • T. Sefara, T. Mokgonyane
  • Computer Science
    2020 2nd International Multidisciplinary Information Technology and Engineering Conference (IMITEC)
  • 2020
An emotional speaker recognition system is trained with machine and deep learning algorithms using time, frequency, and spectral features on an emotional speech database drawn from the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS).

GMM and CNN Hybrid Method for Short Utterance Speaker Recognition

A novel model is proposed to enhance the recognition accuracy of short-utterance speaker recognition systems, using a convolutional neural network to process spectrograms; it describes speakers better and achieves considerable accuracy as well as reasonable convergence speed.

A Study on Text-Independent Speaker Recognition Systems in Emotional Conditions Using Different Pattern Recognition Models

The results show that the performance of a speaker recognition system trained with clean speech degrades when tested with emotional data, irrespective of the feature or model used to build the system.

Predicting speaker recognition reliability by considering emotional content

This study collects a unique emotional database from 80 speakers and estimates speaker recognition performance as a function of arousal and valence, identifying regions in this space where a speaker's identity can be reliably recognized.

A study of speaker verification performance with expressive speech

The results show that speaker verification errors increase as the values of the emotional attributes increase, while for neutral/moderate values of arousal, valence, and dominance, speaker verification performance is reliable.

Speaker recognition in the case of emotional environment using transformation of speech features

Results obtained in this work strongly indicate the influence of emotional speech on speaker recognition.