Toward detecting emotions in spoken dialogs

  Chul Min Lee and Shrikanth S. Narayanan. IEEE Transactions on Speech and Audio Processing.
The importance of automatically recognizing emotions from human speech has grown with the increasing role of spoken language interfaces in human-computer interaction applications. […] To capture emotion information at the language level, an information-theoretic notion of emotional salience is introduced.
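The emotional salience of a word can be read as the mutual information between that word and the set of emotion classes: words that co-occur strongly with one class score high, while words spread evenly across classes score near zero. A minimal sketch of this idea follows; the function name, data layout (lists of tokenized utterances with class labels), and use of base-2 logarithms are illustrative assumptions, not the paper's exact formulation.

```python
from collections import Counter
import math

def emotional_salience(utterances):
    """Score each word by its mutual information with the emotion classes:
    sal(w) = sum_k P(e_k | w) * log2(P(e_k | w) / P(e_k)).

    `utterances` is a list of (word_list, emotion_label) pairs.
    Returns a dict mapping each word to its salience in bits.
    """
    # Class priors P(e_k) estimated from label frequencies.
    class_counts = Counter(label for _, label in utterances)
    total = sum(class_counts.values())
    prior = {e: c / total for e, c in class_counts.items()}

    # Count, per word, how many utterances of each class contain it.
    word_class = Counter()
    word_totals = Counter()
    for words, label in utterances:
        for w in set(words):  # presence, not raw frequency
            word_class[(w, label)] += 1
            word_totals[w] += 1

    salience = {}
    for w, n_w in word_totals.items():
        s = 0.0
        for e, p_e in prior.items():
            p_e_given_w = word_class[(w, e)] / n_w
            if p_e_given_w > 0:
                s += p_e_given_w * math.log2(p_e_given_w / p_e)
        salience[w] = s
    return salience
```

With two balanced classes, a word appearing only in one class scores log2(2) = 1 bit, while a word split evenly between classes scores 0, matching the intuition that only class-discriminative words carry emotion information at the language level.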


Instantaneous Emotion Detection System using Vocalizations

This paper presents a model of an "Emotion Detection System" aimed at understanding how emotional viewing could be encoded, which has led to the devising of emotional film structures.

Fusion of Acoustic and Linguistic Features for Emotion Detection

Describes a system that deploys acoustic and linguistic information from speech to decide whether an utterance carries negative or non-negative meaning, based on the degree of emotional salience of the words.

Recognizing child's emotional state in problem-solving child-machine interactions

Experimental results show that the addition of visual information to acoustic information yields relative improvements in emotion recognition of 3.8% with both LDC and SVC classifiers for information fusion at the feature level over that of using only acoustic information.

Emotion recognition using imperfect speech recognition

The results show that emotion recognition performance stays roughly constant as long as word accuracy doesn’t fall below a reasonable value, making the use of speech-to-text viable for training of emotion classifiers based on linguistics.

Emotion classification in children's speech using fusion of acoustic and linguistic features

A system to detect angry vs. non-angry utterances of children who are engaged in dialog with an Aibo robot dog, submitted to the Interspeech2009 Emotion Challenge evaluation.

Speech emotion detection based on neural networks

An experimental study on six emotions (happiness, sadness, anger, fear, neutral, and boredom) is reported, which uses speech fundamental frequency, formants, energy, and voicing rate as extracted features.

Recognizing Emotional State Changes Using Speech Processing

This paper explores speech from a database and long-term speech recordings to analyze mood changes in an individual speaker over the course of long-term speech, and introduces a learning method based on a statistical model to classify the emotional states and moods of utterances and to track their changes.

Negative Emotion Recognition in Spoken Dialogs

A novel deep learning model is proposed, multi-feature stacked denoising autoencoders (MSDA), which can fuse the high-level representations of the acoustic and linguistic features along with contexts to classify emotions.

Recognizing emotion from Turkish speech using acoustic features

A new Turkish emotional speech database, which includes 5,100 utterances extracted from 55 Turkish movies, was constructed and promising results for activation and dominance dimensions were obtained.

Spontaneous speech emotion recognition using prior knowledge

A spontaneous speech emotion recognition framework is proposed that uses the contexts and the knowledge of the time lapse between spoken utterances in an audio call to reliably recognize the speaker's current emotion in spontaneous audio conversations.

Recognition of negative emotions from the speech signal

This paper reports on methods for automatic classification of spoken utterances according to the emotional state of the speaker, using a corpus of human-machine dialogues recorded from a commercial application deployed by SpeechWorks.

Automatic recognition of emotion from voice: a rough benchmark

A study that offers a rough benchmark for automatic recognition of a speaker's emotions, using speech data from five passages selected, following pilot studies, because they were effective at evoking specific emotions: fear, anger, happiness, sadness, and neutrality.

A cross-cultural investigation of emotion inferences from voice and speech: implications for speech technology

This contribution describes the first large-scale effort to obtain empirical data on whether the vocal changes produced by emotional and attitudinal factors are universal or vary over cultures and/or languages by studying emotion recognition from voice in nine countries on three different continents.

Emotion recognition using a data-driven fuzzy inference system

Results on spoken dialog data from a call center application show that the optimized FIS with two rules (FIS-2) improves emotion classification by 63.0% for male data and 73.7% for female data over previous results obtained using a linear discriminant classifier.

Toward the simulation of emotion in synthetic speech: a review of the literature on human vocal emotion.

The voice parameters affected by emotion are found to be of three main types: voice quality, utterance timing, and utterance pitch contour.

Automatic Spoken Affect Analysis and Classification

The results suggest that pitch and energy measurements may be used to automatically classify spoken affect but more research will be necessary to understand individual variations and how to broaden the range of affect classes which can be recognized.

Automatic spoken affect classification and analysis

  • D. Roy and A. Pentland
  • Proceedings of the Second International Conference on Automatic Face and Gesture Recognition
  • 1996
The results suggest that pitch and energy measurements may be used to automatically classify spoken affect valence but more research will be necessary to understand individual variations and how to broaden the range of affect classes which can be recognized.


Automatic dialogue systems used in call centers, for instance, should be able to detect a critical phase of the dialogue, indicated by the customer's vocal expression of anger or irritation.

Prosody-based automatic detection of annoyance and frustration in human-computer dialog

Results show that a prosodic model can predict whether an utterance is neutral versus "annoyed or frustrated" with an accuracy on par with that of human interlabeler agreement.

Recognizing emotion in speech

A new method of extracting prosodic features from speech, based on a smoothing spline approximation of the pitch contour, is presented, which obtains classification performance that is close to human performance on the task.