Automatic spoken affect classification and analysis

@inproceedings{roy1996affect,
  title={Automatic spoken affect classification and analysis},
  author={Deb K. Roy and Alex Pentland},
  booktitle={Proceedings of the Second International Conference on Automatic Face and Gesture Recognition},
  year={1996}
}
  • Published 14 October 1996
  • Computer Science
This paper reports results from preliminary experiments on automatic classification of spoken affect valence. The task was to classify short spoken sentences into one of two classes: approving or disapproving. Using an optimal combination of six acoustic measurements, our classifier achieved accuracies of 65% to 88% for speaker-dependent, text-independent classification. The results suggest that pitch and energy measurements may be used to automatically classify spoken affect valence but more…
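The abstract names pitch and energy measurements as the basis for valence classification but does not list the six measurements here. As a hedged illustration only, per-utterance pitch and energy statistics of that general kind can be computed from a waveform with plain NumPy; the frame sizes, the autocorrelation pitch tracker, and the particular six statistics below are assumptions, not the authors' feature set:

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames (25 ms / 10 ms at an assumed 16 kHz)."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def pitch_autocorr(frame, sr=16000, fmin=75.0, fmax=400.0):
    """Crude autocorrelation pitch estimate in Hz (0.0 when the frame looks unvoiced)."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    if hi >= len(ac):
        return 0.0
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag if ac[lag] > 0.3 * ac[0] else 0.0

def affect_features(x, sr=16000):
    """Six illustrative pitch/energy statistics for one utterance:
    mean F0, F0 std, F0 range, mean log-energy, log-energy std, voiced fraction."""
    frames = frame_signal(x)
    f0 = np.array([pitch_autocorr(f, sr) for f in frames])
    e = np.log(np.sum(frames ** 2, axis=1) + 1e-10)
    voiced = f0[f0 > 0]
    if len(voiced) == 0:
        voiced = np.array([0.0])
    return np.array([voiced.mean(), voiced.std(),
                     voiced.max() - voiced.min(),
                     e.mean(), e.std(), len(voiced) / len(f0)])
```

On a synthetic steady 200 Hz tone, `affect_features` recovers a mean F0 near 200 Hz with near-zero F0 variation; real approving/disapproving speech would differ mainly in the variation and range statistics.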


Recognition of negative emotions from the speech signal
This paper reports on methods for automatically classifying spoken utterances according to the emotional state of the speaker, using a corpus of human-machine dialogues recorded from a commercial application deployed by SpeechWorks.
Classifying emotions in human-machine spoken dialogs
This paper compares various acoustic feature sets and three classification algorithms, a linear discriminant classifier (LDC), a k-nearest neighbor (k-NN) classifier, and a support vector machine (SVM) classifier, for classifying spoken utterances into two emotion classes, negative and non-negative, based on the emotional state of the speaker.
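The three classifier families named above are all available in scikit-learn, so a comparison of that shape is easy to sketch. The synthetic features below stand in for per-utterance acoustic measurements; neither the data nor the exact feature sets of the paper are reproduced here:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for per-utterance acoustic features (e.g. pitch/energy
# statistics); two well-separated Gaussian clusters play the two classes.
rng = np.random.default_rng(0)
neg = rng.normal(loc=-1.0, scale=1.0, size=(100, 6))   # "negative" class
non = rng.normal(loc=+1.0, scale=1.0, size=(100, 6))   # "non-negative" class
X = np.vstack([neg, non])
y = np.array([0] * 100 + [1] * 100)

for name, clf in [("LDC", LinearDiscriminantAnalysis()),
                  ("k-NN", KNeighborsClassifier(n_neighbors=5)),
                  ("SVM", SVC(kernel="rbf"))]:
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: {acc:.2f}")
```

With real emotion data the three classifiers typically land much closer together than on this toy problem, which is what makes the feature set, rather than the classifier, the interesting variable in such comparisons.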
Baby Ears: a recognition system for affective vocalizations
  • M. Slaney, G. McRoberts
  • Psychology, Computer Science
    Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181)
  • 1998
Consistent with previous studies, changes in pitch are found to be an important cue for affective messages; timbre, as captured by cepstral coefficients, is found to be important as well.
Automatic detection of stress in speech
We have developed software based on the Stevens landmark theory to extract features in utterances in and adjacent to voiced regions. We then apply two statistical methods, closest-match (CM) and
Toward detecting emotions in spoken dialogs
This paper explores the detection of domain-specific emotions using language and discourse information in conjunction with acoustic correlates of emotion in speech signals on a case study of detecting negative and non-negative emotions using spoken language data obtained from a call center application.
Impact of Emotion on Prosody Analysis
Speech can be described as an act of producing voice through the use of the vocal folds and vocal apparatus to create a linguistic act designed to convey information. Linguists classify the speech
A Study on Prosody Analysis
Speech can be described as an act of producing voice through the use of the vocal folds and vocal apparatus to create a linguistic act designed to convey information. Linguists classify the speech
BabyEars: A recognition system for affective vocalizations
Mothers' speech was significantly easier to classify than fathers' speech, suggesting either clearer distinctions among these messages in mothers' speech to infants, or a difference between fathers and mothers in the acoustic information used to convey these messages.
Prosody Analysis for Speaker Affect Determination
This work states that the extralinguistic aspect of speech is considered a source of variability that theoretically can be minimized with an appropriate preprocessing technique; determination of such robust techniques is, however, far from trivial.
BabyEars: A recognition system for affective vocalizations
Our goal was to see how much of the affective message we could recover using simple acoustic measures of the speech signal. Using pitch and broad spectral-shape measures, a multidimensional Gaussian
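The snippet above names a multidimensional Gaussian over pitch and broad spectral-shape measures. A minimal sketch of per-class Gaussian modeling with maximum-likelihood classification is below; the class names, feature dimensionality, and regularization constant are illustrative assumptions, not details from the paper:

```python
import numpy as np

def fit_gaussian(X):
    """Maximum-likelihood mean and covariance for one affect class."""
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])  # regularized
    return mu, cov

def log_likelihood(x, mu, cov):
    """Log-density of x under a multivariate Gaussian N(mu, cov)."""
    d = x - mu
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d @ np.linalg.solve(cov, d) + logdet + len(x) * np.log(2 * np.pi))

def classify(x, models):
    """Pick the affect class whose Gaussian gives x the highest log-likelihood."""
    return max(models, key=lambda c: log_likelihood(x, *models[c]))
```

Usage would look like `models = {"approval": fit_gaussian(Xa), "prohibition": fit_gaussian(Xp)}` followed by `classify(feature_vec, models)`, where `Xa` and `Xp` are per-class feature matrices and the two labels are placeholders.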


Emotions and speech: some acoustical correlates.
Some further attempts to identify and measure those parameters in the speech signal that reflect the emotional state of a speaker.
Vocal cues to speaker affect: testing two models
We identified certain assumptions implicit in two divergent approaches to studying vocal affect signaling. The "covariance" model assumes that nonverbal cues function independently of verbal
Toward the simulation of emotion in synthetic speech: a review of the literature on human vocal emotion.
  • I. Murray, J. Arnott
  • Psychology, Medicine
    The Journal of the Acoustical Society of America
  • 1993
The voice parameters affected by emotion are found to be of three main types: voice quality, utterance timing, and utterance pitch contour.
Minimal cues in the vocal communication of affect: Judging emotions from content-masked speech
The results suggest that a minimal set of vocal cues consisting of pitch level and variations, amplitude level and variation, and rate of articulation may be sufficient to communicate the evaluation, potency, and activity dimensions of emotional meaning.
Analysis, synthesis, and perception of voice quality variations among female and male talkers.
Perceptual validation of the relative importance of acoustic cues for signaling a breathy voice quality has been accomplished using a new voicing source model for synthesis of more natural male and female voices.
Acoustic and perceptual indicators of emotional stress.
Tape recordings of telephone conversations of Consolidated Edison's system operator (SO) and his immediate superior (CSO), beginning an hour before the 1977 New York blackout, indicated that whereas CSO's vocal pitch increased significantly with increased situational stress, SO's pitch decreased.
The long-term spectrum and perceived emotion
The LTS was systematically related to the affective dimensions in certain frequency ranges and no significant sex or ethnic group effects were found.
Generating expression in synthesized speech
This document is a revised version of my master's thesis, submitted in May, 1989 to the Media Arts and Sciences Section of the Department of Architecture, at the Massachusetts Institute of
Prosodic, Paralinguistic, and Interactional Features in Parent-Child Speech: English and Spanish.
Parents employ a special register when speaking to young children, containing features that mark it as appropriate for children who are beginning to acquire their language. Parental speech in English
Affective Computing
Key issues in affective computing, "computing that relates to, arises from, or influences emotions," are presented, along with new applications in computer-assisted learning, perceptual information retrieval, arts and entertainment, and human health and interaction.