Constantine Kotropoulos

In this paper we overview emotional speech recognition with three goals in mind. The first goal is to provide an up-to-date record of the available emotional speech data collections. The number of emotional states, the language, the number of speakers, and the kind of speech are briefly addressed. The second goal is to present the most frequent acoustic …
Our purpose is to design a useful tool which can be used in psychology to automatically classify utterances into five emotional states: anger, happiness, neutral, sadness, and surprise. The major contribution of the paper is to rate the discriminating capability of a set of features for emotional speech recognition. A total of 87 features has been …
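Rating a feature's discriminating capability can be sketched with a per-feature class-separability score. The Fisher discriminant ratio below is one common criterion of this kind; it is an illustrative choice, not necessarily the criterion used in the paper.

```python
import numpy as np

def fisher_ratio(feature_values, labels):
    """Score one feature's class-discriminating power as the ratio of
    between-class scatter to within-class scatter: higher means the
    feature separates the emotional states better. (Illustrative
    criterion; the paper's actual rating method may differ.)"""
    classes = np.unique(labels)
    overall_mean = feature_values.mean()
    between = sum(
        feature_values[labels == c].size
        * (feature_values[labels == c].mean() - overall_mean) ** 2
        for c in classes
    )
    within = sum(
        ((feature_values[labels == c] - feature_values[labels == c].mean()) ** 2).sum()
        for c in classes
    )
    return between / within if within > 0 else np.inf
```

Computing this score for each of the candidate features and sorting yields a ranking of their discriminating capability.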
Several adaptive least mean squares (LMS) L-filters, both constrained and unconstrained, are developed for noise suppression in images and compared in this paper. First, the location-invariant LMS L-filter for a nonconstant signal corrupted by zero-mean additive white noise is derived. It is demonstrated that the location-invariant LMS L-filter can be …
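An L-filter outputs a weighted sum of the order statistics (the sorted samples) in a sliding window, and the LMS algorithm adapts those weights toward a desired signal. A minimal 1-D unconstrained sketch follows; window size and step size are illustrative, and the paper's location-invariant variant additionally constrains the weights (e.g., to sum to one).

```python
import numpy as np

def lms_l_filter(noisy, desired, window=5, mu=0.01):
    """Unconstrained adaptive LMS L-filter (1-D sketch): each output is a
    weighted sum of the sorted samples in a window, with weights updated
    by the LMS rule toward the desired (training) signal."""
    w = np.ones(window) / window              # initial weights: moving average
    half = window // 2
    out = np.empty(len(noisy), dtype=float)
    for n in range(len(noisy)):
        lo, hi = max(0, n - half), min(len(noisy), n + half + 1)
        x = np.sort(noisy[lo:hi])             # order statistics of the window
        if x.size < window:                   # pad at signal edges
            x = np.pad(x, (0, window - x.size), mode="edge")
        y = w @ x                             # L-filter output
        e = desired[n] - y                    # LMS error
        w += mu * e * x                       # stochastic-gradient weight update
        out[n] = y
    return out
```

In the image-denoising setting of the paper the window would be 2-D and the reference signal comes from the training scheme; the adaptation step is the same in spirit.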
In this paper, a novel audio-visual scene change detection algorithm is presented and evaluated experimentally. An enhanced set of eigen-audioframes is created that is related to an audio signal subspace, where audio background changes are easily discovered. An analysis is presented that justifies why this subspace favors scene change detection. …
Emotional speech recognition aims to automatically classify speech units (e.g., utterances) into emotional states, such as anger, happiness, neutral, sadness, and surprise. The major contribution of this paper is to rate the discriminating capability of a set of features for emotional speech recognition when gender information is taken into consideration. A …
Two hybrid systems for classifying seven categories of human facial expression are proposed. The first system combines independent component analysis (ICA) and support vector machines (SVMs). The original face image database is decomposed into linear combinations of several basis images, where the corresponding coefficients of these combinations are fed up …
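The ICA-plus-SVM hybrid can be sketched as a two-stage pipeline: ICA decomposes the images into basis images, and the mixing coefficients become the SVM's input features. The data below are synthetic stand-ins (random "images" and labels), and the component count and kernel are illustrative choices, not the paper's settings.

```python
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 256))   # 60 synthetic "face images", 16x16 flattened
y = rng.integers(0, 7, size=60)  # seven expression categories

# Stage 1: ICA maps each image to coefficients over learned basis images.
# Stage 2: an SVM classifies those coefficient vectors.
clf = make_pipeline(FastICA(n_components=10, random_state=0),
                    SVC(kernel="rbf"))
clf.fit(X, y)
pred = clf.predict(X)
```

With real data, the ICA stage would be fitted on the training face database and the SVM trained on the resulting coefficient representation.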
Motivated by psychophysiological investigations on the human auditory system, a bio-inspired two-dimensional auditory representation of music signals is exploited, which captures the slow temporal modulations. Although each recording is represented by a second-order tensor (i.e., a matrix), a third-order tensor is needed to represent a music corpus. …
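The matrix-versus-tensor point is simple to make concrete: each recording's auditory representation is a matrix, and stacking the matrices of all recordings along a third mode yields the third-order tensor for the corpus. The dimensions below are illustrative, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
n_recordings, n_freq, n_rate = 50, 96, 20   # illustrative sizes

# One matrix (second-order tensor) per recording, e.g. frequency x modulation rate.
recordings = [rng.random((n_freq, n_rate)) for _ in range(n_recordings)]

# The corpus is a third-order tensor: recording x frequency x modulation rate.
corpus = np.stack(recordings, axis=0)
```

Tensor factorizations applied to `corpus` then operate jointly over all three modes instead of flattening the per-recording matrices into vectors.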
This survey focuses on two challenging speech processing topics, namely speaker segmentation and speaker clustering. Speaker segmentation aims at finding speaker change points in an audio stream, whereas speaker clustering aims at grouping speech segments based on speaker characteristics. Model-based, metric-based, and hybrid speaker segmentation …
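Metric-based segmentation compares two adjacent analysis windows with a distance criterion and declares a change point where the distance peaks. The ΔBIC criterion below is one widely used choice, named here only as an example; the survey covers this family alongside model-based and hybrid methods.

```python
import numpy as np

def delta_bic(x_left, x_right, penalty=1.0):
    """Delta-BIC for one candidate change point between two feature windows,
    each modeled as a full-covariance Gaussian. Positive values favor the
    two-speaker (change) hypothesis over the single-speaker one."""
    z = np.vstack([x_left, x_right])
    n, n1, n2 = len(z), len(x_left), len(x_right)
    d = z.shape[1]
    logdet = lambda a: np.linalg.slogdet(np.cov(a, rowvar=False))[1]
    # Likelihood-ratio term: one Gaussian for all data vs. one per window.
    dist = 0.5 * (n * logdet(z) - n1 * logdet(x_left) - n2 * logdet(x_right))
    # Complexity penalty for the extra Gaussian's parameters.
    comp = 0.5 * penalty * (d + 0.5 * d * (d + 1)) * np.log(n)
    return dist - comp
```

Sliding the candidate boundary over the stream and thresholding this score at zero gives a simple metric-based change detector.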