Georgios Evangelopoulos

Learn More
In this work we approach the analysis and segmentation of natural textured images by combining ideas from image analysis and probabilistic modeling. We rely on AM-FM texture models and specifically on the Dominant Component Analysis (DCA) paradigm for feature extraction. This method provides a low-dimensional, dense and smooth descriptor, capturing(More)
Multimodal streams of sensory information are naturally parsed and integrated by humans using signal-level feature extraction and higher-level cognitive processes. Detection of attention-invoking audiovisual segments is formulated in this work on the basis of saliency models for the audio, visual and textual information conveyed in a video stream. Aural or(More)
In this paper, we present experiments on continuous time, continuous scale affective movie content recognition (emotion tracking). A major obstacle for emotion research has been the lack of appropriately annotated databases, limiting the potential for supervised algorithms. To that end we develop and present a database of movie affect, annotated in(More)
Based on perceptual and computational attention modeling studies, we formulate measures of saliency for an audiovisual stream. Audio saliency is captured by signal modulations and related multi-frequency band features, extracted through nonlinear operators and energy tracking. Visual saliency is measured by means of a spatiotemporal attention model driven(More)
An audio-assisted system is investigated that detects if a movie scene is a dialogue or not. The system is based on actor indicator functions. That is, functions which define if an actor speaks at a certain time instant. In particular, the crosscorrelation and the magnitude of the corresponding the crosspower spectral density of a pair of indicator(More)
The ability to accurately locate the boundaries of speech activity is an important attribute of any modern speech recognition, processing, or transmission system. The effort in this paper is the development of efficient, sophisticated features for speech detection in noisy environments, using ideas and techniques from recent advances in speech modeling and(More)
The need for efficient, sophisticated features for speech event detection is inherent in state of the art processing, enhancement and recognition systems. We explore ideas and techniques from non-linear speech modeling and analysis, like modulations and multiband filtering and propose new energy and spectral content features derived through filtering in(More)
Texture modeling and separation of structure in images are treated in synergy. A variational image decomposition scheme is formulated using explicit texture reconstruction constraints from the outputs of linear filters tuned to different spatial frequencies and orientations. Relevant to the texture image part information is reconstructed using modulation(More)
Detection of perceptually important video events is formulated here on the basis of saliency models for the audio, visual and textual information conveyed in a video stream. Audio saliency is assessed by cues that quantify multifrequency waveform modulations, extracted through nonlinear operators and energy tracking. Visual saliency is measured through a(More)
We propose an alternative interpretation of Bayesian surprise in the spatial domain, to account for saliency arising from contrast in image context. Our saliency formulation is integrated in three different application scenaria, with considerable improvements in performance: 1) visual attention prediction, validated using eye- and mouse-tracking data, 2)(More)