Publications
A supervised approach to movie emotion tracking
TLDR
A database of movie affect, annotated continuously in time on a continuous valence-arousal scale, is developed, and supervised learning methods based on independent hidden Markov models in each dimension are proposed to model the continuous affective response.
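The per-dimension decoding described above can be sketched with a log-domain Viterbi pass over a discrete HMM. The state space, observation symbols, and all probabilities below are illustrative assumptions (e.g. states as quantized valence or arousal levels), not the paper's actual model:

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Log-domain Viterbi decoding of the most likely hidden-state path for
    one affect dimension.  States and observation symbols are assumed to be
    quantized levels -- an illustrative setup, not the paper's feature space."""
    pi, A, B = (np.asarray(m, dtype=float) for m in (pi, A, B))
    T, n = len(obs), len(pi)
    logA, logB = np.log(A), np.log(B)
    delta = np.log(pi) + logB[:, obs[0]]   # best log-prob ending in each state
    back = np.zeros((T, n), dtype=int)     # back-pointers
    for t in range(1, T):
        scores = delta[:, None] + logA     # (from-state, to-state)
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + logB[:, obs[t]]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):          # backtrack from the best end state
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

With sticky transitions and clean emissions, the decoded path simply follows the observations, which is a quick sanity check on the recursion.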
Multimodal Saliency and Fusion for Movie Summarization Based on Aural, Visual, and Textual Attention
TLDR
The detection of attention-invoking audiovisual segments is formulated using saliency models for the audio, visual, and textual information conveyed in a video stream, which together form the basis of a generic, bottom-up video summarization algorithm.
Video event detection and summarization using audio, visual and text saliency
TLDR
The multimodal saliency curve is the basis of a bottom-up video summarization algorithm that refines results from unimodal or audiovisual skimming and performs favorably in terms of informativeness and enjoyability.
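As a rough illustration of saliency-curve skimming, the sketch below thresholds a fused saliency curve and returns the frame runs that would make up a summary. The percentile threshold and the `frac` parameter are assumptions for illustration, not the papers' actual selection rule:

```python
import numpy as np

def skim_segments(saliency, frac=0.2):
    """Return (start, end) frame-index runs covering roughly the top `frac`
    most salient frames -- a crude stand-in for saliency-based skimming."""
    saliency = np.asarray(saliency, dtype=float)
    thr = np.quantile(saliency, 1.0 - frac)  # keep the top `frac` of frames
    mask = saliency >= thr
    segs, start = [], None
    for i, m in enumerate(mask):             # collect contiguous runs
        if m and start is None:
            start = i
        elif not m and start is not None:
            segs.append((start, i))
            start = None
    if start is not None:
        segs.append((start, len(mask)))
    return segs
```

A real skimming pipeline would additionally enforce minimum segment lengths and audio/shot boundary alignment so the summary stays watchable.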
Deeply Supervised Multimodal Attentional Translation Embeddings for Visual Relationship Detection
TLDR
A new deeply supervised two-branch architecture is introduced, the Multimodal Attentional Translation Embeddings, where the visual features of each branch are driven by a multimodal attentional mechanism that exploits spatio-linguistic similarities in a low-dimensional space.
COGNIMUSE: a multimodal video database annotated with saliency, events, semantics and emotion with application to summarization
TLDR
A multimodal video database, namely COGNIMUSE, annotated with sensory and semantic saliency, events, cross-media semantics, and emotion is presented and state-of-the-art algorithms are proposed for the detection of perceptually salient events from videos.
Musical instruments signal analysis and recognition using fractal features
TLDR
The multiscale fractal dimension profile is proposed as a descriptor that quantifies the multiscale complexity of the music waveform, and it is experimentally found that this descriptor can discriminate among different musical instruments.
A saliency-based approach to audio event detection and summarization
TLDR
This paper explores the potential of a modulation model for detecting perceptually important audio events based on saliency models, together with various schemes for fusing them, including linear, adaptive, and nonlinear methods.
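The three fusion families named above can be sketched over per-modality saliency curves. The curves here are random stand-ins, and the specific schemes (weighted average, min-max combination, local-variance weighting) are common illustrative choices, not necessarily the paper's exact formulations:

```python
import numpy as np

# Hypothetical per-modality saliency curves, one value per frame; real curves
# would come from audio, visual, and text saliency models.
rng = np.random.default_rng(0)
audio, visual, text = rng.random((3, 100))

def linear_fusion(curves, weights):
    """Weighted linear combination of modality saliency curves."""
    c = np.asarray(curves)
    w = np.asarray(weights, dtype=float)
    return (w / w.sum()) @ c

def nonlinear_fusion(curves):
    """A simple min-max nonlinear scheme: average of the weakest and the
    strongest modality at each frame."""
    c = np.asarray(curves)
    return 0.5 * c.min(axis=0) + 0.5 * c.max(axis=0)

def adaptive_fusion(curves, win=10):
    """Adaptive weights from each modality's local variance (a common
    heuristic: locally more variable modalities get more weight)."""
    c = np.asarray(curves)
    pad = np.pad(c, ((0, 0), (win // 2, win - win // 2)), mode="edge")
    var = np.stack([
        np.array([pad[m, i:i + win].var() for i in range(c.shape[1])])
        for m in range(c.shape[0])
    ])
    w = var / (var.sum(axis=0, keepdims=True) + 1e-12)
    return (w * c).sum(axis=0)

fused = linear_fusion([audio, visual, text], [0.5, 0.3, 0.2])
```

Linear fusion is the usual baseline; the adaptive variant changes the weights frame by frame, which is what distinguishes it from the fixed-weight scheme.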
Multiscale Fractal Analysis of Musical Instrument Signals With Application to Recognition
TLDR
The multiscale fractal dimension (MFD) profile is proposed as a short-time descriptor that quantifies the multiscale complexity and fragmentation of the different states of the music waveform and can discriminate several aspects of different musical instruments.
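A minimal sketch of an MFD profile for a 1-D signal, using morphological covers: at each scale the graph is covered by a band whose width grows with the scale, and the local Minkowski dimension is read off the log-log slope of the cover area. This is an illustrative cover estimator under assumed scale units, not the papers' exact algorithm:

```python
import numpy as np

def multiscale_fractal_dimension(x, max_scale=16):
    """Estimate a multiscale fractal dimension profile of a 1-D signal via
    morphological covers (illustrative, not the papers' exact method)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    scales = np.arange(1, max_scale + 1)
    areas = []
    for s in scales:
        # Cover "area" at scale s: sum of local signal ranges over +/- s samples.
        upper = np.array([x[max(0, i - s):i + s + 1].max() for i in range(n)])
        lower = np.array([x[max(0, i - s):i + s + 1].min() for i in range(n)])
        areas.append((upper - lower).sum())
    areas = np.asarray(areas)
    # D(s) ~= 2 - d(log area)/d(log scale): ~1 for smooth signals, toward 2
    # for highly fragmented ones.
    return 2.0 - np.diff(np.log(areas)) / np.diff(np.log(scales))
```

For a smooth sinusoid the profile stays near 1 at fine scales, which is the expected behavior for a non-fractal waveform and a useful sanity check.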
Audio salient event detection and summarization using audio and text modalities
TLDR
This paper takes a synergistic approach to audio summarization, in which saliency computation over audio streams is assisted by the text modality, creating summaries that consist of events that are not only salient but also meaningful and semantically coherent.
Exploring CNN-Based Architectures for Multimodal Salient Event Detection in Videos
TLDR
Comparisons over the COGNIMUSE database, consisting of movies and travel documentaries, provide strong evidence that the CNN-based approach outperforms the hand-crafted frontend in almost all cases across all modalities, achieving strong average results.
...