Athanasios Katsamanis

We address the problem of tracking continuous levels of a participant’s activation, valence and dominance during the course of affective dyadic interactions, where participants may be speaking, listening or doing neither. To this end, we extract detailed and intuitive descriptions of each participant’s body movements, posture and behavior towards his…
Although the accuracy of feature measurements depends heavily on changing environmental conditions, the consequences of this fact for pattern recognition tasks have received relatively little attention to date. In this paper, we explicitly take feature measurement uncertainty into account and show how multimodal classification and learning rules should…
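As one illustrative way (not necessarily the rule derived in the paper) of folding measurement uncertainty into a classification decision: if each observation carries a known Gaussian measurement-noise covariance, marginalizing that noise out of a Gaussian class model simply inflates each class covariance before scoring. The class means, covariances, and priors below are assumed inputs for the sketch.

```python
import numpy as np
from scipy.stats import multivariate_normal

def classify_with_uncertainty(x, noise_cov, class_means, class_covs, priors):
    """Pick the class maximizing prior * N(x; mean, class_cov + noise_cov).

    Marginalizing Gaussian measurement noise out of a Gaussian class model
    inflates the class covariance by the measurement-noise covariance, so
    uncertain observations are scored against broader class distributions.
    """
    scores = []
    for mean, cov, prior in zip(class_means, class_covs, priors):
        scores.append(prior * multivariate_normal.pdf(x, mean, cov + noise_cov))
    return int(np.argmax(scores))
```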
We are interested in recovering aspects of the vocal tract's geometry and dynamics from speech, a problem referred to as speech inversion. Traditional audio-only speech inversion techniques are inherently ill-posed, since the same speech acoustics can be produced by multiple articulatory configurations. To alleviate the ill-posedness of the audio-only inversion…
Human emotional expression tends to evolve in a structured manner, in the sense that certain emotional evolution patterns, e.g., anger to anger, are more probable than others, such as anger to happiness. Furthermore, the perception of an emotional display can be affected by recent emotional displays. Therefore, the emotional content of past and future…
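A generic way to exploit such temporal structure (a sketch, not necessarily the authors' model) is to decode the emotion-label sequence with a transition matrix that encodes which emotional evolutions are probable, e.g. Viterbi decoding over per-frame emotion scores. The array shapes and names below are assumptions for illustration.

```python
import numpy as np

def decode_emotion_sequence(frame_log_likes, log_trans, log_init):
    """Viterbi decoding of the most likely emotion-label sequence.

    frame_log_likes: (T, K) per-frame log-likelihood of each of K emotions.
    log_trans:       (K, K) log transition probabilities; favoring the
                     diagonal makes anger->anger more probable than
                     anger->happiness, mirroring the structure above.
    log_init:        (K,) log prior over the first frame's emotion.
    """
    T, K = frame_log_likes.shape
    delta = log_init + frame_log_likes[0]
    backptr = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans      # rows: previous label, cols: current
        backptr[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + frame_log_likes[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):                # backtrack the best path
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]
```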
Interaction synchrony among interlocutors happens naturally as people gradually adapt their speaking style to promote efficient communication. In this work, we quantify one aspect of interaction synchrony, prosodic entrainment (specifically pitch and energy), in married couples’ problem-solving interactions using speech signal-derived measures. Statistical…
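One simple signal-derived entrainment statistic (an illustrative sketch; the paper's exact measures may differ) correlates one partner's turn-level prosodic summary with the other partner's summary in the following turn. The inputs are assumed to be aligned lists of, e.g., mean pitch or mean energy per turn.

```python
import numpy as np

def turn_level_entrainment(turns_a, turns_b):
    """Correlate speaker A's turn-level prosodic statistic with speaker B's
    statistic in the immediately following turn; turns_a[i] is assumed to
    precede turns_b[i] in the dialogue. Higher correlation suggests stronger
    prosodic entrainment on that feature."""
    a = np.asarray(turns_a, dtype=float)
    b = np.asarray(turns_b, dtype=float)
    n = min(len(a), len(b))
    return np.corrcoef(a[:n], b[:n])[0, 1]
```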
USC-TIMIT is an extensive database of multimodal speech production data, developed to complement existing resources available to the speech research community, with the intention of being continuously refined and augmented. The database currently includes real-time magnetic resonance imaging data from five male and five female speakers of American English…
We present MRI-TIMIT: a large-scale database of synchronized audio and real-time magnetic resonance imaging (rtMRI) data for speech research. The database currently consists of speech data acquired from two male and two female speakers of American English. Subjects’ upper airways were imaged in the midsagittal plane while reading the same 460-sentence…
Human expressive interactions are characterized by an ongoing unfolding of verbal and nonverbal cues. Such cues convey the interlocutor's emotional state, which is continuous and of variable intensity and clarity over time. In this paper, we examine the emotional content of body language cues describing a participant's posture, relative position and…
A method of rapid semi-automatic segmentation of real-time magnetic resonance image data for parametric analysis of vocal tract shaping is described. Tissue boundaries are identified by seeking pixel intensity thresholds along tract-normal gridlines. Airway contours are constrained with respect to a tract centerline defined as an optimal path over the graph…
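A minimal sketch of the boundary-seeking step described above, under the assumption that the tract-normal gridlines are supplied as lists of pixel coordinates and that a single intensity threshold separates airway from tissue; the centerline/graph optimization and any contour smoothing are omitted.

```python
import numpy as np

def boundary_points(image, gridlines, threshold):
    """Along each tract-normal gridline (a sequence of (row, col) pixel
    coordinates running from the airway towards tissue), return the first
    pixel whose intensity reaches the threshold, taken as the estimated
    air-tissue boundary. Gridlines with no crossing yield None."""
    points = []
    for line in gridlines:
        found = None
        for (r, c) in line:
            if image[int(r), int(c)] >= threshold:
                found = (int(r), int(c))
                break
        points.append(found)
    return points
```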
Tongue ultrasound imaging is widely used for human speech production analysis and modeling. In this paper, we propose a novel method to automatically detect and track the tongue contour in ultrasound (US) videos. Our method is built on a variant of Active Appearance Modeling. It incorporates shape prior information and can estimate the entire tongue contour…
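Not the paper's fitting procedure, but a fragment illustrating how a PCA shape prior of the kind used in Active Appearance Models constrains a candidate contour: project onto the learned shape modes, clip the coefficients, and reconstruct. The mean shape, orthonormal basis, and per-mode limits are assumed to come from training contours.

```python
import numpy as np

def constrain_contour(contour, mean_shape, basis, max_dev):
    """Keep a candidate tongue contour within a learned shape prior.

    contour:    flattened (x, y) coordinates of the candidate contour.
    mean_shape: mean training contour, same layout as `contour`.
    basis:      matrix whose orthonormal columns are PCA shape modes.
    max_dev:    per-mode coefficient limits (e.g. 3 standard deviations).
    """
    coeffs = basis.T @ (contour - mean_shape)   # project onto shape modes
    coeffs = np.clip(coeffs, -max_dev, max_dev) # limit deviation from mean shape
    return mean_shape + basis @ coeffs          # reconstruct constrained contour
```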