This study examines relationships between external face movements, tongue movements, and speech acoustics for consonant-vowel (CV) syllables and sentences spoken by two male and two female talkers with different visual intelligibility ratings. The questions addressed are how relationships among measures vary by syllable, whether talkers who are more…
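Since this abstract centers on per-syllable correlations among face, tongue, and acoustic measures, a minimal sketch of that kind of analysis may help. Everything below is hypothetical: the measure names (lip opening, tongue-tip height, RMS energy), the data, and the frame count are illustrative stand-ins, not the study's recordings.

```python
# A minimal sketch of correlating frame-synchronized face, tongue, and
# acoustic trajectories for one syllable. All data are synthetic.
import numpy as np

def pearson_r(x, y):
    """Pearson correlation between two equal-length 1-D trajectories."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    return float(np.mean(x * y))

rng = np.random.default_rng(0)
T = 200                                                      # frames in one CV syllable
lip_opening = rng.standard_normal(T).cumsum()                # face measure (hypothetical)
tongue_tip_y = 0.6 * lip_opening + rng.standard_normal(T)    # tongue measure (hypothetical)
rms_energy = 0.4 * lip_opening + rng.standard_normal(T)      # acoustic measure (hypothetical)

print("face vs. tongue:    r =", round(pearson_r(lip_opening, tongue_tip_y), 3))
print("face vs. acoustics: r =", round(pearson_r(lip_opening, rms_energy), 3))
```

Repeating this per syllable and per talker would yield the kind of by-syllable, by-talker comparison the abstract describes.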
This study is a first step in a large-scale study that aims at quantifying the relationship between external facial movements, tongue movements, and the acoustics of speech sounds. The database analyzed consisted of 69 CV syllables spoken by two male and two female talkers; each utterance was repeated four times. A Qualisys (optical motion capture) system and an…
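Quantifying such a relationship is often summarized as how much variance in one channel a linear map from another channel accounts for. The sketch below illustrates that idea with ordinary least squares on synthetic data; the array shapes, channel counts, and the variance-accounted-for summary are assumptions, not this study's actual pipeline.

```python
# A hedged sketch: fit a linear map from face-marker trajectories to
# acoustic features and report variance accounted for. Synthetic data.
import numpy as np

rng = np.random.default_rng(1)
T, n_markers, n_acoustic = 400, 20, 8           # frames, face channels, acoustic channels
F = rng.standard_normal((T, n_markers))         # face-marker coordinates per frame
A = F @ rng.standard_normal((n_markers, n_acoustic)) \
    + 0.1 * rng.standard_normal((T, n_acoustic))

W, *_ = np.linalg.lstsq(F, A, rcond=None)       # least-squares linear map
A_hat = F @ W
vaf = 1.0 - np.var(A - A_hat) / np.var(A)       # variance accounted for
print(f"variance accounted for: {vaf:.3f}")
```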
The cortical processing of auditory-alone, visual-alone, and audiovisual speech information is temporally and spatially distributed, and functional magnetic resonance imaging (fMRI) cannot adequately resolve its temporal dynamics. In order to investigate a hypothesized spatiotemporal organization for audiovisual speech processing circuits, event-related…
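The methodological point here, that event-related potentials recover timing that fMRI cannot, rests on time-locked averaging across trials, which suppresses activity not phase-locked to the stimulus. A toy illustration of that single step follows; every value (sampling rate, component latency, noise level) is hypothetical.

```python
# Sketch of ERP extraction by time-locked averaging over synthetic epochs.
import numpy as np

rng = np.random.default_rng(2)
n_trials, n_samples = 100, 300                 # e.g., 300 samples at 1 kHz = 300 ms
t = np.arange(n_samples) / 1000.0
evoked = 2.0 * np.exp(-((t - 0.15) ** 2) / (2 * 0.02 ** 2))   # ~150 ms component
epochs = evoked + 5.0 * rng.standard_normal((n_trials, n_samples))  # noisy trials
erp = epochs.mean(axis=0)                      # averaging reveals the component
print("recovered peak latency (ms):", 1000 * t[np.argmax(erp)])
```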
The long-term goal of our work is to predict visual confusion matrices from physical measurements. In this paper, four talkers were recorded producing 69 American-English Consonant-Vowel syllables, with audio, video, and facial movements captured. During the recording, 20 markers were placed on the face, and an optical Qualisys system was used to track…
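One ingredient in predicting confusions from physical measurements is turning tracked marker trajectories into pairwise physical distances between consonants. The sketch below shows one plausible way to do this; the consonant set, data shapes, time normalization, and the frame-wise Euclidean distance are all assumptions.

```python
# Sketch: pairwise physical distances between consonants computed from
# hypothetical time-normalized marker trajectories.
import numpy as np

rng = np.random.default_rng(3)
consonants = ["b", "d", "g", "v"]              # illustrative subset
T, n_markers = 120, 20
# Hypothetical trajectories: (frame, marker, xyz) per consonant.
traj = {c: rng.standard_normal((T, n_markers, 3)) for c in consonants}

n = len(consonants)
phys_dist = np.zeros((n, n))
for i, ci in enumerate(consonants):
    for j, cj in enumerate(consonants):
        # Frame-wise Euclidean marker distance, averaged over the utterance.
        phys_dist[i, j] = np.linalg.norm(traj[ci] - traj[cj], axis=-1).mean()

print(np.round(phys_dist, 2))
```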
This paper investigates the relationship between visual confusion matrices and physical (facial) measures. The similarity structure in perceptual and physical measures for visual consonants was examined across four talkers spanning a wide range of rated visual intelligibility, who were recorded producing 69 Consonant-Vowel (CV) syllables. Audio,…
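Comparing similarity structure across perceptual and physical measures typically means converting a confusion matrix into perceptual dissimilarities and correlating those with physical distances. The sketch below illustrates that comparison; the toy matrices, the row-normalize-and-symmetrize step, and the upper-triangle correlation are assumptions rather than the paper's method.

```python
# Sketch: correlate perceptual dissimilarities (from a confusion matrix)
# with physical distances. All matrices are hypothetical.
import numpy as np

conf = np.array([[50, 30, 10, 10],             # toy visual confusion counts
                 [28, 52,  8, 12],
                 [12, 10, 60, 18],
                 [ 8, 14, 20, 58]], float)
p = conf / conf.sum(axis=1, keepdims=True)     # rows -> response proportions
dissim = 1.0 - 0.5 * (p + p.T)                 # symmetrized dissimilarity

phys = np.array([[0.0, 1.1, 2.0, 2.2],         # toy physical distances
                 [1.1, 0.0, 1.8, 2.1],
                 [2.0, 1.8, 0.0, 0.9],
                 [2.2, 2.1, 0.9, 0.0]])

iu = np.triu_indices(4, k=1)                   # unique off-diagonal pairs
r = np.corrcoef(dissim[iu], phys[iu])[0, 1]
print(f"perceptual-physical correlation: r = {r:.3f}")
```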
This paper introduces a new dynamical model of the relationship between face movements and speech acoustics. Based on the autocorrelation of the acoustics and of the face movements, a causal and a non-causal filter are proposed to approximate dynamical features in the speech signals. The database consisted of sentences recorded acoustically, and…
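The causal/non-causal contrast the abstract raises is easy to see on a feature trajectory: a causal filter can only use past samples and introduces lag, while a non-causal (zero-phase) filter uses future samples too. The sketch below contrasts the two on synthetic data; the Butterworth design, cutoff, and frame rate are assumptions, not the paper's filters.

```python
# Sketch: causal vs. non-causal smoothing of a feature trajectory.
import numpy as np
from scipy.signal import butter, lfilter, filtfilt

rng = np.random.default_rng(4)
fs = 120.0                                     # assumed motion-capture frame rate (Hz)
t = np.arange(0, 2.0, 1.0 / fs)
signal = np.sin(2 * np.pi * 3.0 * t) + 0.3 * rng.standard_normal(t.size)

b, a = butter(4, 8.0 / (fs / 2))               # 4th-order low-pass at 8 Hz
causal = lfilter(b, a, signal)                 # forward only: introduces phase lag
zero_phase = filtfilt(b, a, signal)            # forward-backward: no lag, non-causal

# Estimate the causal filter's delay via cross-correlation.
lag = int(np.argmax(np.correlate(causal, signal, mode="full"))) - (signal.size - 1)
print("estimated causal delay (frames):", lag)
```

A non-causal filter suits offline analysis of recorded sentences, whereas a causal one is what a real-time face-to-acoustics mapping would have to use.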