This study examines relationships between external face movements, tongue movements, and speech acoustics for consonant-vowel (CV) syllables and sentences spoken by two male and two female talkers with different visual intelligibility ratings. The questions addressed are how relationships among measures vary by syllable, whether talkers who are more…
A fundamental question about human perception is how the speech perceiving brain combines auditory and visual phonetic stimulus information. We assumed that perceivers learn the normal relationship between acoustic and optical signals. We hypothesized that when the normal relationship is perturbed by mismatching the acoustic and optical signals, cortical…
This study is a first step in a large-scale study that aims at quantifying the relationship between external facial movements, tongue movements, and the acoustics of speech sounds. The database analyzed consisted of 69 CV syllables spoken by two males and two females; each utterance was repeated four times. A Qualisys optical motion capture system and an…
Previous studies [Lisker, J. Acoust. Soc. Am. 57, 1547-1551 (1975); Summerfield and Haggard, J. Acoust. Soc. Am. 62, 435-448 (1977)] have shown that voice onset time (VOT) and the onset frequency of the first formant are important perceptual cues of voicing in syllable-initial plosives. Most prior work, however, has focused on speech perception in quiet…
The talking face affords multiple types of information. To isolate cortical sites responsible for integrating linguistically relevant visual speech cues, speech and nonspeech face gestures were presented in natural video and point-light displays during fMRI scanning at 3.0 T. Participants with normal hearing viewed the stimuli and also viewed…
This study is a first step in selecting an appropriate subword unit representation to synthesize highly intelligible 3D talking faces. Consonant confusions were obtained with optic features from a 320-sentence database, spoken by a male talker, using Gaussian mixture models and maximum a posteriori classification methods. The results were compared to…
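As a hedged sketch of the classification approach that abstract names (one Gaussian mixture model per consonant class, with decisions by maximum a posteriori scoring), the Python below fits per-class GMMs on optic feature vectors and assigns each sample to the class maximizing log-likelihood plus log-prior. The feature layout, mixture count, and diagonal covariances are illustrative assumptions, not the paper's settings.

```python
# Sketch: per-class GMMs with MAP classification over optic features.
# All model sizes here are illustrative assumptions, not the paper's values.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_class_gmms(features_by_class, n_components=4, seed=0):
    """Fit one GMM per consonant class on its optic feature vectors."""
    gmms, priors = {}, {}
    total = sum(len(x) for x in features_by_class.values())
    for label, x in features_by_class.items():
        gmms[label] = GaussianMixture(n_components=n_components,
                                      covariance_type="diag",
                                      random_state=seed).fit(x)
        priors[label] = len(x) / total  # empirical class prior
    return gmms, priors

def map_classify(x, gmms, priors):
    """Pick, per sample, the class maximizing log p(x|c) + log p(c)."""
    labels = list(gmms)
    scores = np.stack([gmms[c].score_samples(x) + np.log(priors[c])
                       for c in labels], axis=1)
    return [labels[i] for i in scores.argmax(axis=1)]
```

Tabulating the predicted versus true consonant labels from `map_classify` would yield the kind of confusion matrix the abstract describes.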
Group analysis of structure or function in cerebral cortex typically involves, as a first step, the alignment of cortices. A surface-based approach to this problem treats the cortex as a convoluted surface and coregisters across subjects so that cortical landmarks or features are aligned. This registration can be performed using curves representing sulcal…
Visual information in a speaker's face is known to improve the robustness of automatic speech recognizers. However, most studies in audiovisual ASR have focused on "visually clean" data to benefit ASR in noise. This paper is a follow-up to a previous study that investigated audiovisual ASR in visually challenging environments. It focuses on visual speech…
Dynamic Bayesian Networks (DBNs) have been widely studied in multi-modal speech recognition applications. Here, we introduce DBNs into an acoustically driven talking face synthesis system. Three DBN prototypes, namely independent, coupled, and product HMMs, were studied. Results showed that the DBN methods were more effective in this study than a…
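For one of the three DBN prototypes that abstract lists, the product HMM, a minimal sketch follows: assuming the audio and visual streams transition independently, the joint transition matrix over paired states is the Kronecker product of the two single-stream transition matrices. The chain sizes and probabilities are toy values, not the paper's models.

```python
# Sketch: joint transition matrix of a product HMM built from two
# single-stream HMMs. Toy parameters, not the paper's trained models.
import numpy as np

def product_hmm(A_audio, A_visual):
    """Product-HMM transitions over the joint (audio, visual) state space.
    If the streams transition independently, the joint probability
    factorizes, which is exactly the Kronecker product."""
    return np.kron(A_audio, A_visual)

# Toy 2-state audio and 3-state visual chains
A_a = np.array([[0.9, 0.1],
                [0.2, 0.8]])
A_v = np.array([[0.7, 0.2, 0.1],
                [0.1, 0.8, 0.1],
                [0.0, 0.3, 0.7]])
A_joint = product_hmm(A_a, A_v)            # shape (6, 6)
assert np.allclose(A_joint.sum(axis=1), 1)  # rows remain stochastic
```

The coupled variant would instead condition each stream's transition on the other stream's previous state, so its joint transition matrix does not factorize this way.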