Jintao Jiang

Group analysis of structure or function in cerebral cortex typically involves, as a first step, the alignment of cortices. A surface-based approach to this problem treats the cortex as a convoluted surface and coregisters across subjects so that cortical landmarks or features are aligned. This registration can be performed using curves representing sulcal …
This study examines relationships between external face movements, tongue movements, and speech acoustics for consonant-vowel (CV) syllables and sentences spoken by two male and two female talkers with different visual intelligibility ratings. The questions addressed are how relationships among measures vary by syllable, whether talkers who are more …
Previous studies [Lisker, J. Acoust. Soc. Am. 57, 1547-1551 (1975); Summerfield and Haggard, J. Acoust. Soc. Am. 62, 435-448 (1977)] have shown that voice onset time (VOT) and the onset frequency of the first formant are important perceptual cues of voicing in syllable-initial plosives. Most prior work, however, has focused on speech perception in quiet …
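As a concrete illustration of the VOT cue, the sketch below estimates it from a recorded syllable as the time from the burst (a broadband energy jump) to voicing onset (sustained low-frequency energy). The frame length, 500 Hz cutoff, and thresholds are illustrative assumptions, not the measurement procedure used in the cited studies.

import numpy as np
from scipy.signal import butter, filtfilt

def estimate_vot(x, sr, frame_ms=5.0):
    """Crude VOT estimate for a syllable-initial plosive: time from the burst
    (broadband energy jump) to voicing onset (sustained low-frequency energy).
    Assumes the recording starts with silence; thresholds are illustrative."""
    x = np.asarray(x, dtype=float)
    n = int(sr * frame_ms / 1000)
    # frame-wise RMS of the full-band signal (burst detector)
    rms = np.array([np.sqrt(np.mean(x[i:i + n] ** 2)) for i in range(0, len(x) - n, n)])
    # frame-wise RMS below ~500 Hz (voicing detector)
    b, a = butter(4, 500 / (sr / 2), btype="low")
    lo = filtfilt(b, a, x)
    lo_rms = np.array([np.sqrt(np.mean(lo[i:i + n] ** 2)) for i in range(0, len(lo) - n, n)])
    burst = int(np.argmax(rms > 5 * np.median(rms[:10])))                  # first broadband jump
    voiced = burst + int(np.argmax(lo_rms[burst:] > 0.5 * lo_rms.max()))   # low-frequency energy rises
    return (voiced - burst) * frame_ms                                     # VOT in milliseconds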
The talking face affords multiple types of information. To isolate cortical sites with responsibility for integrating linguistically relevant visual speech cues, speech and nonspeech face gestures were presented in natural video and point-light displays during fMRI scanning at 3.0T. Participants with normal hearing viewed the stimuli and also viewed …
A fundamental question about human perception is how the speech perceiving brain combines auditory and visual phonetic stimulus information. We assumed that perceivers learn the normal relationship between acoustic and optical signals. We hypothesized that when the normal relationship is perturbed by mismatching the acoustic and optical signals, cortical …
Much progress has been achieved during the past two decades in audio-visual automatic speech recognition (AVASR). However, challenges persist that hinder AVASR deployment in practical situations, most notably robust and fast extraction of visual speech features. We review our efforts in overcoming this problem, based on an appearance-based visual feature …
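For readers unfamiliar with appearance-based visual features, the sketch below shows one common form such a feature can take (not necessarily the specific front end reviewed here): the low-frequency block of the 2-D DCT of a grayscale mouth region of interest, taken frame by frame.

import numpy as np
from scipy.fftpack import dct

def mouth_dct_features(roi_gray, keep=6):
    """Per-frame appearance feature: low-frequency 2-D DCT coefficients of a
    grayscale mouth ROI (e.g. a 64x64 crop with values in 0..255)."""
    roi = np.asarray(roi_gray, dtype=float) / 255.0
    # separable 2-D DCT-II: transform rows, then columns
    coeffs = dct(dct(roi, type=2, norm="ortho", axis=0), type=2, norm="ortho", axis=1)
    # keep the top-left (low-frequency) block as the feature vector
    return coeffs[:keep, :keep].ravel()

In practice, per-frame vectors of this kind are usually augmented with temporal derivatives before being passed to the recognizer.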
This study is a first step in a large-scale study that aims at quantifying the relationship between external facial movements, tongue movements, and the acoustics of speech sounds. The database analyzed consisted of 69 CV syllables spoken by two males and two females; each utterance was repeated four times. A Qualisys optical motion capture system and an …
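One simple way to quantify such a relationship, assuming the motion-capture and acoustic feature streams have already been aligned frame by frame (an assumption of this sketch, not a description of the study's own analysis), is a least-squares linear map from marker coordinates to acoustic features, scored by the correlation between predicted and measured values.

import numpy as np

def fit_linear_map(face, acoustic):
    """face: (T, F) marker coordinates per frame; acoustic: (T, A) acoustic features.
    Fits acoustic ~= [face, 1] @ W and reports per-dimension prediction correlations."""
    X = np.hstack([face, np.ones((face.shape[0], 1))])   # add an intercept column
    W, *_ = np.linalg.lstsq(X, acoustic, rcond=None)
    pred = X @ W
    r = np.array([np.corrcoef(pred[:, j], acoustic[:, j])[0, 1]
                  for j in range(acoustic.shape[1])])
    return W, r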
Dynamic Bayesian Networks (DBNs) have been widely studied in multi-modal speech recognition applications. Here, we introduce DBNs into an acoustically-driven talking face synthesis system. Three prototypes of DBNs, namely independent, coupled, and product HMMs, were studied. Results showed that the DBN methods were more effective in this study than a …
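The widely used hmmlearn package does not implement coupled or product HMMs, so the sketch below only illustrates the simpler end of that comparison: one Gaussian HMM per modality (the independent case) plus a single HMM over concatenated audio-visual features as a crude fused baseline. The state count and training settings are placeholders.

import numpy as np
from hmmlearn import hmm

def train_stream_models(audio_feats, visual_feats, lengths, n_states=5):
    """audio_feats, visual_feats: (T, D) frame-synchronous feature matrices;
    lengths: per-utterance frame counts summing to T."""
    def make():
        return hmm.GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=50)
    audio_hmm = make().fit(audio_feats, lengths)        # independent audio stream
    visual_hmm = make().fit(visual_feats, lengths)      # independent visual stream
    fused_hmm = make().fit(np.hstack([audio_feats, visual_feats]), lengths)  # concatenated features
    return audio_hmm, visual_hmm, fused_hmm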
This study was undertaken to examine relationships between the similarity structures of optical phonetic measures and visual phonetic perception. For this study, four talkers who varied in visual intelligibility were recorded simultaneously with a 3-dimensional optical recording system and a video camera. Subjects perceptually identified the talkers’ …
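A minimal way to relate two such similarity structures, assuming both are available as symmetric dissimilarity matrices over the same set of phonetic items (the matrices here are hypothetical placeholders), is to correlate their off-diagonal entries:

import numpy as np
from scipy.stats import pearsonr

def similarity_structure_correlation(optical_dist, perceptual_dist):
    """optical_dist, perceptual_dist: (K, K) symmetric dissimilarity matrices over
    the same K phonetic items. Returns the Pearson r (and p-value) between their
    upper-triangular entries as a simple index of agreement."""
    iu = np.triu_indices_from(optical_dist, k=1)
    return pearsonr(optical_dist[iu], perceptual_dist[iu])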
Visual information in a speaker's face is known to improve the robustness of automatic speech recognition (ASR). However, most studies in audio-visual ASR have focused on "visually clean" data to benefit ASR in noise. This paper is a follow-up to a previous study that investigated audio-visual ASR in visually challenging environments. It focuses on visual …