Carol Y. Espy-Wilson

We present a method to boost the performance of probabilistic generative models that work with i-vector representations. The proposed approach deals with the non-Gaussian behavior of i-vectors by performing a simple length normalization. This non-linear transformation allows the use of probabilistic models with Gaussian assumptions that yield equivalent…
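The abstract names only a "simple length normalization." The following is a minimal sketch that assumes a common recipe: center and whiten the i-vectors using statistics from a training set, then scale each vector to unit Euclidean length. The whitening step, the function names `fit_whitener` and `length_normalize`, and the random stand-in data are all assumptions for illustration. They are not taken from the paper.

```python
import numpy as np


def fit_whitener(train: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Estimate the mean and a whitening matrix W = Lambda^{-1/2} U^T from training i-vectors.

    Assumes the sample covariance is full rank (more training vectors than dimensions).
    """
    mean = train.mean(axis=0)
    cov = np.cov(train - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # cov = U diag(eigvals) U^T
    whitener = (eigvecs / np.sqrt(eigvals)).T  # rows scaled by 1/sqrt(eigenvalue)
    return mean, whitener


def length_normalize(ivectors: np.ndarray, mean: np.ndarray, whitener: np.ndarray) -> np.ndarray:
    """Center and whiten each row (one i-vector), then scale it to unit Euclidean length."""
    whitened = (ivectors - mean) @ whitener.T
    return whitened / np.linalg.norm(whitened, axis=1, keepdims=True)


# Illustrative data only: random vectors standing in for 10-dimensional i-vectors.
rng = np.random.default_rng(0)
train = 3.0 * rng.standard_normal((500, 10)) + 1.0
mean, W = fit_whitener(train)
normed = length_normalize(train[:5], mean, W)
print(np.linalg.norm(normed, axis=1))  # each entry is 1 up to floating-point error
```

After this step every i-vector lies on the unit sphere. The abstract frames this non-linear transformation as what makes models with Gaussian assumptions applicable.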
Many different studies have claimed that articulatory information can be used to improve the performance of automatic speech recognition systems. Unfortunately, such articulatory information is not readily available in typical speaker-listener situations. Consequently, such information has to be estimated from the acoustic signal in a process which is…
In this paper we present a study on the automatic identification of acquisition devices when only access to the output speech recordings is possible. A statistical characterization of the frequency response of the device contextualized by the speech content is proposed. In particular, the intrinsic characteristics of the device are captured by a template, …
Recent advances in physiological data collection methods have made it possible to test the accuracy of predictions against speaker-specific vocal tracts and acoustic patterns. Vocal tract dimensions for /r/ derived via magnetic-resonance imaging (MRI) for two speakers of American English [Alwan, Narayanan, and Haker, J. Acoust. Soc. Am. 101, 1078-1089…
We propose a method that combines acoustic-phonetic knowledge with support vector machines for segmentation of continuous speech into five classes: vowel, sonorant consonant, fricative, stop, and silence. We show that by using a probabilistic phonetic feature hierarchy, only four classifiers are required to recognize the five classes. Due to the probabilistic…
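The abstract does not spell out the hierarchy. The minimal sketch below assumes one plausible binary tree:

- speech vs. silence
- sonorant vs. obstruent
- syllabic vs. non-syllabic among sonorants
- continuant vs. non-continuant among obstruents

It also assumes that each of the four classifiers already outputs a calibrated conditional probability. The SVMs themselves are omitted, and `FeatureProbs` and `class_posteriors` are hypothetical names. The sketch shows why four binary decisions can cover five leaf classes: each class posterior is the product of the probabilities along its path through the tree.

```python
from dataclasses import dataclass


@dataclass
class FeatureProbs:
    """Assumed outputs of four binary classifiers for one frame (conditional probabilities)."""
    speech: float      # P(speech | x), as opposed to silence
    sonorant: float    # P(sonorant | speech, x), as opposed to obstruent
    syllabic: float    # P(syllabic | sonorant, x): vowel vs. sonorant consonant
    continuant: float  # P(continuant | obstruent, x): fricative vs. stop


def class_posteriors(p: FeatureProbs) -> dict[str, float]:
    """Multiply conditional probabilities along each path of the hierarchy."""
    sonorant = p.speech * p.sonorant
    obstruent = p.speech * (1.0 - p.sonorant)
    return {
        "silence": 1.0 - p.speech,
        "vowel": sonorant * p.syllabic,
        "sonorant consonant": sonorant * (1.0 - p.syllabic),
        "fricative": obstruent * p.continuant,
        "stop": obstruent * (1.0 - p.continuant),
    }


posteriors = class_posteriors(FeatureProbs(speech=0.9, sonorant=0.8, syllabic=0.7, continuant=0.6))
best = max(posteriors, key=posteriors.get)
print(best, round(posteriors[best], 3))  # vowel 0.504
```

Because every internal node splits its probability mass between two children, the five posteriors sum to one by construction.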
The goal of our research is to develop a gesture- and landmark-based speech recognition system. This work presents an initial step toward such a system, in which the mapping between the speech signal and the vocal tract time functions (VTTFs) is considered. VTTFs are time-varying physical realizations of articulatory gestures at distinct…
The American English phoneme /r/ has long been associated with large amounts of articulatory variability during production. This paper investigates the hypothesis that the articulatory variations used by a speaker to produce /r/ in different contexts exhibit systematic tradeoffs, or articulatory trading relations, that act to maintain a relatively stable…
We present a multicondition training strategy for Gaussian Probabilistic Linear Discriminant Analysis (PLDA) modeling of i-vector representations of speech utterances. The proposed approach uses a multicondition set to train a collection of individual subsystems that are tuned to specific conditions. A final verification score is obtained by combining the…
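The abstract is cut off before it says how the subsystem scores are combined. As an assumption only: a common choice in speaker verification is linear fusion, meaning a weighted sum of subsystem scores plus an offset, with the weights typically trained on held-out trials (for example, by logistic regression). The sketch below uses fixed, made-up weights. The function `fuse_scores` and the three subsystems are hypothetical and do not come from the paper.

```python
import numpy as np


def fuse_scores(subsystem_scores: np.ndarray, weights: np.ndarray, offset: float = 0.0) -> np.ndarray:
    """Linear score fusion (an assumed combination rule, not the paper's).

    subsystem_scores: shape (n_trials, n_subsystems), one PLDA score per condition-tuned subsystem.
    weights: shape (n_subsystems,), one weight per subsystem (assumed already trained).
    Returns one fused verification score per trial.
    """
    if subsystem_scores.shape[1] != weights.shape[0]:
        raise ValueError("need one weight per subsystem")
    return subsystem_scores @ weights + offset


# Hypothetical scores from three condition-specific subsystems for two trials.
scores = np.array([[2.0, 1.0, -0.5],
                   [-1.0, -2.0, 0.5]])
weights = np.array([0.5, 0.3, 0.2])
fused = fuse_scores(scores, weights)
print(fused)  # approximately 1.2 and -1.0
```

A linear rule keeps the fused score on a scale that a single threshold can act on. Other rules, such as weighting by an estimated condition posterior, would slot into the same interface.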
Of all the sounds in any language, nasals are the only class of sounds with dominant speech output from the nasal cavity as opposed to the oral cavity. This gives nasals some special properties including presence of zeros in the spectrum, concentration of energy at lower frequencies, higher formant density, higher losses, and stability. In this paper we…
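The abstract lists spectral zeros among the special properties of nasals. One textbook way to see where they come from is offered here as an illustration, not as the paper's model. During a nasal such as /m/, the closed oral cavity acts as a side branch off the pharyngeal-nasal tract. An idealized lossless uniform side branch of length l, closed at its far end, produces zeros (antiresonances) at f_k = (2k - 1)c / (4l). The cavity lengths and the speed of sound below are illustrative assumptions, and `side_branch_zeros` is a hypothetical helper.

```python
SPEED_OF_SOUND_CM_PER_S = 35_000.0  # approximate value for warm, moist air (assumed)


def side_branch_zeros(length_cm: float, n_zeros: int = 3,
                      c: float = SPEED_OF_SOUND_CM_PER_S) -> list[float]:
    """Zero frequencies (Hz) of an idealized lossless uniform side branch closed at its far end.

    The branch's input impedance vanishes when its length is an odd multiple of a quarter wavelength.
    """
    return [(2 * k - 1) * c / (4.0 * length_cm) for k in range(1, n_zeros + 1)]


# Illustrative oral-cavity lengths (assumed): longer for a labial closure than for an alveolar one.
for label, length in [("/m/ (lips closed)", 8.0), ("/n/ (alveolar closure)", 5.0)]:
    print(label, [round(f) for f in side_branch_zeros(length, n_zeros=2)])
```

Under these assumptions, a shorter closed cavity moves the zeros up in frequency.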
A probabilistic framework for a landmark-based approach to speech recognition is presented for obtaining multiple landmark sequences in continuous speech. The landmark detection module takes as input acoustic parameters (APs) that capture the acoustic correlates of some of the manner-based phonetic features. The landmarks include stop bursts, vowel onsets,…