Vikram Ramanarayanan

Learn More
USC-TIMIT is an extensive database of multimodal speech production data, developed to complement existing resources available to the speech research community and with the intention of being continuously refined and augmented. The database currently includes real-time magnetic resonance imaging data from five male and five female speakers of American(More)
We present MRI-TIMIT: a large-scale database of synchronized audio and real-time magnetic resonance imaging (rtMRI) data for speech research. The database currently consists of speech data acquired from two male and two female speakers of American English. Subjects’ upper airways were imaged in the midsagittal plane while reading the same 460 sentence(More)
We present a novel automatic procedure to analyze “articulatory setting (AS)” or “basis of articulation” using realtime magnetic resonance images (rt-MRI) of the human vocal tract recorded for read and spontaneously spoken speech. We extract relevant frames of inter-speech pauses (ISPs) and rest positions from MRI sequences of read and spontaneous speech(More)
This paper presents a computational approach to derive interpretable movement primitives from speech articulation data. It puts forth a convolutive Nonnegative Matrix Factorization algorithm with sparseness constraints (cNMFsc) to decompose a given data matrix into a set of spatiotemporal basis sequences and an activation matrix. The algorithm optimizes a(More)
We propose a practical, feature-level fusion approach for speaker verification using information from both acoustic and articulatory signals. We find that concatenating articulation features obtained from actual speech production data with conventional Mel-frequency cepstral coefficients (MFCCs) improves the overall speaker verification performance.(More)
We address the hypothesis that postures adopted during grammatical pauses in speech production are more "mechanically advantageous" than absolute rest positions for facilitating efficient postural motor control of vocal tract articulators. We quantify vocal tract posture corresponding to inter-speech pauses, absolute rest intervals as well as vowel and(More)
The large corpus of real time magnetic resonance image sequences of the vocal tract during speech production that was recently acquired and can be referred to as MRI-TIMIT, provides us with a unique platform for systematically studying articulatory dynamics. Compared to previously collected articulatory datasets, e.g., using articulography or X-rays,(More)
This paper presents an automatic procedure to analyze articulatory setting in speech production using real-time magnetic resonance imaging of the moving human vocal tract. The procedure extracts frames corresponding to inter-speech pauses, speech-ready intervals and absolute rest intervals from magnetic resonance imaging sequences of read and spontaneous(More)
It is hypothesized that pauses at major syntactic boundaries (i.e., grammatical pauses), but not ungrammatical (e.g., word search) pauses, are planned by a high-level cognitive mechanism that also controls the rate of articulation around these junctures. Real-time magnetic resonance imaging is used to analyze articulation at and around grammatical and(More)
We present a method for speech enhancement of data collected in extremely noisy environments, such as those found during magnetic resonance imaging (MRI) scans. We propose a twostep algorithm to perform this noise suppression. First, we use probabilistic latent component analysis to learn dictionaries of the noise and speech+noise portions of the data and(More)