USC-TIMIT is an extensive database of multimodal speech production data, developed to complement existing resources available to the speech research community and with the intention of being continuously refined and augmented. The database currently includes real-time magnetic resonance imaging data from five male and five female speakers of American…
We present MRI-TIMIT: a large-scale database of synchronized audio and real-time magnetic resonance imaging (rtMRI) data for speech research. The database currently consists of speech data acquired from two male and two female speakers of American English. Subjects' upper airways were imaged in the mid-sagittal plane while reading the same 460 sentence…
We present a novel automatic procedure to analyze "articulatory setting (AS)" or "basis of articulation" using real-time magnetic resonance images (rt-MRI) of the human vocal tract recorded for read and spontaneously spoken speech. We extract relevant frames of inter-speech pauses (ISPs) and rest positions from MRI sequences of read and spontaneous…
This paper presents a computational approach to derive interpretable movement primitives from speech articulation data. It puts forth a convolutive Nonnegative Matrix Factorization algorithm with sparseness constraints (cNMFsc) to decompose a given data matrix into a set of spatiotemporal basis sequences and an activation matrix. The algorithm optimizes a…
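The core decomposition idea can be illustrated with a simplified, non-convolutive NMF sketch: a nonnegative data matrix V is factored into basis columns W and sparse activations H via multiplicative updates with an L1 penalty on H. This is only an analogue of the cNMFsc algorithm described above, not the authors' implementation; the matrix sizes and sparsity weight are illustrative assumptions.

```python
import numpy as np

def nmf_sparse(V, rank, n_iter=200, sparsity=0.1, seed=0):
    """Plain NMF with an L1 penalty on the activations H:
    V (nonnegative, features x time) ~ W @ H.
    A simplified, non-convolutive analogue of cNMFsc."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + 1e-6
    H = rng.random((rank, m)) + 1e-6
    for _ in range(n_iter):
        # Lee-Seung multiplicative updates for the Euclidean cost;
        # the L1 sparsity weight enters the H denominator
        H *= (W.T @ V) / (W.T @ W @ H + sparsity + 1e-12)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-12)
    return W, H

# toy usage: factor a random nonnegative matrix into 4 primitives
V = np.random.default_rng(1).random((20, 50))
W, H = nmf_sparse(V, rank=4)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

The multiplicative form keeps W and H nonnegative throughout, which is what makes the learned basis sequences directly interpretable as additive movement components.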
This paper presents an automatic procedure to analyze articulatory setting in speech production using real-time magnetic resonance imaging of the moving human vocal tract. The procedure extracts frames corresponding to inter-speech pauses, speech-ready intervals and absolute rest intervals from magnetic resonance imaging sequences of read and spontaneous…
It is hypothesized that pauses at major syntactic boundaries (i.e., grammatical pauses), but not ungrammatical (e.g., word search) pauses, are planned by a high-level cognitive mechanism that also controls the rate of articulation around these junctures. Real-time magnetic resonance imaging is used to analyze articulation at and around grammatical and…
We present a method for speech enhancement of data collected in extremely noisy environments, such as those found during magnetic resonance imaging (MRI) scans. We propose a two-step algorithm to perform this noise suppression. First, we use probabilistic latent component analysis to learn dictionaries of the noise and speech+noise portions of the data and…
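The two-step dictionary scheme can be sketched with KL-divergence NMF standing in for PLCA (the two are equivalent up to normalization): first learn noise atoms from noise-only spectrogram frames, then fit the noisy mixture with those atoms held fixed plus free "speech" atoms, keeping only the speech part. All sizes and ranks below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def kl_nmf(V, rank, n_iter=150, seed=0):
    """KL-divergence NMF (stand-in for PLCA): V ~ W @ H."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + 1e-6
    H = rng.random((rank, V.shape[1])) + 1e-6
    for _ in range(n_iter):
        WH = W @ H + 1e-12
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + 1e-12)
        WH = W @ H + 1e-12
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + 1e-12)
    return W, H

def denoise(V_mix, V_noise, r_noise=4, r_speech=8, n_iter=150, seed=0):
    """Step 1: learn noise atoms from noise-only frames.
    Step 2: explain the mixture with fixed noise atoms plus free
    speech atoms; return the speech-only reconstruction."""
    rng = np.random.default_rng(seed)
    Wn, _ = kl_nmf(V_noise, r_noise, n_iter)
    W = np.hstack([Wn, rng.random((V_mix.shape[0], r_speech)) + 1e-6])
    H = rng.random((r_noise + r_speech, V_mix.shape[1])) + 1e-6
    for _ in range(n_iter):
        WH = W @ H + 1e-12
        H *= (W.T @ (V_mix / WH)) / (W.sum(axis=0)[:, None] + 1e-12)
        WH = W @ H + 1e-12
        upd = ((V_mix / WH) @ H.T) / (H.sum(axis=1)[None, :] + 1e-12)
        W[:, r_noise:] *= upd[:, r_noise:]  # noise atoms stay fixed
    return W[:, r_noise:] @ H[r_noise:, :]  # speech-only estimate

# toy usage on synthetic magnitude spectrograms (32 bins)
rng = np.random.default_rng(2)
noise_spec = rng.random((32, 40))
mix_spec = rng.random((32, 60))
speech_est = denoise(mix_spec, noise_spec)
```

Holding the noise atoms fixed in step 2 is what lets the free atoms absorb the speech-like residual, which is the essence of this semi-supervised separation style.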
We present a procedure to automatically derive interpretable dynamic articulatory primitives in a data-driven manner from image sequences acquired through real-time magnetic resonance imaging (rt-MRI). More specifically, we propose a convolutive Nonnegative Matrix Factorization algorithm with sparseness constraints (cNMFsc) to decompose a given set of…
We propose a practical, feature-level fusion approach for speaker verification using information from both acoustic and articulatory signals. We find that concatenating articulation features obtained from actual speech production data with conventional Mel-frequency cepstral coefficients (MFCCs) improves the overall speaker verification performance. However…
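Feature-level fusion of this kind amounts to frame-synchronous concatenation of the two feature streams. A minimal sketch, assuming both streams are already time-aligned arrays of shape (frames, dims); the dimensionalities are illustrative, not the paper's:

```python
import numpy as np

def fuse_features(mfcc, artic):
    """Feature-level fusion: concatenate acoustic MFCC frames with
    articulatory feature frames along the feature axis."""
    assert mfcc.shape[0] == artic.shape[0], "frame counts must match"
    return np.hstack([mfcc, artic])

# toy usage: 100 frames of 13 MFCCs + 6 articulatory dimensions
mfcc = np.random.default_rng(0).standard_normal((100, 13))
artic = np.random.default_rng(1).standard_normal((100, 6))
fused = fuse_features(mfcc, artic)
```

The fused vectors then feed the verification back end in place of MFCCs alone; the gain reported above comes from the articulatory dimensions carrying speaker-specific information complementary to the acoustics.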
Real-Time Magnetic Resonance Imaging affords speech articulation data with good spatial and temporal resolution and complete midsagittal views of the moving vocal tract, but also brings many challenges in the domain of image processing and analysis. Region-of-interest analysis has previously been proposed for simple, efficient and robust extraction of…