USC-TIMIT is an extensive database of multimodal speech production data, developed to complement existing resources available to the speech research community and with the intention of being continuously refined and augmented. The database currently includes real-time magnetic resonance imaging data from five male and five female speakers of American …
Pathological speech usually refers to the condition of speech distortion resulting from atypicalities in voice and/or in the articulatory mechanisms owing to disease, illness or other physical or biological insult to the production system. While automatic evaluation of speech intelligibility and quality could come in handy in these scenarios to assist in …
Thermal therapy is one of the most popular physiotherapies, and it is particularly useful for treating joint injuries. Conventional devices used for thermal therapy, including heat packs and wraps, often cause discomfort to their wearers because of their rigidity and heavy weight. In our study, we developed a soft, thin, and stretchable heater by …
Pathological speech usually refers to the condition of speech distortion resulting from atypicalities in voice and/or in the articulatory mechanisms owing to disease, illness or other physical or biological insult to the production system. Although automatic evaluation of speech intelligibility and quality could come in handy in these scenarios to assist …
We propose a practical, feature-level fusion approach for speaker verification using information from both acoustic and articulatory signals. We find that concatenating articulation features obtained from actual speech production data with conventional Mel-frequency cepstral coefficients (MFCCs) improves the overall speaker verification performance. …
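The feature-level fusion described above can be sketched as a simple frame-wise concatenation. The feature dimensions and array names below are illustrative assumptions, not values from the paper; the only point shown is the fusion step itself, which assumes the two streams are time-aligned per frame.

```python
import numpy as np

# Hypothetical frame-level features (dimensions are assumptions):
# 13-dim MFCCs and 6-dim articulatory features, one row per frame.
n_frames = 100
mfcc = np.random.randn(n_frames, 13)
artic = np.random.randn(n_frames, 6)

# Feature-level fusion: concatenate the two streams along the feature axis,
# giving one 19-dim vector per frame for the downstream verifier.
fused = np.concatenate([mfcc, artic], axis=1)
```

Any frame-level classifier or i-vector/GMM back-end can then consume `fused` in place of the MFCC stream alone.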
Vocal tract area function estimation from three-dimensional (3D) volumetric datasets often involves complex, manual procedures such as oblique slice cutting and image segmentation. We introduce a semi-automatic method for estimating the vocal tract area function from 3D Magnetic Resonance Imaging (MRI) datasets. The method was implemented on a custom MATLAB …
This paper introduces an algorithm for robust segmentation of airway-tissue boundaries in upper airway images recorded by real-time magnetic resonance imaging. Compared to the previous method by Proctor et al. [1], the present algorithm performs image quality enhancement, including pixel sensitivity correction and grainy noise reduction, followed by …
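The two enhancement steps named above can be sketched as follows. This is a minimal illustration, not the paper's algorithm: it assumes sensitivity correction is done by dividing out a per-pixel gain map, and grainy-noise reduction by a small median filter, both common choices for this kind of preprocessing.

```python
import numpy as np

def enhance(frame, sensitivity_map, k=3):
    """Illustrative two-step enhancement (assumed, not from the paper):
    (1) pixel sensitivity correction: divide out a per-pixel gain map;
    (2) grainy-noise reduction: k x k median filter."""
    # Step 1: normalize each pixel by its (assumed known) sensitivity.
    corrected = frame / np.maximum(sensitivity_map, 1e-8)

    # Step 2: median filter with edge padding to suppress speckle noise.
    pad = k // 2
    padded = np.pad(corrected, pad, mode="edge")
    out = np.empty_like(corrected)
    for i in range(corrected.shape[0]):
        for j in range(corrected.shape[1]):
            out[i, j] = np.median(padded[i:i + k, j:j + k])
    return out
```

Boundary segmentation would then operate on the enhanced frames rather than the raw MR images.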
This study explores manifold representations of emotionally modulated speech. The manifolds are derived in the articulatory space and two acoustic spaces (MFB and MFCC) using isometric feature mapping (Isomap) with data from an emotional speech corpus. Their effectiveness in representing emotional speech is tested based on the emotion classification …
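Deriving such a manifold with Isomap can be sketched with scikit-learn. The feature matrix below is a random stand-in for frame-level acoustic or articulatory vectors; its dimensions and the neighborhood size are assumptions, not values from the study.

```python
import numpy as np
from sklearn.manifold import Isomap

# Stand-in for frame-level features (e.g. MFCC-like vectors);
# 200 frames x 13 dims is an illustrative choice only.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 13))

# Isomap: build a k-nearest-neighbor graph, approximate geodesic
# distances along it, then embed with classical MDS.
iso = Isomap(n_neighbors=10, n_components=2)
embedding = iso.fit_transform(X)  # one 2-D manifold coordinate per frame
```

The low-dimensional coordinates in `embedding` would then feed an emotion classifier, which is how the representations' effectiveness is compared across spaces.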
Acoustic and articulatory behaviors underlying emotion strength perception are studied by analyzing acted emotional speech. Listeners evaluated emotion identity, strength, and confidence. Parameters related to pitch, loudness, and articulatory kinematics are associated with a 2-level (strong/weak) representation of the emotion strength. Two-class discriminant …
The goal in this work is to automatically classify speakers’ level of cognitive load (low, medium, high) from a standard battery of reading tasks requiring varying levels of working memory. This is a challenging machine learning problem because of the inherent difficulty in defining/measuring cognitive load and due to intra-/inter-speaker differences in how …