USC-TIMIT is an extensive database of multimodal speech production data, developed to complement existing resources available to the speech research community and with the intention of being continuously refined and augmented. The database currently includes real-time magnetic resonance imaging data from five male and five female speakers of American…
Vocal tract area function estimation from three-dimensional (3D) volumetric datasets often involves complex and manual procedures such as oblique slice cutting and image segmentation. We introduce a semi-automatic method for estimating the vocal tract area function from 3D Magnetic Resonance Imaging (MRI) datasets. The method was implemented on a custom MATLAB…
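To make the area-function idea concrete, here is a minimal Python sketch, a stand-in for the paper's custom MATLAB tool, whose actual code is not shown here. It assumes the binary airway segmentation has already been resampled so that slices lie perpendicular to the vocal tract midline; real data requires the oblique reslicing along a curved centerline that this sketch omits.

```python
import numpy as np

def area_function(airway_mask: np.ndarray, voxel_mm: tuple) -> np.ndarray:
    """Estimate a vocal tract area function from a binary airway segmentation.

    Simplifying assumption: axis 0 of the volume runs along the vocal tract
    midline, so each slice is roughly perpendicular to the airway.
    """
    dz, dy, dx = voxel_mm
    slice_voxel_area_mm2 = dy * dx                 # in-plane area of one voxel
    voxels_per_slice = airway_mask.reshape(airway_mask.shape[0], -1).sum(axis=1)
    return voxels_per_slice * slice_voxel_area_mm2  # area (mm^2) per slice

# Toy usage: a 40-slice volume containing a synthetic, widening "airway"
mask = np.zeros((40, 64, 64), dtype=bool)
for z in range(40):
    r = 3 + z // 8
    mask[z, 32 - r:32 + r, 32 - r:32 + r] = True
print(area_function(mask, voxel_mm=(2.0, 1.0, 1.0))[:5])
```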
Acoustic and articulatory behaviors underlying emotion strength perception are studied by analyzing acted emotional speech. Listeners evaluated emotion identity, strength and confidence. Parameters related to pitch, loudness and articulatory kinematics are associated with a 2-level (strong/weak) representation of emotion strength. Two-class discriminant…
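As an illustration of the two-class setup described above, the following sketch trains a linear discriminant on synthetic stand-ins for the pitch, loudness, and kinematic parameters. The feature names, distributions, and the choice of scikit-learn's LinearDiscriminantAnalysis are assumptions for illustration, not the paper's exact configuration.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Hypothetical per-utterance features standing in for the abstract's parameters:
# [mean f0, f0 range, loudness, articulator speed] -- names are illustrative only.
n = 200
X_weak = rng.normal(loc=[180, 40, 60, 1.0], scale=[20, 10, 5, 0.2], size=(n, 4))
X_strong = rng.normal(loc=[210, 70, 68, 1.5], scale=[25, 15, 6, 0.3], size=(n, 4))
X = np.vstack([X_weak, X_strong])
y = np.array([0] * n + [1] * n)   # 0 = weak, 1 = strong emotion strength

clf = LinearDiscriminantAnalysis()
scores = cross_val_score(clf, X, y, cv=5)
print(f"strong/weak discrimination accuracy: {scores.mean():.2f}")
```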
This paper investigates the interplay between articulatory movement and voice source activity as a function of emotion in speech production. Our hypothesis is that humans use modulation strategies in which articulatory movements and prosodic modulations are weighted differently across emotions. This hypothesis was examined by joint…
Pathological speech usually refers to the condition of speech distortion resulting from atypicalities in voice and/or in the articulatory mechanisms owing to disease, illness or other physical or biological insult to the production system. Although automatic evaluation of speech intelligibility and quality could come in handy in these scenarios to assist…
Automatic evaluation of the pronunciation quality of non-native speech has seen tremendous success in both research and commercial settings, with applications in L2 learning. In this paper, submitted for the INTERSPEECH 2015 Degree of Nativeness Sub-Challenge, the problem is posed under a challenging cross-corpus setting using speech data drawn from multiple…
This study explores manifold representations of emotionally modulated speech. The manifolds are derived in the articulatory space and two acoustic spaces (MFB and MFCC) using isometric feature mapping (Isomap) with data from an emotional speech corpus. Their effectiveness in representing emotional speech is tested based on the emotion classification…
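A minimal sketch of the Isomap step follows, assuming scikit-learn's Isomap as the implementation and random vectors in place of real MFCC frames; the neighborhood size and target dimensionality are illustrative, not the study's settings.

```python
import numpy as np
from sklearn.manifold import Isomap

rng = np.random.default_rng(0)

# Stand-in for frame-level MFCC features from an emotional speech corpus
# (the study also derives manifolds from MFB and articulatory features).
mfcc_frames = rng.normal(size=(500, 13))

# Nonlinear dimensionality reduction onto a low-dimensional manifold.
embedding = Isomap(n_neighbors=10, n_components=3).fit_transform(mfcc_frames)
print(embedding.shape)  # (500, 3): one manifold coordinate triple per frame
```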
We propose a practical, feature-level fusion approach for speaker verification using information from both acoustic and articulatory signals. We find that concatenating articulation features obtained from actual speech production data with conventional Mel-frequency cepstral coefficients (MFCCs) improves the overall speaker verification performance. However…
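The feature-level fusion itself reduces to frame-wise concatenation once the two streams are time-aligned. The sketch below assumes alignment has already been done and uses random arrays in place of real MFCC and articulatory features; the dimensions are illustrative.

```python
import numpy as np

def fuse_features(mfcc: np.ndarray, artic: np.ndarray) -> np.ndarray:
    """Feature-level fusion by frame-wise concatenation.

    Assumes both streams share a frame rate (e.g. articulatory trajectories
    resampled to the MFCC frame shift); in practice the time alignment is
    the nontrivial part, not the concatenation.
    """
    assert mfcc.shape[0] == artic.shape[0], "streams must share a frame axis"
    return np.hstack([mfcc, artic])

# Toy usage: 300 frames of 13-dim MFCCs + 12-dim articulator positions
mfcc = np.random.randn(300, 13)
artic = np.random.randn(300, 12)
fused = fuse_features(mfcc, artic)   # (300, 25) features fed to the verifier
print(fused.shape)
```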