Gopal Ananthakrishnan

Subtle temporal and spectral differences between categorical realizations of para-linguistic phenomena (e.g., affective vocal expressions) are hard to capture and describe. In this paper we present a signal representation based on Time Varying Constant-Q Cepstral Coefficients (TVCQCC) derived for this purpose. A method which utilizes the special properties …
This paper studies the hypothesis that the acoustic-to-articulatory mapping is statistically non-unique. The distributions of the acoustic and articulatory spaces are obtained by fitting a Gaussian Mixture Model to the data. The kurtosis is used to measure the non-Gaussianity of the distributions and the Bhattacharyya distance is used to find the …
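The two statistics named in this abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names are hypothetical, the closed form assumes each distribution is a single full-covariance Gaussian (for example, one mixture component), and the kurtosis is the simple moment-based excess kurtosis, which is zero for a Gaussian.

```python
import numpy as np


def bhattacharyya_gaussians(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance between two multivariate Gaussians.

    Assumption: both inputs are single Gaussians with full covariance
    matrices; this is the standard closed form, not the paper's code.
    """
    mu1 = np.asarray(mu1, dtype=float)
    mu2 = np.asarray(mu2, dtype=float)
    cov1 = np.asarray(cov1, dtype=float)
    cov2 = np.asarray(cov2, dtype=float)

    cov = 0.5 * (cov1 + cov2)  # average covariance
    diff = mu1 - mu2

    # Mean term: (1/8) * diff^T cov^{-1} diff
    term_mean = 0.125 * diff @ np.linalg.solve(cov, diff)

    # Covariance term: (1/2) * ln(det(cov) / sqrt(det(cov1) * det(cov2)))
    _, logdet = np.linalg.slogdet(cov)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    term_cov = 0.5 * (logdet - 0.5 * (logdet1 + logdet2))

    return term_mean + term_cov


def excess_kurtosis(x):
    """Moment-based excess kurtosis of a 1-D sample (0 for a Gaussian)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    m2 = np.mean(centered ** 2)
    m4 = np.mean(centered ** 4)
    return m4 / m2 ** 2 - 3.0
```

For two Gaussians with identical parameters the distance is zero, and it grows as the means separate or the covariances diverge.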
This paper discusses a model which conceptually demonstrates how infants could learn the normalization between infant and adult acoustics. The model proposes that the mapping can be inferred from the topological correspondences between the adult and infant acoustic spaces, which are clustered separately in an unsupervised manner. The model requires feedback …
This paper explores the possibility and extent of non-uniqueness in the acoustic-to-articulatory inversion of speech, from a statistical point of view. It proposes a technique to estimate the non-uniqueness, based on finding peaks in the conditional probability function of the articulatory space. The paper corroborates the existence of non-uniqueness in a …
We propose a unified framework to recover articulation from audiovisual speech. The nonlinear audiovisual-to-articulatory mapping is modeled by means of a switching linear dynamical system. Switching is governed by a state sequence determined via a Hidden Markov Model alignment process. Mel Frequency Cepstral Coefficients are extracted from audio while …
This paper introduces a general approach for binary classification of audiovisual data. The intended application is mispronunciation detection for specific phonemic errors, using very sparse training data. The system uses a Support Vector Machine (SVM) classifier with features obtained from a Time Varying Discrete Cosine Transform (TV-DCT) on the audio …
This paper presents an Acoustic-to-Articulatory inversion method based on local regression. Two types of local regression, a non-parametric regression and a local linear regression, have been applied to a corpus containing simultaneous recordings of articulator positions and the corresponding acoustics. A maximum likelihood trajectory smoothing using the estimated …
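A rough sketch of the two kinds of local regression mentioned in this abstract is given below. It rests on assumptions not stated in the listing: the non-parametric variant is taken to be a Gaussian-kernel weighted average (Nadaraya-Watson), the local linear variant is a kernel-weighted least-squares fit around the query point, and all function names and the `bandwidth` parameter are hypothetical. Here `X` would hold acoustic feature vectors and `Y` the corresponding articulator positions.

```python
import numpy as np


def gaussian_weights(X, x_query, bandwidth):
    """Gaussian kernel weights of each training row relative to the query."""
    d2 = np.sum((X - x_query) ** 2, axis=1)
    return np.exp(-0.5 * d2 / bandwidth ** 2)


def kernel_regression(X, Y, x_query, bandwidth):
    """Non-parametric estimate: kernel-weighted average of the targets."""
    w = gaussian_weights(X, x_query, bandwidth)
    return w @ Y / w.sum()


def local_linear_regression(X, Y, x_query, bandwidth):
    """Local linear estimate: weighted least squares centred on the query.

    The design matrix uses (X - x_query), so the fitted intercept is the
    prediction at x_query itself.
    """
    w = gaussian_weights(X, x_query, bandwidth)
    A = np.hstack([np.ones((X.shape[0], 1)), X - x_query])
    sw = np.sqrt(w)[:, None]  # scale rows by sqrt(weight) for weighted LSQ
    coef, *_ = np.linalg.lstsq(A * sw, Y * sw, rcond=None)
    return coef[0]
```

Both functions expect `X` with shape `(n_samples, n_inputs)` and `Y` with shape `(n_samples, n_outputs)`, and return one output vector per query. The local linear variant reproduces an exactly linear relationship at any query point, whereas the kernel average is biased wherever the neighbouring samples are not symmetric around the query.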
We propose a method for Acoustic-to-Articulatory Inversion based on acoustic and articulatory 'gestures'. A definition for these gestures along with a method to segment the measured articulatory trajectories and the acoustic waveform into gestures is suggested. The gestures are parameterized by 2D DCT and 2D-cepstral coefficients respectively. The …
In order to study inter-speaker variability, this work aims to assess the generalization capabilities of data-based multi-speaker articulatory models. We use various three-mode factor analysis techniques to model the variations of midsagittal vocal tract contours obtained from MRI images for three French speakers articulating 73 vowels and consonants. …