Gopal Ananthakrishnan

Subtle temporal and spectral differences between categorical realizations of para-linguistic phenomena (e.g., affective vocal expressions) are hard to capture and describe. In this paper we present a signal representation based on Time Varying Constant-Q Cepstral Coefficients (TVCQCC) derived for this purpose. A method which utilizes the special properties …
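A minimal sketch of constant-Q cepstral features in this spirit, assuming a constant-Q transform followed by a log and a DCT, with the "time varying" part approximated by a second DCT across frames. The file name, bin counts, and window length are illustrative assumptions, not the paper's exact TVCQCC formulation.

```python
# Hypothetical sketch of constant-Q cepstral features (not the paper's exact TVCQCC).
import numpy as np
import librosa
from scipy.fftpack import dct

y, sr = librosa.load("utterance.wav", sr=16000)  # hypothetical file

# Constant-Q magnitude spectrogram.
C = np.abs(librosa.cqt(y, sr=sr, hop_length=256, n_bins=48))

# Cepstral coefficients: DCT of the log constant-Q spectrum, per frame.
log_C = np.log(C + 1e-10)
cqcc = dct(log_C, type=2, axis=0, norm="ortho")[:13]  # keep 13 coefficients

# "Time-varying" extension (assumption): a second DCT across the frames of a
# fixed-length window summarizes the temporal trajectory of each coefficient.
win = cqcc[:, :20]                       # first 20 frames as an example window
tvcqcc = dct(win, type=2, axis=1, norm="ortho")[:, :5].ravel()
print(tvcqcc.shape)  # (65,) = 13 coefficients x 5 temporal basis functions
```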
In order to study inter-speaker variability, this work aims to assess the generalization capabilities of data-based multi-speaker articulatory models. We use various three-mode factor analysis techniques to model the variations of midsagittal vocal tract contours obtained from MRI images for three French speakers articulating 73 vowels and consonants. …
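One common three-mode factor analysis technique is a PARAFAC decomposition; a minimal sketch with tensorly is shown below. The tensor shape (speakers x articulations x contour points), the contour sampling, and the rank are assumptions; the paper compares several such techniques rather than this one alone.

```python
# Sketch: three-mode factor analysis of contour data via PARAFAC (tensorly).
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

# Hypothetical tensor: 3 speakers x 73 articulations x 60 contour points.
rng = np.random.default_rng(0)
X = tl.tensor(rng.standard_normal((3, 73, 60)))

# A rank-4 decomposition separates speaker, articulation, and contour factors.
weights, (speaker_f, artic_f, contour_f) = parafac(X, rank=4)
print(speaker_f.shape, artic_f.shape, contour_f.shape)  # (3,4) (73,4) (60,4)
```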
This paper studies, statistically, the hypothesis that the acoustic-to-articulatory mapping is non-unique. The distributions of the acoustic and articulatory spaces are obtained by fitting a Gaussian Mixture Model to the data. The kurtosis is used to measure the non-Gaussianity of the distributions, and the Bhattacharyya distance is used to find the …
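A short sketch of the three named tools, assuming illustrative stand-in data: a GMM fit, sample kurtosis as a non-Gaussianity measure, and the closed-form Bhattacharyya distance between two Gaussian components.

```python
# Sketch of the statistical tools the abstract names; data are stand-ins.
import numpy as np
from scipy.stats import kurtosis
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 2))            # stand-in for acoustic features

gmm = GaussianMixture(n_components=2, covariance_type="full").fit(X)
print("excess kurtosis per dim:", kurtosis(X, axis=0))  # ~0 for a Gaussian

def bhattacharyya(mu1, cov1, mu2, cov2):
    """Closed-form Bhattacharyya distance between two Gaussians."""
    cov = 0.5 * (cov1 + cov2)
    d = mu1 - mu2
    term1 = 0.125 * d @ np.linalg.solve(cov, d)
    term2 = 0.5 * np.log(np.linalg.det(cov) /
                         np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2)))
    return term1 + term2

print(bhattacharyya(gmm.means_[0], gmm.covariances_[0],
                    gmm.means_[1], gmm.covariances_[1]))
```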
This paper discusses a model which conceptually demonstrates how infants could learn the normalization between infant and adult acoustics. The model proposes that the mapping can be inferred from the topological correspondences between the adult and infant acoustic spaces, which are clustered separatelyately in an unsupervised manner. The model requires feedback …
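A conceptual sketch only: cluster two acoustic spaces separately, then align clusters by matching the geometry of their scale-normalized centroid distance matrices. This particular matching rule is an illustrative assumption, not the paper's model.

```python
# Conceptual sketch: separate unsupervised clustering, then cluster alignment.
import numpy as np
from itertools import permutations
from sklearn.cluster import KMeans
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
adult = rng.standard_normal((300, 2))
infant = 0.6 * rng.standard_normal((300, 2)) + 2.0   # scaled, shifted space

k = 4
A = KMeans(n_clusters=k, n_init=10).fit(adult).cluster_centers_
B = KMeans(n_clusters=k, n_init=10).fit(infant).cluster_centers_

# Compare scale-normalized centroid distance structures across the spaces.
DA = squareform(pdist(A)); DA /= DA.max()
DB = squareform(pdist(B)); DB /= DB.max()
best = min(permutations(range(k)),
           key=lambda p: np.sum((DA - DB[np.ix_(p, p)]) ** 2))
print("adult cluster i <-> infant cluster best[i]:", best)
```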
This paper explores the possibility and extent of non-uniqueness in the acoustic-to-articulatory inversion of speech, from a statistical point of view. It proposes a technique to estimate the non-uniqueness, based on finding peaks in the conditional probability function of the articulatory space. The paper corroborates the existence of non-uniqueness in a …
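The idea can be sketched as follows, under simplifying assumptions (one-dimensional acoustic and articulatory variables, synthetic data, and a grid-based peak search): fit a joint GMM, form the conditional density of the articulatory variable given an acoustic observation, and count its local maxima. More than one mode signals non-uniqueness.

```python
# Sketch: count modes of p(articulation | acoustics) under a joint GMM.
import numpy as np
from scipy.stats import norm
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic non-unique data: two articulatory branches, similar acoustics.
art = np.concatenate([rng.normal(-2, 0.3, 400), rng.normal(2, 0.3, 400)])
ac = np.abs(art) + rng.normal(0, 0.2, 800)       # acoustics ignore the sign
gmm = GaussianMixture(n_components=4, covariance_type="full").fit(
    np.column_stack([ac, art]))

def conditional_density(y, grid, gmm):
    """p(articulatory = grid | acoustic = y) for a 2-D joint GMM."""
    dens = np.zeros_like(grid)
    w = np.zeros(gmm.n_components)
    for k in range(gmm.n_components):
        (my, mx), c = gmm.means_[k], gmm.covariances_[k]
        w[k] = gmm.weights_[k] * norm.pdf(y, my, np.sqrt(c[0, 0]))
        cond_mu = mx + c[1, 0] / c[0, 0] * (y - my)
        cond_sd = np.sqrt(c[1, 1] - c[1, 0] ** 2 / c[0, 0])
        dens += w[k] * norm.pdf(grid, cond_mu, cond_sd)
    return dens / w.sum()

grid = np.linspace(-4, 4, 400)
p = conditional_density(2.0, grid, gmm)
peaks = np.flatnonzero((p[1:-1] > p[:-2]) & (p[1:-1] > p[2:])) + 1
print("number of conditional modes:", len(peaks))  # >1 means non-unique
```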
This paper introduces a general approach for binary classification of audiovisual data. The intended application is mispronunciation detection for specific phonemic errors, using very sparse training data. The system uses a Support Vector Machine (SVM) classifier with features obtained from a Time Varying Discrete Cosine Transform (TV-DCT) on the audio …
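An illustrative sketch of the pipeline, assuming a 2-D DCT over fixed-length spectrogram segments as the compact audio feature feeding an SVM; segment sizes, coefficient counts, and the stand-in data are assumptions.

```python
# Sketch: low-order 2-D DCT coefficients of spectrogram segments + an SVM.
import numpy as np
from scipy.fft import dctn
from sklearn.svm import SVC

def tv_dct_features(spec, n_freq=8, n_time=8):
    """Keep the low-order 2-D DCT coefficients of a (freq x time) segment."""
    return dctn(spec, type=2, norm="ortho")[:n_freq, :n_time].ravel()

rng = np.random.default_rng(0)
# Stand-in data: 40 segments (20 per class), 64 bins x 30 frames each.
segs = rng.standard_normal((40, 64, 30))
segs[20:] += 0.5                                  # crude class difference
X = np.array([tv_dct_features(s) for s in segs])
y = np.repeat([0, 1], 20)                         # correct / mispronounced

clf = SVC(kernel="rbf").fit(X[::2], y[::2])       # train on half the data
print("held-out accuracy:", clf.score(X[1::2], y[1::2]))
```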
We propose a unified framework to recover articulation from audiovisual speech. The nonlinear audiovisual-to-articulatory mapping is modeled by means of a switching linear dynamical system. Switching is governed by a state sequence determined via a Hidden Markov Model alignment process. Mel Frequency Cepstral Coefficients are extracted from audio while …
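A reduced sketch of the switching idea: given a per-frame state sequence (a stand-in for the HMM alignment), fit one linear acoustic-to-articulatory map per state. The file name, articulatory data, and alignment are assumptions, and the full switching linear dynamical system is not shown.

```python
# Reduced sketch: per-state linear maps, switching on a known state sequence.
import numpy as np
import librosa
from sklearn.linear_model import LinearRegression

y, sr = librosa.load("utterance.wav", sr=16000)        # hypothetical file
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T   # frames x 13

T = mfcc.shape[0]
rng = np.random.default_rng(0)
art = rng.standard_normal((T, 6))        # stand-in articulatory trajectories
states = np.repeat(np.arange(4), int(np.ceil(T / 4)))[:T]  # stand-in alignment

# One linear regime per HMM state.
maps = {s: LinearRegression().fit(mfcc[states == s], art[states == s])
        for s in np.unique(states)}
recovered = np.vstack([maps[s].predict(mfcc[t:t + 1])
                       for t, s in enumerate(states)])
print(recovered.shape)   # (T, 6)
```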
This study aims at automatically classifying levels of acoustic prominence in a dataset of 200 Swedish sentences of read speech by one male native speaker. Each word in the sentences was categorized by four speech experts into one of three groups depending on the level of prominence perceived. Six acoustic features at the syllable level and seven features at …
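The kind of per-word acoustic features such a study relies on can be sketched as below: duration, energy, and pitch statistics over a word segment, which would then feed a three-class classifier. The file name, word boundaries, and this particular feature set are assumptions; the paper's actual six syllable-level and seven word-level features are not reproduced.

```python
# Sketch: simple per-word prosodic features for prominence classification.
import numpy as np
import librosa

y, sr = librosa.load("sentence.wav", sr=16000)    # hypothetical file
t0, t1 = 0.42, 0.78                               # hypothetical word boundary

seg = y[int(t0 * sr):int(t1 * sr)]
f0, voiced, _ = librosa.pyin(seg, fmin=60, fmax=400, sr=sr)
features = {
    "duration": t1 - t0,
    "rms_mean": float(librosa.feature.rms(y=seg).mean()),
    "f0_mean": float(np.nanmean(f0)),                 # NaN where unvoiced
    "f0_range": float(np.nanmax(f0) - np.nanmin(f0)),
}
print(features)
```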
This paper presents an Acoustic-to-Articulatory inversion method based on local regression. Two types of local regression, a non-parametric and a local linear regression, have been applied to a corpus containing simultaneous recordings of articulator positions and the corresponding acoustics. A maximum likelihood trajectory smoothing using the estimated …
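A minimal sketch of the local linear variant: for each test frame, fit a kernel-weighted linear map on its acoustic nearest neighbours and predict the articulator positions. The data, neighbourhood size, and kernel width are assumptions, and the maximum likelihood trajectory smoothing step is not shown.

```python
# Sketch: local linear regression for acoustic-to-articulatory inversion.
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
acoustic = rng.standard_normal((2000, 13))       # stand-in MFCC frames
artic = rng.standard_normal((2000, 6))           # stand-in EMA positions

nn = NearestNeighbors(n_neighbors=50).fit(acoustic)

def invert(frame, k_width=1.0):
    dist, idx = nn.kneighbors(frame[None, :])
    w = np.exp(-(dist[0] / k_width) ** 2)        # Gaussian kernel weights
    model = LinearRegression().fit(acoustic[idx[0]], artic[idx[0]],
                                   sample_weight=w)
    return model.predict(frame[None, :])[0]

print(invert(acoustic[0]).shape)                 # (6,)
```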
We propose a method for Acoustic-to-Articulatory Inversion based on acoustic and articulatory 'gestures'. A definition for these gestures, along with a method to segment the measured articulatory trajectories and the acoustic waveform into gestures, is suggested. The gestures are parameterized by 2D DCT and 2D-cepstral coefficients respectively. The …
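A sketch of the 2D DCT parameterization of an articulatory gesture segment: resample the segment to a fixed size and keep only the low-order coefficients as the gesture code. The segment shape, the resampling step, and the number of retained coefficients are assumptions.

```python
# Sketch: compact 2-D DCT code for a gesture segment, with a reconstruction check.
import numpy as np
from scipy.fft import dctn, idctn
from scipy.signal import resample

rng = np.random.default_rng(0)
gesture = rng.standard_normal((6, 37))            # 6 articulators x 37 frames
fixed = resample(gesture, 32, axis=1)             # fixed-length time axis

coeffs = dctn(fixed, type=2, norm="ortho")
params = coeffs[:4, :6].ravel()                   # 24-number gesture code

# Inverse check: reconstruct from the kept coefficients alone.
trunc = np.zeros_like(coeffs)
trunc[:4, :6] = coeffs[:4, :6]
recon = idctn(trunc, type=2, norm="ortho")
print(params.shape, np.abs(recon - fixed).mean())
```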