K. Sri Rama Murty

An epoch is the instant of significant excitation of the vocal-tract system during production of speech. For most voiced speech, the most significant excitation takes place around the instant of glottal closure. Extraction of epochs from speech is a challenging task due to the time-varying characteristics of the source and the system. Most epoch extraction methods ...
The objective of this letter is to demonstrate the complementary nature of speaker-specific information present in the residual phase in comparison with the information present in the conventional mel-frequency cepstral coefficients (MFCCs). The residual phase is derived from the speech signal by linear prediction analysis. Speaker recognition studies are ...
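As a rough illustration of the linear prediction step mentioned above, the sketch below computes a linear prediction (LP) residual with the autocorrelation method and the Levinson-Durbin recursion. The function names, the prediction order, and the absence of windowing or framing are assumptions made for this example and are not taken from the letter. The further step from the residual to its phase is not shown.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of the sequence x."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations.

    Returns coefficients a_1..a_order of the predictor
    x_hat[n] = sum_k a_k * x[n - k].
    """
    a = [0.0] * (order + 1)   # a[0] is unused
    err = r[0]                # prediction error energy
    for i in range(1, order + 1):
        if err <= 0.0:
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err         # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:]


def lp_residual(x, order):
    """Return (predictor coefficients, LP residual) for the sequence x."""
    a = levinson_durbin(autocorrelation(x, order), order)
    residual = []
    for n in range(len(x)):
        # Predict x[n] from up to `order` past samples.
        pred = sum(a[k] * x[n - 1 - k]
                   for k in range(order) if n - 1 - k >= 0)
        residual.append(x[n] - pred)
    return a, residual
```

For a signal that is the impulse response of a single-pole filter, for example `x = [0.9 ** n for n in range(200)]`, `lp_residual(x, 2)` should return a first coefficient close to 0.9 and a residual that is close to a single impulse at the first sample.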
The objective of this work is to characterize certain important features of excitation of speech, namely, detecting the regions of glottal activity and estimating the strength of excitation in each glottal cycle. The proposed method is based on the assumption that the excitation to the vocal-tract system can be approximated by a sequence of impulses of ...
Exploiting the impulse-like nature of excitation in the sequence of glottal cycles, a method is proposed to derive the instantaneous fundamental frequency from speech signals. The method involves passing the speech signal through two ideal resonators located at zero frequency. A filtered signal is derived from the output of the resonators by subtracting the ...
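The resonator step described above can be sketched as follows. An ideal resonator at zero frequency has both poles at z = 1, which gives the recursion y[n] = x[n] + 2y[n-1] - y[n-2]. The abstract is cut off before it says what is subtracted from the resonator output, so the `remove_local_mean` helper and its `half_window` parameter are assumptions introduced only for illustration, as are all function names.

```python
def zero_frequency_resonator(x):
    """One ideal resonator at 0 Hz: y[n] = x[n] + 2*y[n-1] - y[n-2]."""
    y = []
    y1 = y2 = 0.0             # y[n-1] and y[n-2]
    for sample in x:
        out = sample + 2.0 * y1 - y2
        y.append(out)
        y2, y1 = y1, out
    return y


def remove_local_mean(y, half_window):
    """ASSUMPTION (not from the abstract): subtract a moving average
    taken over up to 2*half_window + 1 samples around each point."""
    n = len(y)
    out = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        out.append(y[i] - sum(y[lo:hi]) / (hi - lo))
    return out


def zero_frequency_filter(x, half_window):
    """Cascade of two resonators followed by the assumed trend removal."""
    y = zero_frequency_resonator(zero_frequency_resonator(x))
    return remove_local_mean(y, half_window)
```

A single resonator turns a unit impulse into a linear ramp, so the cascade output grows as a polynomial in time. Some form of trend removal is therefore needed before the result is usable, which is the role the hypothetical helper plays here.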
In this paper, we propose an approach for processing multispeaker speech signals collected simultaneously using a pair of spatially separated microphones in a real room environment. Spatial separation of microphones results in a fixed time-delay of arrival of speech signals from a given speaker at the pair of microphones. These time-delays are estimated by ...
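The abstract is cut off before it names the estimator, so the sketch below uses plain cross-correlation peak picking as a generic stand-in for estimating the delay between two microphone signals. It is not presented as the authors' method. The function name `estimate_delay` and the `max_lag` search range are assumptions for this example.

```python
def estimate_delay(ref, other, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation.

    A positive result means `other` is a delayed copy of `ref`.
    """
    best_lag = 0
    best_score = float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(len(ref)):
            j = i + lag
            if 0 <= j < len(other):
                score += ref[i] * other[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Usage: a short pulse, and a copy of it delayed by 3 samples.
ref = [0.0] * 50
ref[20:24] = [1.0, 3.0, -2.0, 1.0]
other = [0.0] * 3 + ref[:-3]
delay_samples = estimate_delay(ref, other, 10)
```

Dividing the returned lag by the sampling rate gives the delay in seconds. With several speakers, each speaker would be expected to contribute a peak at a different lag, which is the property the abstract relies on.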
This paper proposes a method for detection of voiced regions from speech signals collected in a noisy environment. The proposed method is based on the characteristics of the excitation source of speech production. The degraded speech signal is processed by linear prediction analysis for deriving the linear prediction residual. The Hilbert envelope of the linear ...
In this letter, we address the issue of determining the number of speakers from multispeaker speech signals collected simultaneously using a pair of spatially separated microphones. The spatial separation of the microphones results in a time delay of arrival of speech signals from a given speaker. The differences in the time delays for different speakers are ...
The objective of this paper is to demonstrate the effectiveness of sparse representation techniques for speaker recognition. In this approach, each feature vector from an unknown utterance is expressed as a linear weighted sum of a dictionary of feature vectors belonging to many speakers. The weights associated with feature vectors in the dictionary are ...