Hemant A. Patil

Speech synthesis and voice conversion techniques can pose threats to current speaker verification (SV) systems. It is therefore essential to develop front-end systems that can distinguish human speech from spoofed speech (synthesized or voice-converted). In this paper, for the ASVspoof 2015 challenge, we propose a detector based on...
In this paper, we discuss a consortium effort on building text-to-speech (TTS) systems for 13 Indian languages. There are about 1652 Indian languages. A unified framework is therefore required for building TTS systems for Indian languages. As Indian languages are syllable-timed, a syllable-based framework is developed. As quality of speech synthesis is...
Most speaker recognition systems use system features, which are mostly spectral in nature. Recently, there has been significant work on using source features, viz., prosody and pitch dynamics, glottal flow derivative, Linear Prediction (LP) residual and its phase, wavelet-domain representation of the LP residual, etc., for speaker...
Most state-of-the-art voice biometrics systems use the natural speech signal (read, spontaneous, or contextual speech) from the subjects. In this paper, an attempt is made to identify speakers from their hum. A new feature set, viz., Variable length Teager Energy Based Mel Frequency Cepstral Coefficients (VTMFCC), is proposed for this...
The Teager Energy Operator (TEO), proposed by Kaiser and Teager, is based on a definition of the energy required to generate the signal. TEO gives a running estimate of energy as a function of the amplitude and instantaneous frequency content of the signal. However, it considers only three consecutive samples to calculate the energy estimate. In this paper, we suggest...
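As a minimal illustration of the three-sample estimate described above, the sketch below implements the standard discrete-time form of the operator, Ψ[x(n)] = x(n)² − x(n−1)·x(n+1). It is not the variant that the truncated abstract goes on to propose. The function name `teager_energy` and the test-tone parameters are assumptions chosen for illustration only.

```python
import math


def teager_energy(x):
    """Standard discrete TEO: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns one value per interior sample, so the output is two
    samples shorter than the input.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Hypothetical test tone (values chosen for illustration only).
amplitude = 2.0
omega = 0.3  # digital frequency in radians per sample
signal = [amplitude * math.cos(omega * n + 0.5) for n in range(50)]

psi = teager_energy(signal)

# For a pure sinusoid A*cos(omega*n + phi) the operator is constant and
# equals A^2 * sin^2(omega), which is close to A^2 * omega^2 for small omega.
expected = (amplitude * math.sin(omega)) ** 2
```

For a pure tone the output depends on both amplitude and frequency, which is the sense in which the operator tracks the energy "required to generate" the signal rather than the squared amplitude alone.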
In this paper, the use of a Viterbi-based algorithm and a spectral transition measure (STM)-based algorithm is explored for the task of speech data labeling. In the STM framework, we propose the use of several spectral features such as the recently proposed cochlear filter cepstral coefficients (CFCC), perceptual linear prediction cepstral coefficients (PLPCC) and...
Several speech synthesis and voice conversion techniques can easily generate or manipulate speech to deceive speaker verification (SV) systems. Hence, there is a need to develop spoofing countermeasures that distinguish human speech from spoofed speech. System-based features have been known to contribute significantly to this task. In this paper, we extend...