The paper proposes a pitch detection algorithm based on the short-time average magnitude difference function (AMDF) and the short-term autocorrelation function (ACF). First, AMDF values are computed for a frame of the speech signal. ACF values are then computed from the AMDF values. In order to decrease computational...
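The two-stage idea described above (AMDF over a frame, then ACF over the AMDF values) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the fixed averaging windows, the mean removal before the ACF, the peak-picking rule, and all names (`amdf`, `acf`, `estimate_pitch`, `num_lags`, `min_lag`, `max_lag`) are assumptions made for the example.

```python
import math

def amdf(frame, num_lags):
    """Average magnitude difference for lags 0..num_lags-1.

    Assumption: every lag is averaged over the same fixed window so that
    the values are comparable across lags.
    """
    window = len(frame) - num_lags + 1
    return [
        sum(abs(frame[n] - frame[n + k]) for n in range(window)) / window
        for k in range(num_lags)
    ]

def acf(values, max_lag):
    """Autocorrelation of the mean-removed sequence for lags 0..max_lag."""
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    window = len(centered) - max_lag  # fixed number of terms per lag
    return [
        sum(centered[k] * centered[k + m] for k in range(window))
        for m in range(max_lag + 1)
    ]

def estimate_pitch(frame, sample_rate, min_lag, max_lag, num_lags):
    """Hypothetical helper: pick the ACF-of-AMDF peak inside [min_lag, max_lag]."""
    d = amdf(frame, num_lags)   # stage 1: AMDF of the speech frame
    r = acf(d, max_lag)         # stage 2: ACF of the AMDF values
    best_lag = max(range(min_lag, max_lag + 1), key=lambda m: r[m])
    return sample_rate / best_lag

# Synthetic test frame: a 200 Hz sine sampled at 8 kHz (period = 40 samples).
sample_rate = 8000
frame = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(400)]
f0 = estimate_pitch(frame, sample_rate, min_lag=20, max_lag=60, num_lags=140)
print(f0)
```

For this synthetic sine the AMDF is periodic with the pitch period, so the ACF of the AMDF peaks at a lag of 40 samples. Real speech frames would need a wider lag range and some voicing decision, neither of which is covered here.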
This paper combines the Gaussian mixture model-universal background model (GMM-UBM) and the support vector machine (SVM) for speaker verification by post-processing, with an SVM, the GMM-UBM scores of feature parameters of different dimensions. Because features of different dimensions contribute differently to recognition performance and the SVM has good discriminability,...
In this paper, we propose a novel image representation for scene classification. First, we model multiple-order statistics of image patches via a Gaussian mixture model (GMM) in a Bayesian framework. Second, we combine the mean and covariance information of the GMM and represent it as a mean-covariance supervector through a new distance metric...
A novel method for modeling the glottal source is proposed to improve the naturalness and quality of synthetic speech. The method exploits the high correlation between vocal tract parameters and the glottal source. Vocal tract parameters (LSF) are clustered into several classes. Within each class, the LSF vector closest to the centroid and its...
The acoustic mismatch between the training and test environments leads to differences in the statistical characteristics of speech parameters. Since kurtosis measures the non-Gaussianity of a random variable, kurtosis normalization makes the training and test speech parameters match the standard normal...
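To illustrate why kurtosis can serve as a non-Gaussianity measure, the sketch below computes the sample excess kurtosis (fourth central moment divided by the squared variance, minus 3), which is close to zero for Gaussian data and positive for heavier-tailed data. It only shows the measure itself, since the normalization procedure is not described in the text above. The function name `excess_kurtosis` and the choice of test distributions are assumptions made for the example.

```python
import random

def excess_kurtosis(samples):
    """Sample excess kurtosis: m4 / m2**2 - 3 (zero for a Gaussian)."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in samples) / n  # fourth central moment
    return m4 / (m2 * m2) - 3.0

random.seed(0)  # fixed seed so the comparison is repeatable
gaussian = [random.gauss(0.0, 1.0) for _ in range(20000)]
# A Laplace variable as the difference of two exponentials (heavier tails).
laplace = [random.expovariate(1.0) - random.expovariate(1.0) for _ in range(20000)]

print(excess_kurtosis(gaussian))  # close to 0
print(excess_kurtosis(laplace))   # clearly positive (theoretical value is 3)
```

A feature whose excess kurtosis differs markedly between training and test data is one whose distribution shape has changed, which is the kind of mismatch a normalization toward the standard normal would aim to reduce.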
A novel discriminative training method for Gaussian mixture models in text-independent speaker verification, Figure of Merit (FOM) training, is proposed in this paper. FOM training aims at maximizing the FOM of a ROC curve by adjusting the model parameters, rather than only approximating the underlying distribution of acoustic observations of each speaker...
In this paper, we propose merging speech and handwriting recognition hypotheses to improve the performance of Chinese character input. Handwritten character recognition can be reliable when the character is written squarely. However, writing more legible, square characters tends to slow down the input (stroke writing)...
Most conventional speaker recognition systems rely on short-term spectral information and ignore long-term information such as prosody, which also conveys speaker information. In this paper, we propose an approach that extracts prosodic features based on long-term information. First, using wavelet analysis, we reveal the trends of the f0...