Most of the current state-of-the-art speech recognition systems are based on speech signal parametrizations that crudely model the behavior of the human auditory system. However, little or no use is usually made of knowledge of the human speech production system. A data-driven statistical approach to incorporate this knowledge into ASR would require a…
In this paper, we describe an automatic speech recognition system in which features extracted from the human speech production system, in the form of articulatory movement data, are effectively integrated into the acoustic model for improved recognition performance. The system is based on the hybrid HMM/BN model, which allows for easy integration of different speech…
When the reference speakers are represented by Gaussian mixture models (GMM), the conventional approach is to accumulate the frame likelihoods over the whole test utterance and compare the results, as in speaker identification, or apply a threshold, as in speaker verification. In this paper we describe a method in which frame likelihoods are transformed into new…
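The conventional accumulate-and-compare scoring described above can be sketched as follows. This is a minimal illustration only: the diagonal-covariance GMMs and the toy two-speaker setup are assumptions for the sketch, not the models used in the paper.

```python
import numpy as np

def gmm_frame_logliks(frames, weights, means, variances):
    """Per-frame log-likelihoods under a diagonal-covariance GMM.

    frames: (T, D) feature vectors; weights/means/variances describe
    the mixture components (one row per component)."""
    component_logps = []
    for w, mu, var in zip(weights, means, variances):
        diff = frames - mu
        ll = -0.5 * np.sum(diff**2 / var + np.log(2.0 * np.pi * var), axis=1)
        component_logps.append(np.log(w) + ll)
    # Log-sum-exp over components gives the mixture log-likelihood per frame.
    return np.logaddexp.reduce(np.stack(component_logps), axis=0)

def identify(frames, speaker_models):
    """Conventional scoring: sum frame log-likelihoods over the whole
    utterance for each reference speaker and pick the best-scoring one."""
    scores = {spk: gmm_frame_logliks(frames, *m).sum()
              for spk, m in speaker_models.items()}
    return max(scores, key=scores.get)
```

For verification, the same accumulated score would instead be compared against a decision threshold rather than across speakers.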
In this paper, we describe a new high-performance on-line speaker diarization system which works faster than real time and has very low latency. It consists of several modules, including voice activity detection, novel speaker detection, and speaker gender and speaker identity classification. All modules share a set of Gaussian mixture models (GMM) representing…
It is difficult to recognize speech distorted by various factors, especially when an ASR system contains only a single acoustic model. One solution is to use multiple acoustic models, one for each different condition. In this paper, we discuss a parallel decoding-based ASR system that is robust to noise type, SNR, speaker gender, and speaking…
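The parallel-decoding idea can be sketched as follows. The decoder functions here are hypothetical stand-ins, not the system's actual decoders: each condition-specific model decodes the same input, and the hypothesis with the highest score is selected.

```python
def parallel_decode(features, decoders):
    """Run each condition-specific decoder on the same features and
    return the transcription whose model scored the input highest."""
    hypotheses = [decode(features) for decode in decoders]  # [(text, score), ...]
    best_text, _best_score = max(hypotheses, key=lambda h: h[1])
    return best_text

# Hypothetical stand-ins for clean- and noisy-condition acoustic models,
# each returning (hypothesis, score) for a feature sequence:
clean_decoder = lambda feats: ("hello world", sum(feats))
noisy_decoder = lambda feats: ("hello word", sum(feats) - 1.0)
```

In a real system each decoder would be a full HMM decoding pass with its own acoustic model trained for one condition (noise type, SNR, gender, and so on).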
Most of the current state-of-the-art speech recognition systems use the Hidden Markov Model (HMM) for modeling the acoustic characteristics of the speech signal. In the first-order HMM, speech data are assumed to be independent and identically distributed (i.i.d.), meaning that there is no dependency between neighboring feature vectors. Another assumption is…
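The conditional-independence assumption can be made concrete with a small forward-algorithm sketch (a minimal illustration, not any particular system): given the states, the total observation likelihood factorizes into exactly one emission term per frame.

```python
import numpy as np

def forward_loglik(frame_logps, log_init, log_trans):
    """Forward algorithm in log space.

    frame_logps[t, s] is the emission log-likelihood of frame t under
    state s. Because frames are assumed conditionally independent given
    the state, each time step contributes a single per-frame emission term."""
    alpha = log_init + frame_logps[0]
    for t in range(1, len(frame_logps)):
        alpha = np.logaddexp.reduce(alpha[:, None] + log_trans, axis=0) + frame_logps[t]
    return np.logaddexp.reduce(alpha)
```

With a degenerate single-state model the total log-likelihood reduces to the plain sum of per-frame scores, which is the independence assumption in its simplest form.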
Current automatic speech recognition systems have two distinct modes of operation: training and recognition. After training, the system parameters are fixed, and if a mismatch between training and testing conditions occurs, an adaptation procedure is commonly applied. However, adaptation methods change the system parameters in such a way that…
In current HMM-based speech recognition systems, it is difficult to supplement acoustic spectrum features with additional information such as pitch, gender, articulator positions, etc. On the other hand, Dynamic Bayesian Networks (DBN) allow for easy combination of different features and make use of conditional dependencies between them. However, the lack of…
Availability of large amounts of raw unlabeled data has sparked the recent surge in semi-supervised learning research. In most works, however, it is assumed that labeled and unlabeled data come from the same distribution. This restriction is removed in the self-taught learning algorithm, where unlabeled data can be different but nevertheless have similar…