Most current state-of-the-art speech recognition systems are based on speech signal parametrizations that crudely model the behavior of the human auditory system, yet little or no use is made of knowledge about the human speech production system. A data-driven statistical approach to incorporating this knowledge into ASR would require a …
In this paper, we describe an automatic speech recognition system in which features extracted from the human speech production system, in the form of articulatory movement data, are effectively integrated into the acoustic model for improved recognition performance. The system is based on the hybrid HMM/BN model, which allows for easy integration of different speech …
In this paper, we describe the ATR multilingual speech-to-speech translation (S2ST) system, which is mainly focused on translation between English and Asian languages (Japanese and Chinese). There are three main modules in our S2ST system: large-vocabulary continuous speech recognition, machine text-to-text (T2T) translation, and text-to-speech synthesis. …
When the reference speakers are represented by Gaussian mixture models (GMMs), the conventional approach is to accumulate the frame likelihoods over the whole test utterance and compare the results, as in speaker identification, or apply a threshold, as in speaker verification. In this paper, we describe a method in which frame likelihoods are transformed into new …
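The conventional accumulate-and-compare scheme described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the diagonal-covariance GMM scoring and the model-parameter layout (weights, means, variances per speaker) are assumptions for the example.

```python
import numpy as np

def gmm_frame_loglik(frames, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM.
    frames: (T, D); weights: (K,); means, variances: (K, D)."""
    diff = frames[:, None, :] - means[None, :, :]                    # (T, K, D)
    log_norm = -0.5 * np.log(2.0 * np.pi * variances).sum(axis=1)    # (K,)
    log_comp = log_norm[None, :] - 0.5 * (diff ** 2 / variances[None, :, :]).sum(axis=2)
    # Weighted log-sum-exp over mixture components.
    a = log_comp + np.log(weights)[None, :]
    m = a.max(axis=1, keepdims=True)
    return m[:, 0] + np.log(np.exp(a - m).sum(axis=1))               # (T,)

def identify(frames, speaker_models):
    """Conventional speaker identification: sum frame log-likelihoods
    over the whole utterance and pick the best-scoring speaker."""
    scores = {spk: gmm_frame_loglik(frames, *params).sum()
              for spk, params in speaker_models.items()}
    return max(scores, key=scores.get)
```

For verification, the same accumulated score would instead be compared against a threshold.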
SUMMARY: In current HMM-based speech recognition systems, it is difficult to supplement acoustic spectrum features with additional information such as pitch, gender, articulator positions, etc. On the other hand, Bayesian Networks (BN) allow for easy combination of different continuous as well as discrete features by exploring conditional dependencies …
In this paper, we describe a new high-performance online speaker diarization system that works faster than real time and has very low latency. It consists of several modules, including voice activity detection, novel speaker detection, and speaker gender and speaker identity classification. All modules share a set of Gaussian mixture models (GMMs) representing …
It is difficult to recognize speech distorted by various factors, especially when an ASR system contains only a single acoustic model. One solution is to use multiple acoustic models, one for each condition. In this paper, we discuss a parallel decoding-based ASR system that is robust to the noise type, SNR, speaker gender, and speaking …
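The multiple-acoustic-model idea above reduces, at its simplest, to running one decoder per condition and keeping the best-scoring hypothesis. The sketch below assumes a hypothetical decoder interface where each condition-specific decoder returns a (hypothesis, log-score) pair; the paper's actual selection and combination scheme may be more elaborate.

```python
def parallel_decode(utterance, decoders):
    """Run several condition-specific decoders (e.g. per noise type,
    SNR, or gender) and keep the hypothesis with the highest score.
    Each decoder maps an utterance to (hypothesis, log_score)."""
    results = [decode(utterance) for decode in decoders]
    best_hypothesis, _best_score = max(results, key=lambda r: r[1])
    return best_hypothesis
```

In practice the decoders would run concurrently; the sequential loop here only illustrates the selection step.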
Most current state-of-the-art speech recognition systems use the Hidden Markov Model (HMM) to model the acoustic characteristics of a speech signal. In the first-order HMM, speech data are assumed to be independent and identically distributed (i.i.d.), meaning that there is no dependency between neighboring feature vectors. Another assumption is …
Current automatic speech recognition systems have two distinct modes of operation: training and recognition. After training, the system parameters are fixed, and if a mismatch between training and testing conditions occurs, an adaptation procedure is commonly applied. However, adaptation methods change the system parameters in such a way that …
In this paper, we propose a new speaker identification system in which the likelihood normalization technique, widely used for speaker verification, is introduced. In the new system, which is based on Gaussian mixture models, every frame of the test utterance is input to all the reference models in parallel. In this procedure, for each frame, likelihoods from …
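Frame-level likelihood normalization of the kind described above can be sketched as follows. The truncated abstract does not state the exact transform, so this example uses one common choice, subtracting each frame's best competing log-likelihood, purely as an illustration.

```python
import numpy as np

def normalized_frame_scores(frame_logliks):
    """frame_logliks: (T, S) per-frame log-likelihoods, one column per
    reference speaker model (all models scored in parallel per frame).
    Normalize each frame by its best log-likelihood across models,
    so the per-frame winner scores 0 and the rest are negative."""
    best = frame_logliks.max(axis=1, keepdims=True)                  # (T, 1)
    return frame_logliks - best                                      # (T, S)

def identify_normalized(frame_logliks):
    """Accumulate the normalized frame scores and pick the speaker
    index with the highest total."""
    return int(normalized_frame_scores(frame_logliks).sum(axis=0).argmax())
```

Compared with accumulating raw likelihoods, normalizing per frame limits the influence of outlier frames that score poorly under every model.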