The Kullback–Leibler (KL) divergence is a widely used tool in statistics and pattern recognition. The KL divergence between two Gaussian mixture models (GMMs) is frequently needed in speech and image recognition. Unfortunately, the KL divergence between two GMMs is not analytically tractable, nor does any efficient computational algorithm …
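Because no closed form exists, a common baseline is to estimate the divergence by sampling, using D(f‖g) = E_f[log f(x) − log g(x)]. The sketch below does this for one-dimensional GMMs in plain Python. It is an illustrative Monte Carlo estimator, not necessarily the approximation the paper proposes; the function names and the (weights, means, variances) tuple layout are assumptions made for this example.

```python
import math
import random

def log_gauss_1d(x, mean, var):
    """Log density of a univariate Gaussian N(mean, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def log_gmm_1d(x, weights, means, variances):
    """Log density of a 1-D GMM, combined with log-sum-exp for numerical stability."""
    terms = [math.log(w) + log_gauss_1d(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def sample_gmm_1d(rng, weights, means, variances):
    """Draw one sample: pick a component by its weight, then sample that Gaussian."""
    k = rng.choices(range(len(weights)), weights=weights)[0]
    return rng.gauss(means[k], math.sqrt(variances[k]))

def kl_monte_carlo(f, g, n=100_000, seed=0):
    """Estimate D(f || g) = E_f[log f(x) - log g(x)] from n samples drawn from f.

    f and g are hypothetical (weights, means, variances) tuples describing 1-D GMMs.
    The estimate is unbiased but noisy; its standard error shrinks like 1/sqrt(n).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = sample_gmm_1d(rng, *f)
        total += log_gmm_1d(x, *f) - log_gmm_1d(x, *g)
    return total / n
```

For two single Gaussians N(0, 1) and N(1, 1), the exact divergence is 0.5, so `kl_monte_carlo(([1.0], [0.0], [1.0]), ([1.0], [1.0], [1.0]))` should land close to that value. The cost of this estimator (many density evaluations per estimate) is exactly why faster approximations are of interest.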
We present a system that can separate and recognize the simultaneous speech of two people recorded in a single channel. Applied to the monaural speech separation and recognition challenge, the system outperformed all other participants, including human listeners, with an overall recognition error rate of 21.6%, compared to the human error rate of 22.3%.
We describe a system for model-based speech separation that achieves superhuman recognition performance when two talkers speak at similar levels. The system can separate the speech of two speakers from a single-channel recording with remarkable results. It incorporates a novel method for performing two-talker speaker identification and gain estimation. We …
This paper proposes a new covariance modeling technique for Gaussian mixture models. Specifically, the inverse covariance (precision) matrix of each Gaussian is expanded in a rank-1 basis, i.e., $\Sigma_j^{-1} = P_j = \sum_{k=1}^{D} \lambda_k^j a_k a_k^T$, $\lambda_k^j \in \mathbb{R}$, …
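In the expansion $P_j = \sum_k \lambda_k^j a_k a_k^T$ the basis vectors $a_k$ carry no Gaussian index, so they can be shared by every Gaussian. A consequence worth seeing concretely is that $x^T P_j x = \sum_k \lambda_k^j (a_k^T x)^2$: the projections $a_k^T x$ are computed once per frame, after which each Gaussian's quadratic form costs only $O(D)$. The plain-Python sketch below checks this identity; the function names are hypothetical, and it does not show how the paper estimates $\lambda_k^j$ or $a_k$.

```python
def precision_from_rank1_basis(lams, basis):
    """Build P_j = sum_k lams[k] * a_k a_k^T from a shared basis of D vectors a_k."""
    d = len(basis[0])
    P = [[0.0] * d for _ in range(d)]
    for lam, a in zip(lams, basis):
        for r in range(d):
            for c in range(d):
                P[r][c] += lam * a[r] * a[c]
    return P

def quad_form_direct(x, P):
    """x^T P x computed with the full D x D matrix (O(D^2) per Gaussian)."""
    d = len(x)
    return sum(x[r] * P[r][c] * x[c] for r in range(d) for c in range(d))

def project(x, basis):
    """Projections a_k^T x; they depend only on x, so all Gaussians can share them."""
    return [sum(ai * xi for ai, xi in zip(a, x)) for a in basis]

def quad_form_via_projections(projections, lams):
    """x^T P_j x = sum_k lams[k] * (a_k^T x)^2, O(D) per Gaussian once projections exist."""
    return sum(lam * p * p for lam, p in zip(lams, projections))
```

With `basis = [[1.0, 1.0], [1.0, -1.0]]`, `lams = [0.5, 2.0]` and `x = [1.0, 2.0]`, both routes give 6.5. Because the coefficients $\lambda_k^j$ are allowed to be any real numbers, including negative ones, positive definiteness of each $P_j$ is not automatic and has to be enforced during training.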
We have described some of the problems with modeling mixed acoustic signals in the log spectral domain using graphical models, as well as some current approaches to handling these problems for multitalker speech separation and recognition. We have also reviewed methods for inference on factorial hidden Markov models (FHMMs) and methods for handling the …
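One core difficulty in the log spectral domain is that the log spectrum of a mixture is not the sum of the sources' log spectra. Assuming the two sources' phases are unrelated so that cross terms can be ignored, the mixture's log power in a bin is $\log(e^{x_1} + e^{x_2})$; a widely used simplification is the "max" approximation, which keeps only the louder source. The sketch below compares the two; it illustrates a standard approach and is not claimed to be the specific model reviewed in the paper. Function names are hypothetical.

```python
import math

def log_add(a, b):
    """Exact log(e^a + e^b): log power of a sum of two sources, phase interaction ignored."""
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))

def log_max(a, b):
    """The 'max' approximation: the louder source dominates the frequency bin."""
    return max(a, b)

def mixed_log_spectrum(x1, x2, approx=False):
    """Combine two log power spectra bin by bin, exactly or with the max approximation."""
    combine = log_max if approx else log_add
    return [combine(a, b) for a, b in zip(x1, x2)]
```

The approximation error per bin, `log_add(a, b) - log_max(a, b)`, always lies in $[0, \log 2]$ and is largest when the two sources are equally loud, which matches the regime where two talkers speak at similar levels. In an FHMM with one chain per talker, every combination of the two talkers' states must be scored against each frame, so the joint state space grows as the product of the per-talker state counts.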
We consider a family of Gaussian mixture models for use in HMM-based speech recognition systems. These "SPAM" models have state-independent choices of subspaces to which the precision (inverse covariance) matrices and means are restricted. They provide a flexible tool for robust, compact, and fast acoustic modeling. The focus of this paper is on …
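To illustrate why a shared, state-independent subspace makes such models compact and fast, the sketch below assumes each precision matrix is a linear combination $P_j = \sum_k \lambda_k^j S_k$ of $K$ shared symmetric basis matrices $S_k$. Then $x^T P_j x = \sum_k \lambda_k^j (x^T S_k x)$, so the $K$ features $x^T S_k x$ are computed once per frame and each Gaussian needs only a $K$-term dot product and $K$ stored coefficients (instead of $D(D+1)/2$ for a full precision matrix). The parametrization and names here are assumptions for illustration; the restriction on the means is not covered.

```python
def quadratic_features(x, basis_mats):
    """Per-frame features f_k = x^T S_k x; computed once and shared by every Gaussian."""
    d = len(x)
    return [sum(x[r] * S[r][c] * x[c] for r in range(d) for c in range(d))
            for S in basis_mats]

def precision_from_subspace(lams, basis_mats):
    """Assumed form P_j = sum_k lams[k] * S_k with shared symmetric basis matrices S_k."""
    d = len(basis_mats[0])
    return [[sum(lam * S[r][c] for lam, S in zip(lams, basis_mats)) for c in range(d)]
            for r in range(d)]

def quad_term(lams, features):
    """x^T P_j x as a K-term dot product instead of a D x D quadratic form."""
    return sum(lam * f for lam, f in zip(lams, features))
```

For example, with the identity and `[[0.0, 1.0], [1.0, 0.0]]` as the two basis matrices, coefficients `[2.0, 0.5]` and `x = [1.0, 3.0]`, the shared features are `[10.0, 6.0]` and the quadratic term is 23.0, the same value obtained from the full matrix `[[2.0, 0.5], [0.5, 2.0]]`.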
A standard approach to automatic speech recognition uses hidden Markov models whose state-dependent distributions are Gaussian mixture models. Each Gaussian can be viewed as an exponential model whose features are linear and quadratic monomials in the acoustic vector. We consider here models in which the weight vectors of these exponential models are …
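The exponential-model view can be checked directly. Expanding $-\tfrac{1}{2}(x-\mu)^T P (x-\mu)$ gives a constant, a linear term $(P\mu)^T x$, and quadratic terms $-\tfrac{1}{2}P_{ii}x_i^2$ and $-P_{ij}x_i x_j$ for $i<j$, so $\log \mathcal{N}(x;\mu,P^{-1}) = \theta^T \phi(x)$ with $\phi(x)$ the vector of monomials. The plain-Python sketch below builds $\theta$ from a mean and precision matrix and compares it against the direct formula. Names are hypothetical, and the constraint the paper places on the weight vectors is not reproduced.

```python
import math

def cholesky_logdet(P):
    """log det P for a symmetric positive-definite matrix via Cholesky factorization."""
    d = len(P)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = P[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return 2.0 * sum(math.log(L[i][i]) for i in range(d))

def features(x):
    """phi(x) = [1, x_1..x_D, x_i*x_j for i <= j]: constant, linear and quadratic monomials."""
    d = len(x)
    quad = [x[i] * x[j] for i in range(d) for j in range(i, d)]
    return [1.0] + list(x) + quad

def gaussian_weights(mean, P):
    """Weight vector theta such that log N(x; mean, P^{-1}) = theta . phi(x)."""
    d = len(mean)
    Pm = [sum(P[i][k] * mean[k] for k in range(d)) for i in range(d)]
    const = (-0.5 * sum(m * v for m, v in zip(mean, Pm))
             + 0.5 * cholesky_logdet(P) - 0.5 * d * math.log(2.0 * math.pi))
    # Diagonal monomials get -P_ii/2; off-diagonal ones get -P_ij (P is symmetric).
    quad = [-0.5 * P[i][i] if i == j else -P[i][j]
            for i in range(d) for j in range(i, d)]
    return [const] + Pm + quad

def log_gauss_direct(x, mean, P):
    """Reference: -1/2 (x-m)^T P (x-m) + 1/2 log det P - D/2 log(2 pi)."""
    d = len(x)
    diff = [xi - mi for xi, mi in zip(x, mean)]
    q = sum(diff[i] * P[i][j] * diff[j] for i in range(d) for j in range(d))
    return -0.5 * q + 0.5 * cholesky_logdet(P) - 0.5 * d * math.log(2.0 * math.pi)
```

For `mean = [0.5, -1.0]`, `P = [[2.0, 0.3], [0.3, 1.0]]` and any `x`, the dot product of `gaussian_weights(mean, P)` with `features(x)` agrees with `log_gauss_direct(x, mean, P)` up to rounding. Once a Gaussian is written this way, constraining it amounts to constraining the vector $\theta$.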
In this paper, we study discriminative training of acoustic models for speech recognition under two criteria: maximum mutual information (MMI) and a novel "error-weighted" training technique. We present a proof that the standard MMI training technique is valid for a very general class of acoustic models with any kind of parameter tying. We report …
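The MMI criterion rewards the model for assigning high posterior probability to the correct transcription relative to competing hypotheses (in practice drawn from a lattice or N-best list). The sketch below evaluates that objective from precomputed per-hypothesis scores; it does not implement the paper's training procedure or its error-weighted variant, whose details are not given here. The input layout and the acoustic scale `kappa` are assumptions for illustration.

```python
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a non-empty list."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def mmi_objective(utterances, kappa=1.0):
    """Sum over utterances of log P(correct transcript | acoustics).

    Each utterance is a hypothetical (hyps, ref) pair: hyps is a list of
    (acoustic_loglik, lm_logprob) pairs and ref indexes the correct transcript.
    kappa scales the acoustic log-likelihoods (an assumed, commonly used knob).
    """
    total = 0.0
    for hyps, ref in utterances:
        scores = [kappa * ac + lm for ac, lm in hyps]
        # Numerator (correct hypothesis) minus log of the sum over all hypotheses.
        total += scores[ref] - log_sum_exp(scores)
    return total
```

The objective is never positive and reaches 0 only when the correct transcription receives all of the posterior mass; for example, two equally scored hypotheses contribute $-\log 2$. MMI training adjusts the acoustic model parameters to increase this quantity.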