Vaibhava Goel

Learn More
Recently it has been shown that policy-gradient methods for reinforcement learning can be utilized to train deep endto-end systems directly on non-differentiable metrics for the task at hand. In this paper we consider the problem of optimizing image captioning systems using reinforcement learning, and show that by carefully optimizing our systems using the(More)
In this paper, we study discriminative training of acoustic models for speech recognition under two criteria: maximum mutual information (MMI) and a novel "error-weighted" training technique. We present a proof that the standard MMI training technique is valid for a very general class of acoustic models with any kind of parameter tying. We report(More)
A standard approach to automatic speech recognition uses hidden Markov models whose state dependent distributions are Gaussian mixture models. Each Gaussian can be viewed as an exponential model whose features are linear and quadratic monomials in the acoustic vector. We consider here models in which the weight vectors of these exponential models are(More)
This paper investigates data augmentation for deep neural network acoustic modeling based on label-preserving transformations to deal with data sparsity. Two data augmentation approaches, vocal tract length perturbation (VTLP) and stochastic feature mapping (SFM), are investigated for both deep neural networks (DNNs) and convolutional neural networks(More)
The problem of speech decoding is considered here in a Decision Theoretic framework and a modified speech decoding procedure to minimize the expected risk under a general loss function is formulated. A specific word error rate loss function is considered and an implementation in an N-best list rescoring procedure is presented. Methods for estimation of the(More)
Minimum Bayes-risk (MBR) speech recognizers have been shown to yield improvements over the conventional maximum a-posteriori probability (MAP) decoders through N-best list rescoring and A/sup */ search over word lattices. We present a segmental minimum Bayes-risk decoding (SMBR) framework that simplifies the implementation of MBR recognizers through the(More)
Linear transforms are often used for adaptation to test data in speech recognition systems. However, when used with small amounts of test data, these techniques provide limited improvements if any. This paper proposes a two-step Bayesian approach where a) the transforms lie in a subspace obtained at training time and b) the expansion coefficients of the(More)
The SPAM model was recently proposed as a very general method for modeling Gaussians with constrained means and covariances. It has been shown to yield significant error rate improvements over other methods of constraining covariances such as diagonal, semi-tied covariances, and extended maximum likelihood linear transformations. In this paper we address(More)