Linear transforms are often used to adapt speech recognition systems to test data. However, when only small amounts of test data are available, these techniques provide limited improvement, if any. This paper proposes a two-step Bayesian approach where a) the transforms lie in a subspace obtained at training time and b) the expansion coefficients of the ...
A standard approach to automatic speech recognition uses Hidden Markov Models whose state-dependent distributions are Gaussian mixture models. Each Gaussian can be viewed as an exponential model whose features are linear and quadratic monomials in the acoustic vector. We consider here models in which the weight vectors of these exponential models are ...
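The statement above that each Gaussian is an exponential model in linear and quadratic monomials can be checked numerically. The following is a minimal sketch, not taken from the paper: it assumes a diagonal covariance (so the quadratic monomials reduce to the squares `x_i^2`, with no cross terms), and all function names are hypothetical.

```python
import math


def gaussian_log_density(x, mean, var):
    """Log density of a diagonal-covariance Gaussian, computed directly."""
    return sum(
        -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
        for xi, m, v in zip(x, mean, var)
    )


def features(x):
    """Linear and quadratic monomials of x (diagonal case: x_i and x_i^2)."""
    return list(x) + [xi * xi for xi in x]


def exponential_weights(mean, var):
    """Weight vector and log-normalizer representing the same Gaussian."""
    linear = [m / v for m, v in zip(mean, var)]       # weights on x_i
    quadratic = [-0.5 / v for v in var]               # weights on x_i^2
    log_norm = sum(
        m * m / (2 * v) + 0.5 * math.log(2 * math.pi * v)
        for m, v in zip(mean, var)
    )
    return linear + quadratic, log_norm


def exponential_log_density(x, weights, log_norm):
    """Log density as an inner product with the features minus the normalizer."""
    return sum(w * f for w, f in zip(weights, features(x))) - log_norm


x, mean, var = [0.3, -1.2], [0.1, 0.5], [2.0, 0.5]
weights, log_norm = exponential_weights(mean, var)
difference = gaussian_log_density(x, mean, var) - exponential_log_density(x, weights, log_norm)
```

For a full covariance matrix the feature vector would also contain the cross terms `x_i * x_j`; the diagonal case is used here only to keep the sketch short.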
In this paper we study discriminative training of acoustic models for speech recognition under two criteria: maximum mutual information (MMI) and a novel "error weighted" training technique. We present a proof that the standard MMI training technique is valid for a very general class of acoustic models with any kind of parameter tying. We report ...
This paper investigates data augmentation for deep neural network acoustic modeling based on label-preserving transformations to deal with data sparsity. Two data augmentation approaches, vocal tract length perturbation (VTLP) and stochastic feature mapping (SFM), are investigated for both deep neural networks (DNNs) and convolutional neural networks ...
This paper applies the recently proposed SPAM models for acoustic modeling in a Speaker Adaptive Training (SAT) context on large vocabulary conversational speech databases, including the Switchboard database. SPAM models are Gaussian mixture models in which a subspace constraint is placed on the precision and mean matrices (although this paper focuses on ...
Minimum Bayes Risk (MBR) speech recognizers have been shown to yield improvements over the conventional maximum a posteriori probability (MAP) decoders in the context of N-best list rescoring and A* search over recognition lattices. Segmental MBR (SMBR) procedures have been developed to simplify implementation of MBR recognizers, by segmenting the N-best ...
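The difference between MAP and MBR decoding in N-best list rescoring, as mentioned above, can be illustrated with a small sketch. It is an illustration under stated assumptions, not the paper's procedure: the loss is plain word-level edit distance, the posteriors are made-up numbers, the function names are hypothetical, and the segmental variant (SMBR) is not shown.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two lists of words."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def map_decode(nbest):
    """Pick the hypothesis with the highest posterior probability."""
    return max(nbest, key=lambda item: item[1])[0]


def expected_loss(candidate, nbest):
    """Expected edit distance of a candidate under the N-best posteriors."""
    return sum(p * edit_distance(words, candidate) for words, p in nbest)


def mbr_decode(nbest):
    """Pick the hypothesis with the lowest expected loss."""
    return min(nbest, key=lambda item: expected_loss(item[0], nbest))[0]


# Illustrative N-best list: (hypothesis words, posterior). Values are invented.
nbest = [
    ("x y z".split(), 0.40),
    ("a b c".split(), 0.35),
    ("a b d".split(), 0.25),
]
map_choice = map_decode(nbest)
mbr_choice = mbr_decode(nbest)
```

In this toy list the MAP choice is the single most probable hypothesis, while the MBR choice is a hypothesis that is close to most of the remaining probability mass, which is why the two decoders can disagree.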
This paper applies the recently proposed Extended Maximum Likelihood Linear Transformation (EMLLT) model in a Speaker Adaptive Training (SAT) context on the Switchboard database. Adaptation is carried out with maximum likelihood estimation of linear transforms for the means, precisions (inverse covariances) and the feature space under the EMLLT model. This ...
Active learning is a strategy to minimize the annotation effort required to train statistical models, such as a statistical classifier used for natural language call routing or user intent classification. Most variants of active learning are "certainty-based;" they typically select, for human labeling, samples that are most likely to be misclassified by ...
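A certainty-based selection step of the kind described above can be sketched as follows. This is a minimal illustration that assumes a least-confidence criterion (a low top-class posterior is treated as a proxy for likely misclassification); the abstract is truncated, so this is not necessarily the criterion the paper uses, and the function name and data are hypothetical.

```python
def select_for_labeling(posteriors, k):
    """Return the k sample ids whose top-class posterior is lowest.

    posteriors: dict mapping sample id -> list of class probabilities
    produced by the current classifier on unlabeled data.
    """
    ranked = sorted(posteriors, key=lambda sample_id: max(posteriors[sample_id]))
    return ranked[:k]


# Invented class posteriors for four unlabeled utterances.
pool = {
    "u1": [0.90, 0.10],
    "u2": [0.55, 0.45],
    "u3": [0.60, 0.40],
    "u4": [0.99, 0.01],
}
to_label = select_for_labeling(pool, k=2)
```

The selected samples would then be sent for human labeling and added to the training set before the classifier is retrained.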
State-of-the-art convolutional neural networks (CNNs) typically use a log-mel spectral representation of the speech signal. However, this representation is limited by the spectro-temporal resolution afforded by log-mel filter-banks. A novel technique known as Deep Scattering Spectrum (DSS) addresses this limitation and preserves higher resolution ...