Learn More
Multilingual deep neural networks (DNNs) can act as deep feature extractors and have been applied successfully to cross-language acoustic modeling. Learning these feature extractors becomes expensive because of the enlarged multilingual training data and the sequential nature of stochastic gradient descent (SGD). This paper investigates strategies …
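A common realization of such a multilingual feature extractor, not spelled out in this truncated abstract, is a DNN whose hidden layers are shared across all training languages while each language keeps its own softmax output layer; after multilingual training the shared stack is reused as a deep feature extractor for a new language. The sketch below assumes that shared-hidden-layer setup; the layer sizes, sigmoid activations, and use of PyTorch are illustrative assumptions, not the paper's exact configuration.

import torch
import torch.nn as nn

class MultilingualDNN(nn.Module):
    """Hidden layers shared across languages; one softmax (output) layer per language.
    Sizes are illustrative; typical inputs are spliced filterbank/fMLLR frames."""
    def __init__(self, input_dim=440, hidden_dim=1024, num_hidden=5,
                 senones_per_lang=(3000, 2500)):
        super().__init__()
        layers, dim = [], input_dim
        for _ in range(num_hidden):
            layers += [nn.Linear(dim, hidden_dim), nn.Sigmoid()]
            dim = hidden_dim
        self.shared = nn.Sequential(*layers)                      # language-universal part
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_dim, n) for n in senones_per_lang]  # per-language outputs
        )

    def forward(self, x, lang_id):
        return self.heads[lang_id](self.shared(x))   # logits for that language's senones

    def extract_features(self, x):
        # After multilingual training, drop the heads and reuse the shared
        # stack as a deep feature extractor for a new target language.
        with torch.no_grad():
            return self.shared(x)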
The performance of automatic speech recognition (ASR) has improved tremendously due to the application of deep neural networks (DNNs). Despite this progress, building a new ASR system remains a challenging task, requiring various resources, multiple training stages and significant expertise. This paper presents our Eesen framework, which drastically …
In this work, a novel training scheme for generating bottleneck features from deep neural networks is proposed. A stack of denoising auto-encoders is first trained in a layer-wise, unsupervised manner. Afterwards, the bottleneck layer and an additional layer are added and the whole network is fine-tuned to predict target phoneme states. We perform …
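The training scheme described in this abstract can be sketched roughly as follows: each hidden layer is first trained as a denoising auto-encoder on corrupted versions of its input, then a narrow bottleneck layer and an output layer are stacked on top and the whole network is fine-tuned against phoneme-state targets. The dimensions, masking-noise corruption, and SGD settings below are assumptions for illustration only.

import torch
import torch.nn as nn
import torch.nn.functional as F

def pretrain_dae_layer(layer, data_loader, noise=0.2, epochs=5):
    """Train one layer as a denoising auto-encoder: corrupt the input,
    encode, decode, and minimize the reconstruction error."""
    decoder = nn.Linear(layer.out_features, layer.in_features)
    opt = torch.optim.SGD(list(layer.parameters()) + list(decoder.parameters()), lr=0.01)
    for _ in range(epochs):
        for x in data_loader:                               # x: (batch, in_features)
            x_noisy = x * (torch.rand_like(x) > noise).float()   # masking noise (assumed)
            recon = decoder(torch.sigmoid(layer(x_noisy)))
            loss = F.mse_loss(recon, x)
            opt.zero_grad(); loss.backward(); opt.step()

# Layer-wise pretraining of the stack, then add bottleneck + output layers.
dims = [440, 1024, 1024, 1024]                              # illustrative
encoders = [nn.Linear(dims[i], dims[i + 1]) for i in range(len(dims) - 1)]
# (Each encoder would be pretrained with pretrain_dae_layer on the previous layer's outputs.)

bottleneck = nn.Linear(dims[-1], 42)                        # narrow bottleneck layer
output = nn.Linear(42, 2000)                                # phoneme-state (senone) targets

network = nn.Sequential(
    *[m for enc in encoders for m in (enc, nn.Sigmoid())],
    bottleneck, nn.Sigmoid(),
    output,                                                 # fine-tuned with cross-entropy
)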
As a feed-forward architecture, the recently proposed maxout networks integrate dropout naturally and show state-of-the-art results on various computer vision datasets. This paper investigates the application of deep maxout networks (DMNs) to large vocabulary continuous speech recognition (LVCSR) tasks. Our focus is on the particular advantage of DMNs under …
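A maxout unit takes the maximum over a small group of linear projections of its input, so the activation function is effectively learned rather than fixed; combined with dropout, this is the building block of the DMNs referred to here. The group size and layer dimensions below are illustrative assumptions.

import torch
import torch.nn as nn

class MaxoutLayer(nn.Module):
    """Maxout: project to (out_dim * pool_size) linear units, then take the
    maximum over each group of pool_size units."""
    def __init__(self, in_dim, out_dim, pool_size=2):
        super().__init__()
        self.out_dim, self.pool_size = out_dim, pool_size
        self.linear = nn.Linear(in_dim, out_dim * pool_size)

    def forward(self, x):
        z = self.linear(x)                               # (batch, out_dim * pool_size)
        z = z.view(-1, self.out_dim, self.pool_size)
        return z.max(dim=2).values                       # max within each group

# Dropout combines naturally with maxout layers, e.g.:
block = nn.Sequential(MaxoutLayer(440, 512), nn.Dropout(p=0.2))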
We investigate two strategies to improve the context-dependent deep neural network hidden Markov model (CD-DNN-HMM) in low-resource speech recognition. Although outperforming the conventional Gaussian mixture model (GMM) HMM on various tasks, CD-DNN-HMM acoustic modeling becomes challenging with limited transcribed speech, e.g., less than 10 hours. To …
We investigate the concept of speaker adaptive training (SAT) in the context of deep neural network (DNN) acoustic models. Previous studies have shown the success of speaker adaptation for DNNs in speech recognition. In this paper, we apply SAT to DNNs by learning two types of feature mapping neural networks. Given an initial DNN model, these …
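One way to picture the feature-mapping idea, under the assumption that speaker information is supplied as an i-vector (as in the related SAT-DNN work further down this list), is a small adaptation network that maps the i-vector to a speaker-specific shift of the acoustic features before they enter the otherwise fixed DNN. This is a hedged illustration of one possible feature-mapping network, not the paper's exact formulation; the class name, dimensions, and training loop comment are assumptions.

import torch
import torch.nn as nn

class FeatureMappingSAT(nn.Module):
    """Speaker-adaptive front-end: an adaptation net maps a speaker i-vector
    to an offset in feature space; the main DNN sees the adapted features."""
    def __init__(self, base_dnn, feat_dim=440, ivector_dim=100):
        super().__init__()
        self.base_dnn = base_dnn                     # initial, speaker-independent DNN
        self.adapt = nn.Sequential(
            nn.Linear(ivector_dim, 512), nn.Tanh(),
            nn.Linear(512, feat_dim),                # speaker-dependent feature offset
        )

    def forward(self, feats, ivector):
        adapted = feats + self.adapt(ivector)        # map features toward a normalized space
        return self.base_dnn(adapted)

# SAT-style training would alternate between updating the adaptation net with the
# base DNN frozen, and fine-tuning the base DNN on the adapted features.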
In this work, we propose a modular combination of two popular applications of neural networks to large-vocabulary continuous speech recognition. First, a deep neural network is trained to extract bottleneck features from frames of mel-scale filterbank coefficients. In a similar way as is usually done for GMM/HMM systems, this network is then applied as a …
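In the modular setup described here, once the bottleneck DNN has been trained on filterbank frames, only the activations at the bottleneck layer are needed downstream; those vectors then serve as features for a conventional GMM/HMM (or a second network). A minimal sketch of the extraction step, assuming the trained network is a PyTorch nn.Sequential whose bottleneck layer index is known (both assumptions for illustration):

import torch
import torch.nn as nn

def extract_bottleneck_features(network, filterbank_frames, bottleneck_index):
    """Run spliced mel-scale filterbank frames through the trained DNN and return
    the bottleneck-layer activations (everything above the bottleneck is discarded)."""
    truncated = nn.Sequential(*list(network.children())[: bottleneck_index + 1])
    truncated.eval()
    with torch.no_grad():
        return truncated(filterbank_frames)          # (num_frames, bottleneck_dim)

# These bottleneck vectors would typically be mean/variance normalized and used as
# input features for a conventional GMM/HMM (or a second DNN) system.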
Speaker adaptive training (SAT) is a well-studied technique for Gaussian mixture model (GMM) acoustic models. Recently we proposed to perform SAT for deep neural networks (DNNs), with speaker i-vectors applied in feature learning. The resulting SAT-DNN models significantly outperform DNNs on word error rates (WERs). In this paper, we present different methods to …
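The simplest way to inject a speaker i-vector into DNN feature learning is to append the same i-vector to every acoustic frame of that speaker, letting the network learn speaker-normalized representations. The sketch below shows that baseline form only; it is an assumption for illustration and not necessarily the exact mechanism of the SAT-DNN models referred to here, and the dimensions and function name are hypothetical.

import numpy as np

def append_ivector(frames, ivector):
    """Append the same speaker i-vector to every acoustic frame of a speaker.
    frames: (num_frames, feat_dim); ivector: (ivector_dim,)"""
    tiled = np.tile(ivector, (frames.shape[0], 1))
    return np.concatenate([frames, tiled], axis=1)   # (num_frames, feat_dim + ivector_dim)

# Example: 440-dim spliced features plus a 100-dim i-vector -> 540-dim DNN input.
frames = np.random.randn(300, 440).astype(np.float32)
ivector = np.random.randn(100).astype(np.float32)
dnn_input = append_ivector(frames, ivector)          # shape (300, 540)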
When deployed in automatic speech recognition (ASR), deep neural networks (DNNs) can be treated as a complex feature extractor plus a simple linear classifier. Previous work has investigated the utility of multilingual DNNs acting as language-universal feature extractors (LUFEs). In this paper, we explore different strategies to further improve LUFEs.