Yuri Y. Khokhlov

In this paper we investigate GMM-derived features recently introduced for adaptation of context-dependent deep neural network HMM (CD-DNN-HMM) acoustic models. We improve the previously proposed adaptation algorithm by applying the concept of speaker adaptive training (SAT) to DNNs built on GMM-derived features and by using fMLLR-adapted features for …
In this paper we propose a novel speaker adaptation method for a context-dependent deep neural network HMM (CD-DNN-HMM) acoustic model. The approach is based on using GMM-derived features as the input to the DNN. The described technique of processing features for DNNs makes it possible to use GMM-HMM adaptation algorithms in the neural network framework. …
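The core idea of GMM-derived features can be illustrated with a minimal sketch: each frame is scored against every Gaussian component of a trained GMM, and the resulting per-component log-likelihoods form the feature vector fed to the DNN. This is a hypothetical illustration of the general concept, not the authors' implementation; all names, shapes, and the toy 2-component GMM below are assumptions.

```python
import numpy as np

def gmm_derived_features(frame, means, variances, weights):
    """Per-component diagonal-covariance Gaussian log-likelihoods of a frame.

    Illustrative sketch: one way to obtain a GMM-derived feature vector
    is to use the weighted log-likelihood of each mixture component as
    one feature dimension.
    frame: (d,), means: (k, d), variances: (k, d), weights: (k,)
    """
    d = frame.shape[0]
    log_det = np.sum(np.log(variances), axis=1)               # (k,)
    maha = np.sum((frame - means) ** 2 / variances, axis=1)   # (k,)
    return np.log(weights) - 0.5 * (d * np.log(2 * np.pi) + log_det + maha)

# Toy usage: a 2-component, 2-dimensional GMM scoring one frame
frame = np.array([0.0, 0.0])
means = np.array([[0.0, 0.0], [3.0, 3.0]])
variances = np.ones((2, 2))
weights = np.array([0.5, 0.5])
feats = gmm_derived_features(frame, means, variances, weights)
# feats is a (2,)-dimensional vector that would be passed to the DNN
```

Because the DNN input is now expressed through the GMM, standard GMM-space adaptation (e.g. updating the means) directly changes the features the DNN sees, which is what makes GMM-HMM adaptation algorithms usable in the neural network framework.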
In this paper we investigate the Gaussian Mixture Model (GMM) framework for adaptation of context-dependent deep neural network HMM (CD-DNN-HMM) acoustic models. In the previous work an initial attempt was introduced for efficient transfer of adaptation algorithms from the GMM framework to DNN models. In this work we present an extension, further detailed …
Different types of acoustic models created at Speech Technology Center are evaluated in this paper. Our main goal was to test how well these models perform and to choose one model for implementation in a large vocabulary continuous speech recognition (LVCSR) system for Russian, which is currently under development. Context-independent discrete and continuous models, as …
This work proposes a novel approach to the out-of-vocabulary (OOV) keyword search (KWS) task. The proposed approach is based on using high-level features from an automatic speech recognition (ASR) system, so-called phoneme posterior based (PPB) features, for decoding. These features are obtained by calculating time-dependent phoneme posterior probabilities from …
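The general idea behind posterior-based features can be sketched as follows: per-frame phoneme scores are normalized into a probability distribution over phonemes (a softmax), yielding a posteriorgram that can then be used for keyword decoding. This is a hedged illustration of the generic concept, not the paper's exact pipeline; the function name, array shapes, and toy scores are assumptions.

```python
import numpy as np

def phoneme_posteriors(acoustic_scores):
    """Frame-level phoneme posterior probabilities from raw scores.

    Illustrative sketch: apply a numerically stable softmax per frame
    so each row becomes a distribution over the phoneme set.
    acoustic_scores: (T, P) raw scores for T frames and P phonemes.
    """
    shifted = acoustic_scores - acoustic_scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)  # each row sums to 1

# Toy usage: 2 frames, 3 phonemes
scores = np.array([[2.0, 0.5, 0.1],
                   [0.2, 3.0, 0.3]])
post = phoneme_posteriors(scores)  # (2, 3) posteriorgram
```

A posteriorgram of this kind is attractive for OOV search because any keyword can be matched against the phoneme-level probabilities even when the word itself never appeared in the recognizer's vocabulary.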
This paper describes the keyword search system developed by the STC team in the framework of the OpenKWS 2016 evaluation. The acoustic modeling techniques included i-vector based speaker adaptation, multilingual speaker-dependent bottleneck features, and a combination of feedforward and recurrent neural networks. To improve the language model, we augmented the …