The Kaldi Speech Recognition Toolkit
- Daniel Povey, Arnab Ghoshal, Karel Veselý
- Computer Science
- 2011
The design of Kaldi, a free, open-source toolkit for speech recognition research, is described; it provides a speech recognition system based on finite-state transducers, together with detailed documentation and a comprehensive set of scripts for building complete recognition systems.
Recurrent neural network based language model
- Tomas Mikolov, M. Karafiát, L. Burget, J. Černocký, S. Khudanpur
- Computer Science, Interspeech
- 2010
Results indicate that it is possible to obtain around a 50% reduction in perplexity by using a mixture of several RNN LMs, compared to a state-of-the-art backoff language model.
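A minimal sketch (not taken from the paper) of how several language models can be combined by linear interpolation and evaluated with perplexity; the per-model token probabilities and interpolation weights below are hypothetical placeholders.

```python
import math

def mixture_perplexity(per_model_probs, weights):
    """Perplexity of a linearly interpolated mixture of language models.

    per_model_probs: per_model_probs[m][t] is the probability model m
                     assigns to the t-th token of the evaluation text.
    weights: interpolation weights for the models (should sum to 1).
    """
    num_tokens = len(per_model_probs[0])
    log_prob_sum = 0.0
    for t in range(num_tokens):
        # Linear interpolation: p(w_t) = sum_m lambda_m * p_m(w_t)
        p = sum(w * probs[t] for w, probs in zip(weights, per_model_probs))
        log_prob_sum += math.log(p)
    # Perplexity = exp(-average log-probability per token)
    return math.exp(-log_prob_sum / num_tokens)

# Hypothetical example: two RNN LMs and a backoff LM on a 3-token text.
probs = [
    [0.20, 0.05, 0.10],  # RNN LM 1
    [0.25, 0.04, 0.12],  # RNN LM 2
    [0.10, 0.02, 0.08],  # backoff n-gram LM
]
print(mixture_perplexity(probs, weights=[0.4, 0.4, 0.2]))
```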
Extensions of recurrent neural network language model
- Tomas Mikolov, Stefan Kombrink, L. Burget, J. Černocký, S. Khudanpur
- Computer Science, IEEE International Conference on Acoustics…
- 22 May 2011
Several modifications of the original recurrent neural network language model are presented, showing approaches that lead to more than a 15-fold speedup of both the training and testing phases, as well as ways to reduce the number of parameters in the model.
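One widely used speedup in this line of work is factorizing the output layer with word classes, so the softmax cost drops from the vocabulary size to roughly the number of classes plus the size of the predicted class. A hedged numpy sketch of that factorization; the weight matrices and class assignment are made-up placeholders, not the model's actual internals.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def factored_word_prob(hidden, word, word2class, class_words, U_class, U_word):
    """P(word | history) = P(class(word) | history) * P(word | class, history).

    hidden:      hidden-layer activation for the current history (shape [H]).
    word2class:  maps a word id to its class id.
    class_words: maps a class id to the list of word ids in that class.
    U_class:     output weights over classes        (shape [num_classes, H]).
    U_word:      output weights over the vocabulary (shape [vocab_size, H]).
    """
    c = word2class[word]
    # Softmax over classes only (cheap: num_classes << vocab_size).
    p_class = softmax(U_class @ hidden)[c]
    # Softmax restricted to the words inside the predicted class.
    members = class_words[c]
    scores = U_word[members] @ hidden
    p_word_given_class = softmax(scores)[members.index(word)]
    return p_class * p_word_given_class

# Hypothetical toy setup: 6 words, 2 classes, hidden size 4.
rng = np.random.default_rng(0)
word2class = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
class_words = {0: [0, 1, 2], 1: [3, 4, 5]}
print(factored_word_prob(rng.normal(size=4), 4, word2class, class_words,
                         rng.normal(size=(2, 4)), rng.normal(size=(6, 4))))
```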
The subspace Gaussian mixture model - A structured model for speech recognition
- Daniel Povey, L. Burget, Samuel Thomas
- Computer Science, Computer Speech and Language
- 1 April 2011
Sequence-discriminative training of deep neural networks
- Karel Veselý, Arnab Ghoshal, L. Burget, Daniel Povey
- Computer Science, Interspeech
- 1 August 2013
Different sequence-discriminative criteria are shown to lower word error rates by 7-9% relative on a standard 300-hour American English conversational telephone speech task.
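As a reference point for what a sequence-discriminative criterion looks like, here is the standard MMI objective over utterances $u$ with acoustics $O_u$, reference transcriptions $W_u$, and acoustic scale $\kappa$; this is the textbook form, not a formula copied from the paper, and MMI is only one of the criteria the paper compares.

```latex
F_{\mathrm{MMI}} = \sum_{u} \log
  \frac{p(O_u \mid S_{W_u})^{\kappa} \, P(W_u)}
       {\sum_{W} p(O_u \mid S_{W})^{\kappa} \, P(W)}
```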
RNNLM - Recurrent Neural Network Language Modeling Toolkit
- Tomas Mikolov, Stefan Kombrink, Anoop Deoras, L. Burget, J. Černocký
- Computer Science
- 1 December 2011
We present a freely available open-source toolkit for training recurrent neural network based language models. It can be easily used to improve existing speech recognition and machine translation…
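A minimal numpy sketch of the Elman-style recurrent language model that such a toolkit trains: the hidden state is a function of the current word and the previous hidden state, and the output is a softmax over the vocabulary. All sizes and weights here are illustrative, not the toolkit's actual code.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def rnnlm_step(word_id, h_prev, U, W, V):
    """One step of a simple Elman recurrent LM.

    word_id: index of the current input word (1-of-N encoding, implicit).
    h_prev:  previous hidden state              (shape [H]).
    U:       input-to-hidden weights            (shape [H, vocab_size]).
    W:       hidden-to-hidden recurrent weights (shape [H, H]).
    V:       hidden-to-output weights           (shape [vocab_size, H]).
    Returns the new hidden state and the distribution over the next word.
    """
    h = np.tanh(U[:, word_id] + W @ h_prev)   # s(t) = f(U w(t) + W s(t-1))
    y = softmax(V @ h)                        # y(t) = g(V s(t))
    return h, y

# Hypothetical toy run: vocabulary of 10 words, hidden size 8.
rng = np.random.default_rng(0)
H, vocab = 8, 10
U = rng.normal(scale=0.1, size=(H, vocab))
W = rng.normal(scale=0.1, size=(H, H))
V = rng.normal(scale=0.1, size=(vocab, H))
h = np.zeros(H)
for w in [3, 1, 7]:                           # a made-up word-id sequence
    h, next_word_dist = rnnlm_step(w, h, U, W, V)
print(next_word_dist.sum())                   # ~1.0
```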
Strategies for training large scale neural network language models
- Tomas Mikolov, Anoop Deoras, Daniel Povey, L. Burget, J. Černocký
- Computer Science, IEEE Workshop on Automatic Speech Recognition…
- 1 December 2011
This work describes how to effectively train neural network based language models on large data sets and introduces a hash-based implementation of a maximum entropy model that can be trained as part of the neural network model.
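A hedged sketch of the hash-based idea: n-gram features are mapped into a fixed-size weight table by a hash function, so the maximum entropy component has a bounded number of parameters no matter how many distinct n-grams occur. The hash function and table size here are arbitrary choices for illustration, not the paper's implementation.

```python
def hashed_ngram_score(history, word, weights, table_size):
    """Sum of hashed maximum-entropy feature weights for (history, word).

    history:    tuple of preceding word ids, e.g. the last two or three words.
    word:       candidate next word id.
    weights:    flat list of feature weights of length table_size.
    table_size: number of hash buckets (this bounds the parameter count).
    """
    score = 0.0
    # One feature per n-gram order: unigram, bigram, trigram, ...
    for order in range(len(history) + 1):
        feature = (history[len(history) - order:], word)
        bucket = hash(feature) % table_size   # hash collisions are tolerated
        score += weights[bucket]
    return score

# Hypothetical usage: score two candidate words under a 2-word history.
weights = [0.01 * i for i in range(1 << 16)]
for cand in (42, 77):
    print(cand, hashed_ngram_score((5, 9), cand, weights, 1 << 16))
```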
Empirical Evaluation and Combination of Advanced Language Modeling Techniques
- Tomas Mikolov, Anoop Deoras, Stefan Kombrink, L. Burget, J. Černocký
- Computer Science, Interspeech
- 1 August 2011
It is concluded that, for both small and moderately sized tasks, a combination of models yields new state-of-the-art results that are significantly better than the performance of any individual model.
Subspace Gaussian Mixture Models for speech recognition
- Daniel Povey, L. Burget, Samuel Thomas
- Computer Science, IEEE International Conference on Acoustics…
- 14 March 2010
An acoustic modeling approach is described in which all phonetic states share a common Gaussian mixture model structure and the means and mixture weights vary in a subspace of the total parameter space; this style of acoustic model allows for a much more compact representation.
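For concreteness, the core SGMM parameterization (in its basic form, without substates) ties each state $j$ to a low-dimensional vector $\mathbf{v}_j$, while the Gaussian means and mixture weights are derived from globally shared quantities $\mathbf{M}_i$, $\mathbf{w}_i$, and $\boldsymbol{\Sigma}_i$:

```latex
\boldsymbol{\mu}_{ji} = \mathbf{M}_i \mathbf{v}_j, \qquad
w_{ji} = \frac{\exp(\mathbf{w}_i^{\top} \mathbf{v}_j)}
              {\sum_{i'=1}^{I} \exp(\mathbf{w}_{i'}^{\top} \mathbf{v}_j)}, \qquad
p(\mathbf{x} \mid j) = \sum_{i=1}^{I} w_{ji}\,
  \mathcal{N}(\mathbf{x};\, \boldsymbol{\mu}_{ji}, \boldsymbol{\Sigma}_i)
```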
Full-covariance UBM and heavy-tailed PLDA in i-vector speaker verification
- P. Matějka, O. Glembek, J. Černocký
- Computer Science, IEEE International Conference on Acoustics…
- 22 May 2011
The use of universal background models (UBMs) with full-covariance matrices is suggested and thoroughly tested experimentally, and dimensionality reduction of i-vectors before heavy-tailed PLDA (PLDA-HT) modeling is investigated.
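A hedged illustration of the dimensionality-reduction step mentioned here, using ordinary LDA from scikit-learn to project i-vectors to a lower dimension before any PLDA stage; the synthetic data, dimensions, and the choice of LDA are assumptions for illustration, not the paper's exact recipe.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Hypothetical i-vectors: 100-dimensional, 20 speakers, 20 utterances each.
rng = np.random.default_rng(0)
num_speakers, utts_per_speaker, ivector_dim = 20, 20, 100
speaker_means = rng.normal(size=(num_speakers, ivector_dim))
ivectors = np.vstack([
    spk_mean + 0.5 * rng.normal(size=(utts_per_speaker, ivector_dim))
    for spk_mean in speaker_means
])
labels = np.repeat(np.arange(num_speakers), utts_per_speaker)

# Project to a lower dimension (at most num_speakers - 1 with LDA);
# a downstream PLDA model would then be trained on these projections.
lda = LinearDiscriminantAnalysis(n_components=num_speakers - 1)
reduced = lda.fit_transform(ivectors, labels)
print(reduced.shape)   # (400, 19)
```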
...