Corpus ID: 1774023

The Kaldi Speech Recognition Toolkit

@inproceedings{Povey2011TheKS,
  title={The Kaldi Speech Recognition Toolkit},
  author={Daniel Povey and A. Ghoshal and Gilles Boulianne and L. Burget and O. Glembek and N. Goel and M. Hannemann and P. Motl{\'i}cek and Y. Qian and Petr Schwarz and J. Silovsk{\'y} and G. Stemmer and Karel Vesel{\'y}},
  year={2011}
}
We describe the design of Kaldi, a free, open-source toolkit for speech recognition research. Kaldi provides a speech recognition system based on finite-state automata (using the freely available OpenFst), together with detailed documentation and a comprehensive set of scripts for building complete recognition systems. Kaldi is written in C++, and the core library supports modeling of arbitrary phonetic-context sizes, acoustic modeling with subspace Gaussian mixture models (SGMM) as well as…
The Bavieca open-source speech recognition toolkit
  • Daniel Bolaños
  • Computer Science
  • 2012 IEEE Spoken Language Technology Workshop (SLT)
  • 2012
TLDR
Describes the design of Bavieca, an open-source speech recognition toolkit intended for speech research and system development, which has a simple and modular design with an emphasis on scalability and reusability.
Implementation of the Standard I-vector System for the Kaldi Speech Recognition Toolkit
TLDR
This report describes an implementation of the standard i-vector-PLDA framework for the Kaldi speech recognition toolkit, modifying the code so that it mimics the standard algorithms of i-vector-based speaker recognition systems.
A complete KALDI recipe for building Arabic speech recognition systems
TLDR
A prototype broadcast news system is shared, built on 200 hours of GALE data that is publicly available through LDC; it is the first effort to share reproducible, sizable training and testing results for an MSA system.
RASR - The RWTH Aachen University Open Source Speech Recognition Toolkit
RASR is the open source version of the well-proven speech recognition toolkit developed and used at RWTH Aachen University. The current version of the package includes state of the art speech…
How to Add Word Classes to the Kaldi Speech Recognition Toolkit
TLDR
It is shown that the introduction of sub-word unit models for open word classes can help to robustly detect and classify out-of-vocabulary words without impairing word recognition accuracy.
Continuous Hindi speech recognition model based on Kaldi ASR toolkit
TLDR
The goal is to show recognition performance for Hindi using a current state-of-the-art system (Kaldi); MFCC features were found to provide higher recognition accuracy than PLP features.
DEGREE FINAL PROJECT Automatic Speech Recognition with Kaldi toolkit
The topic of this thesis is to build an accurate automatic speech recognition system using Kaldi, an open-source toolkit for speech recognition written in C++ and with…
Improvement of an Automatic Speech Recognition Toolkit
The Kaldi toolkit provides a library of modules designed to expedite the creation of automatic speech recognition systems for research purposes. For purposes of acoustic modelling, the toolkit…
DNN-Based Acoustic Modeling for Russian Speech Recognition Using Kaldi
TLDR
A study of DNN-based acoustic modeling for Russian speech recognition using the open-source Kaldi toolkit, which obtained a relative WER reduction of 20% compared to the baseline GMM-HMM system.
Recent Advance of Thai Open-Vocabulary Automatic Speech Recognition
We describe the recent development of the NECTEC Thai open-vocabulary automatic speech recognition system. Some of the techniques that were found beneficial over its baseline system are: hybrid…

References

SHOWING 1-10 OF 26 REFERENCES
The IBM Attila speech recognition toolkit
We describe the design of IBM's Attila speech recognition toolkit. We show how the combination of a highly modular and efficient library of low-level C++ classes with simple interfaces, an…
The RWTH Aachen University open source speech recognition system
TLDR
The toolkit includes state-of-the-art speech recognition technology for acoustic model training and decoding; a finite-state automata library and an efficient tree search decoder are notable components.
Weighted finite-state transducers in speech recognition
TLDR
WFSTs provide a common and natural representation for hidden Markov models (HMMs), context-dependency, pronunciation dictionaries, grammars, and alternative recognition outputs, and general transducer operations combine these representations flexibly and efficiently.
SRILM - an extensible language modeling toolkit
TLDR
The functionality of the SRILM toolkit is summarized and its design and implementation are discussed, highlighting ease of rapid prototyping, reusability, and combinability of tools.
IRSTLM: an open source toolkit for handling large scale language models
TLDR
The IRSTLM toolkit supports distribution of n-gram collection and smoothing over a computer cluster, language model compression through probability quantization, and lazy-loading of huge language models from disk.
Large vocabulary continuous speech recognition using HTK
TLDR
This work extends the approach from word-internal, gender-independent modelling to decision-tree-based state clustering, cross-word triphones and gender-dependent models, and gave the lowest error rate reported on the 5 k/20 k word bigram and 20 k word trigram "hub" tests.
Tree-based state tying for high accuracy acoustic modelling
TLDR
This paper describes a method of creating a tied-state continuous speech recognition system using a phonetic decision tree, which is shown to lead to similar recognition performance to that obtained using an earlier data-driven approach but to have the additional advantage of providing a mapping for unseen triphones.
Frame discrimination training for HMMs for large vocabulary speech recognition
  • Daniel Povey, P. Woodland
  • Computer Science
  • 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258)
  • 1999
TLDR
Experiments on the resource management and North American business tasks show that FD training can give comparable improvements to MMI, but is less computationally intensive.
Sphinx-4: a flexible open source framework for speech recognition
TLDR
Sphinx-4 is a flexible, modular and pluggable framework to help foster new innovations in the core research of hidden Markov model (HMM) speech recognition systems and to provide researchers with a "research-ready" system.
Maximum likelihood linear transformations for HMM-based speech recognition
  • M. Gales
  • Computer Science
  • Comput. Speech Lang.
  • 1998
TLDR
The paper compares the two possible forms of model-based transforms: unconstrained, where any combination of mean and variance transform may be used, and constrained, which requires the variance transform to have the same form as the mean transform.