RWTH ASR Systems for LibriSpeech: Hybrid vs Attention - w/o Data Augmentation
We present state-of-the-art automatic speech recognition (ASR) systems, comparing a standard hybrid DNN/HMM architecture with an attention-based encoder-decoder design on the LibriSpeech task.…
LSTM Language Models for LVCSR in First-Pass Decoding and Lattice-Rescoring
An approach combining one-pass decoding and lattice rescoring with LSTM language models is presented that produces competitive results on the Hub5'00 and LibriSpeech evaluation corpora at a runtime better than real time.
Novel tight classification error bounds under mismatch conditions based on f-Divergence
- R. Schlüter, M. Nußbaum-Thom, Eugen Beck, Tamer Alkhouli, H. Ney
- Computer ScienceIEEE Information Theory Workshop (ITW)
- 23 December 2013
The accuracy mismatch between the ideal Bayes decision rule/Bayes test and a mismatched decision rule in statistical classification/multiple hypothesis testing is investigated explicitly, and a proof of a novel generalized tight statistical bound on this accuracy mismatch is presented.
CTC in the Context of Generalized Full-Sum HMM Training
A generalized hybrid HMM-NN training procedure using the full sum over hidden state sequences is formulated, CTC is identified as a special case of it, and an analysis of the alignment behavior of such a training procedure is presented.
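The full-sum idea can be illustrated with the CTC special case: a forward algorithm sums the probabilities of every frame-level alignment path that collapses to the target label sequence. A minimal sketch in plain probabilities (not the paper's generalized HMM topology, and not the log-space arithmetic a real implementation would use), verified against brute-force path enumeration:

```python
import itertools
import math

def ctc_full_sum(probs, labels, blank=0):
    """Sum, via the forward algorithm, the probabilities of all
    frame-level alignment paths that collapse to `labels`.
    probs: T x V list of per-frame symbol posteriors."""
    ext = [blank]
    for lab in labels:
        ext += [lab, blank]              # interleave blanks: b l1 b l2 b ...
    S, T = len(ext), len(probs)
    alpha = [[0.0] * S for _ in range(T)]
    alpha[0][0] = probs[0][ext[0]]       # start in the initial blank ...
    if S > 1:
        alpha[0][1] = probs[0][ext[1]]   # ... or in the first label
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1][s]          # stay in the same position
            if s > 0:
                a += alpha[t - 1][s - 1]     # advance one position
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[t - 1][s - 2]     # skip the blank between distinct labels
            alpha[t][s] = a * probs[t][ext[s]]
    # valid paths end in the final label or the trailing blank
    return alpha[T - 1][S - 1] + (alpha[T - 1][S - 2] if S > 1 else 0.0)

def collapse(path, blank=0):
    """CTC collapse: merge repeated symbols, then drop blanks."""
    out, prev = [], None
    for sym in path:
        if sym != prev and sym != blank:
            out.append(sym)
        prev = sym
    return out

# brute-force check: enumerate all 3^4 paths over a toy 4-frame posterior
probs = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.2, 0.7], [0.4, 0.4, 0.2]]
labels = [1, 2]
brute = sum(
    math.prod(probs[t][p[t]] for t in range(4))
    for p in itertools.product(range(3), repeat=4)
    if collapse(p) == labels
)
assert abs(ctc_full_sum(probs, labels) - brute) < 1e-12
```

The recursion is exactly a full sum over a particular label topology (blank-label interleaving with optional blank skips); the generalized training procedure in the paper varies this topology.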
Sisyphus, a Workflow Manager Designed for Machine Translation and Automatic Speech Recognition
Training and testing the many possible parameters or model architectures of state-of-the-art machine translation or automatic speech recognition systems is a cumbersome task. They usually require a long…
Segmental Encoder-Decoder Models for Large Vocabulary Automatic Speech Recognition
- Eugen Beck, M. Hannemann, Patrick Doetsch, R. Schlüter, H. Ney
- Computer ScienceINTERSPEECH
- 2 September 2018
Different length modeling approaches for segmental models, their relation to attention-based systems, and the first reported results on the Switchboard 300h speech recognition corpus using this approach are explored.
Relative error bounds for statistical classifiers based on the f-divergence
An upper bound on the error difference between the Bayes decision rule and a model-based decision rule is provided in terms of the f-divergence between the true and model distributions.
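For reference, the f-divergence family underlying both bound papers is the standard one: for a convex function $f$ with $f(1) = 0$, the divergence between a true distribution $p$ and a model distribution $q$ is (textbook notation, not necessarily the papers' own):

$$
D_f(p \,\|\, q) \;=\; \sum_x q(x)\, f\!\left(\frac{p(x)}{q(x)}\right)
$$

Familiar special cases are $f(t) = t \log t$ (Kullback-Leibler divergence) and $f(t) = \tfrac{1}{2}\lvert t - 1 \rvert$ (total variation distance), so bounds stated for a general $f$ cover these as instances.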
LVCSR with Transformer Language Models
By a simple reduction of redundant computations in batched self-attention, the authors obtain a 15% reduction in overall RTF on a well-tuned system. They also present an approach to speed up classic push-forward rescoring by mixing it with n-best list rescoring, better utilizing the inherent parallelizability of Transformer language models.
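The n-best rescoring half of this can be sketched generically: each first-pass hypothesis score is interpolated log-linearly with a new LM score and the list is re-ranked; the parallelism gain comes from the Transformer LM scoring all hypotheses in one batch. The function name and scale value below are illustrative, not the paper's:

```python
def rescore_nbest(nbest, lm_score, lm_scale=0.6):
    """Re-rank an n-best list by log-linearly interpolating the
    first-pass score with an external LM score.
    nbest: list of (hypothesis, first_pass_log_score) pairs.
    lm_score: callable mapping a hypothesis to an LM log-probability."""
    rescored = [(hyp, score + lm_scale * lm_score(hyp)) for hyp, score in nbest]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)

# toy example with a stand-in LM lookup table
nbest = [("the cat sat", -10.0), ("the cat sad", -9.5)]
toy_lm = {"the cat sat": -2.0, "the cat sad": -4.0}
ranking = rescore_nbest(nbest, toy_lm.get)
```

Here the stronger LM score lifts "the cat sat" above the first-pass winner; in push-forward rescoring the same interpolation is instead applied per lattice arc.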
Context-Dependent Acoustic Modeling without Explicit Phone Clustering
This work addresses direct phonetic context modeling for the hybrid Deep Neural Network (DNN)/HMM that does not build on any phone clustering algorithm for determining the HMM state inventory, obtaining a factorized network consisting of different components trained jointly.
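One natural reading of "factorized network" here is a chain decomposition of the context-dependent state posterior, with one network component per factor (a plausible sketch of the structure, not necessarily the paper's exact factorization); for a triphone with left, center, and right phones $a_l, a_c, a_r$ and acoustic input $x$:

$$
p(a_l, a_c, a_r \mid x) \;=\; p(a_c \mid x)\; p(a_l \mid a_c, x)\; p(a_r \mid a_c, a_l, x)
$$

Each factor stays over the small monophone inventory, so the full triphone inventory never needs to be clustered, and the components can share an encoder and be trained jointly.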
Sequence Modeling and Alignment for LVCSR-Systems
- Eugen Beck, Albert Zeyer, P. Doetsch, André Merboldt, R. Schlüter, H. Ney
- Computer ScienceITG Symposium on Speech Communication
Two novel approaches to DNN-based ASR are discussed and analyzed: the attention-based encoder–decoder approach and the (segmental) inverted HMM approach, with specific focus on the sequence alignment behavior of the two.