LSTM Neural Networks for Language Modeling
This work analyzes the long short-term memory (LSTM) neural network architecture on an English and a large French language modeling task and achieves considerable improvements in WER on top of a state-of-the-art speech recognition system.
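To make the modeling setup concrete, here is a minimal sketch of an LSTM language model, assuming PyTorch; the class name, layer sizes, and vocabulary size are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class LSTMLanguageModel(nn.Module):
    """Illustrative LSTM LM: embed -> stacked LSTM -> softmax over the vocabulary."""
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers, batch_first=True)
        self.proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, state=None):
        # tokens: (batch, time) integer word indices
        out, state = self.lstm(self.embed(tokens), state)
        return self.proj(out), state  # logits: (batch, time, vocab)

# Toy usage: next-word cross-entropy on random word-id sequences.
model = LSTMLanguageModel(vocab_size=10000)
x = torch.randint(0, 10000, (4, 20))
logits, _ = model(x[:, :-1])
loss = nn.functional.cross_entropy(logits.reshape(-1, 10000), x[:, 1:].reshape(-1))
```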
Confidence measures for large vocabulary continuous speech recognition
- F. Wessel, R. Schlüter, K. Macherey, H. Ney
- Computer Science · IEEE Trans. Speech Audio Process.
- 1 March 2001
The posterior probabilities computed on word graphs are compared with two alternative confidence measures, the acoustic stability and the hypothesis density, and are shown to outperform both.
Improved training of end-to-end attention models for speech recognition
This work introduces a new pretraining scheme that starts with a high time reduction factor and lowers it during training, which is crucial for both convergence and final performance, and trains long short-term memory (LSTM) language models on subword units.
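The paper's exact schedule is not reproduced here, but the idea of lowering the encoder's time reduction factor as training proceeds can be sketched as follows; the epochs and factors in the schedule are made-up assumptions.

```python
import torch
import torch.nn as nn

# Epoch -> time reduction (downsampling) factor applied in the encoder.
# These concrete values are illustrative assumptions, not the paper's schedule.
TIME_REDUCTION_SCHEDULE = {0: 32, 5: 16, 10: 8}

def reduction_factor(epoch):
    return TIME_REDUCTION_SCHEDULE[max(e for e in TIME_REDUCTION_SCHEDULE if e <= epoch)]

def downsample(encoder_states, factor):
    # encoder_states: (batch, time, feat); max-pool over the time axis
    return nn.functional.max_pool1d(
        encoder_states.transpose(1, 2), kernel_size=factor, ceil_mode=True
    ).transpose(1, 2)

x = torch.randn(2, 640, 128)
for epoch in (0, 5, 10):
    print(epoch, downsample(x, reduction_factor(epoch)).shape)
```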
Investigations on error minimizing training criteria for discriminative training in automatic speech recognition
The MCE criterion is embedded in an extended unifying approach for a class of discriminative training criteria which allows for direct comparison of the performance gain obtained with the improvements of other commonly used criteria such as Maximum Mutual Information and Minimum Word Error.
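For orientation, the Maximum Mutual Information criterion mentioned above can be written in its standard form (notation reconstructed, not quoted from the paper): for training utterances X_r with spoken word sequences W_r, MMI maximizes the posterior of the correct sequence against all competitors.

```latex
% Standard MMI training criterion (reconstructed notation, not quoted):
F_{\mathrm{MMI}}(\lambda)
  = \sum_{r=1}^{R} \log
    \frac{p_\lambda(X_r \mid W_r)\, p(W_r)}
         {\sum_{W} p_\lambda(X_r \mid W)\, p(W)}
```

Roughly speaking, MCE replaces this log-posterior with a smoothed (sigmoid) misclassification measure, which is what makes a unifying framework useful for comparing the two.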
RWTH ASR Systems for LibriSpeech: Hybrid vs Attention - w/o Data Augmentation
We present state-of-the-art automatic speech recognition (ASR) systems employing a standard hybrid DNN/HMM architecture compared to an attention-based encoder-decoder design for the LibriSpeech task…
Computing Mel-frequency cepstral coefficients on the power spectrum
- S. Molau, M. Pitz, R. Schlüter, H. Ney
- Computer Science · IEEE International Conference on Acoustics…
- 7 May 2001
The presented approach simplifies the speech recognizer's front end by merging subsequent signal analysis steps into a single one, which avoids possible interpolation and discretization problems and results in a compact implementation.
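As a sketch of the pipeline this entry refers to, the following computes MFCCs directly from a frame's power spectrum (mel filterbank applied to the power spectrum, then log, then DCT); the filter count and frame sizes are conventional defaults, not the paper's settings.

```python
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_from_power_spectrum(power_spec, sample_rate, n_filters=20, n_ceps=12):
    """power_spec: magnitude-squared FFT of one windowed frame (length n_fft//2 + 1)."""
    n_bins = len(power_spec)
    n_fft = 2 * (n_bins - 1)
    # Triangular filters equally spaced on the mel scale
    mel_points = np.linspace(hz_to_mel(0), hz_to_mel(sample_rate / 2), n_filters + 2)
    bin_idx = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        l, c, r = bin_idx[i], bin_idx[i + 1], bin_idx[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    log_energies = np.log(fbank @ power_spec + 1e-10)
    return dct(log_energies, type=2, norm='ortho')[:n_ceps]

# Toy usage on a synthetic windowed frame
frame = np.hanning(400) * np.random.randn(400)
spec = np.abs(np.fft.rfft(frame, n=512)) ** 2
print(mfcc_from_power_spectrum(spec, sample_rate=16000))
```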
From Feedforward to Recurrent LSTM Neural Networks for Language Modeling
- M. Sundermeyer, H. Ney, R. Schlüter
- Computer Science · IEEE/ACM Transactions on Audio, Speech, and…
- 1 March 2015
This paper compares count models to feedforward, recurrent, and long short-term memory (LSTM) neural network variants on two large-vocabulary speech recognition tasks, and analyzes the potential improvements that can be obtained when applying advanced algorithms to the rescoring of word lattices on large-scale setups.
Language Modeling with Deep Transformers
The analysis of attention weights shows that deep autoregressive self-attention models can automatically make use of positional information and it is found that removing the positional encoding even slightly improves the performance of these models.
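A minimal sketch of such a model, assuming PyTorch: an autoregressive Transformer LM whose only source of positional information is the causal self-attention mask, with no positional encoding added to the embeddings. Depth, widths, and the vocabulary size are illustrative.

```python
import torch
import torch.nn as nn

class TransformerLM(nn.Module):
    """Autoregressive Transformer LM with NO positional encoding; position must
    be inferred from the causal attention mask alone. Sizes are illustrative."""
    def __init__(self, vocab_size, d_model=512, n_heads=8, n_layers=6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.proj = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        t = tokens.size(1)
        # Upper-triangular -inf mask: token i may only attend to tokens <= i.
        causal = torch.triu(torch.full((t, t), float('-inf')), diagonal=1)
        return self.proj(self.encoder(self.embed(tokens), mask=causal))

model = TransformerLM(vocab_size=10000)
logits = model(torch.randint(0, 10000, (2, 32)))  # (2, 32, 10000)
```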
Gammatone Features and Feature Combination for Large Vocabulary Speech Recognition
- R. Schlüter, I. Bezrukov, H. Wagner, H. Ney
- Physics · IEEE International Conference on Acoustics…
- 15 April 2007
The gammatone features presented here lead to competitive results on the EPPS English task, and considerable improvements were obtained by subsequent combination to a number of standard acoustic features, i.e. MFCC, PLP, MF-PLP, and VTLN plus voicedness.
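A sketch of the gammatone filterbank underlying these features, using the textbook impulse response t^(n-1) e^(-2πbt) cos(2πft) with ERB bandwidths after Glasberg and Moore; the channel center frequencies and filter duration below are illustrative, not the paper's configuration.

```python
import numpy as np

def erb(f):
    # Equivalent rectangular bandwidth in Hz (Glasberg & Moore approximation)
    return 24.7 * (4.37 * f / 1000.0 + 1.0)

def gammatone_ir(center_hz, sample_rate, duration=0.064, order=4):
    """Impulse response t^(n-1) * exp(-2*pi*b*t) * cos(2*pi*f*t), peak-normalized."""
    t = np.arange(int(duration * sample_rate)) / sample_rate
    b = 1.019 * erb(center_hz)
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * center_hz * t)
    return g / np.max(np.abs(g))

# Filter a signal with a small bank of gammatone channels.
sr = 16000
signal = np.random.randn(sr)  # 1 s of noise as a stand-in for speech
centers = [100, 300, 700, 1500, 3000, 6000]
channels = np.stack([np.convolve(signal, gammatone_ir(f, sr), mode='same')
                     for f in centers])
```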
Using word probabilities as confidence measures
- F. Wessel, K. Macherey, R. Schlüter
- Computer Science · Proceedings of the IEEE International Conference…
- 12 May 1998
An approach is presented that estimates the confidence in a hypothesized word as its posterior probability, given all acoustic feature vectors of the utterance, computed as the sum of the probabilities of all word hypotheses that represent the occurrence of the same word in more or less the same segment of time.
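In formula form (notation reconstructed, not quoted from the paper): the confidence of a word hypothesis [w; t_s, t_e] sums the posteriors, obtained from the word graph, of all hypotheses of the same word w whose time segments overlap its segment.

```latex
% Word confidence as a posterior sum over overlapping word-graph hypotheses
% (reconstructed notation); x_1^T denotes the acoustic feature vectors.
C(w; t_s, t_e)
  = \sum_{\substack{[w;\, t_s', t_e']\,: \\ [t_s', t_e'] \,\cap\, [t_s, t_e] \neq \emptyset}}
    p\big([w;\, t_s', t_e'] \,\big|\, x_1^T\big)
```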