Improved training of end-to-end attention models for speech recognition
- Albert Zeyer, Kazuki Irie, R. Schlüter, H. Ney
- Computer Science · Interspeech
- 8 May 2018
This work introduces a new pretraining scheme that starts with a high time reduction factor and lowers it during training, which is crucial both for convergence and final performance, and additionally trains long short-term memory (LSTM) language models on subword units.
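Below is a minimal sketch (PyTorch; the layer sizes and the 32 → 16 → 8 schedule are illustrative assumptions, not the paper's configuration) of the time-reduction idea: the encoder's overall time reduction factor is the product of per-layer max-pooling factors, and pretraining starts with a high factor that is lowered stage by stage.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PooledLSTMEncoder(nn.Module):
    """Stacked bidirectional LSTM encoder with max-pooling over time between layers."""

    def __init__(self, input_dim=40, hidden_dim=512, pool_sizes=(8, 2, 2)):
        super().__init__()
        self.pool_sizes = pool_sizes
        self.layers = nn.ModuleList()
        dim = input_dim
        for _ in pool_sizes:
            self.layers.append(nn.LSTM(dim, hidden_dim, bidirectional=True, batch_first=True))
            dim = 2 * hidden_dim

    def forward(self, x):  # x: (batch, time, features)
        for lstm, pool in zip(self.layers, self.pool_sizes):
            x, _ = lstm(x)
            if pool > 1:
                # max-pool over the time axis; the total time reduction factor
                # is the product of all pool sizes
                x = F.max_pool1d(x.transpose(1, 2), kernel_size=pool).transpose(1, 2)
        return x


# Assumed pretraining schedule: start with a high time reduction factor (32)
# and lower it stage by stage (32 -> 16 -> 8); weight carry-over between
# stages is omitted in this sketch.
for pool_sizes in [(8, 2, 2), (4, 2, 2), (4, 2, 1)]:
    encoder = PooledLSTMEncoder(pool_sizes=pool_sizes)
    out = encoder(torch.randn(2, 128, 40))  # time axis shrinks by the current factor
    # ... train for a few sub-epochs at this reduction factor, then continue
```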
RWTH ASR Systems for LibriSpeech: Hybrid vs Attention - w/o Data Augmentation
- Christoph Lüscher, Eugen Beck, H. Ney
- Computer Science · Interspeech
- 8 May 2019
We present state-of-the-art automatic speech recognition (ASR) systems employing a standard hybrid DNN/HMM architecture, compared to an attention-based encoder-decoder design, for the LibriSpeech task.…
Language Modeling with Deep Transformers
- Kazuki Irie, Albert Zeyer, R. Schlüter, H. Ney
- Computer Science · Interspeech
- 10 May 2019
The analysis of attention weights shows that deep autoregressive self-attention models can automatically make use of positional information, and removing the positional encoding is even found to slightly improve the performance of these models.
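As a rough illustration of that finding, here is a minimal sketch (PyTorch; sizes are illustrative assumptions, not the paper's configuration) of an autoregressive Transformer language model that adds no positional encoding and relies only on the causal self-attention mask.

```python
import torch
import torch.nn as nn


class TransformerLM(nn.Module):
    """Autoregressive Transformer LM; note that no positional encoding is added."""

    def __init__(self, vocab_size=10000, d_model=512, n_heads=8, n_layers=12):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=2048,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):  # tokens: (batch, time) of token ids
        t = tokens.size(1)
        # causal mask: each position attends only to itself and the past,
        # which is what lets a deep model infer positional information
        causal_mask = torch.triu(torch.full((t, t), float("-inf")), diagonal=1)
        x = self.embed(tokens)  # no positional encoding added to the embeddings
        return self.out(self.encoder(x, mask=causal_mask))


model = TransformerLM()
logits = model(torch.randint(0, 10000, (2, 32)))  # -> (2, 32, 10000)
```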
On Using SpecAugment for End-to-End Speech Translation
- Parnia Bahar, Albert Zeyer, R. Schlüter, H. Ney
- Computer Science · International Workshop on Spoken Language…
- 20 November 2019
This work investigates a simple data augmentation technique, SpecAugment, for end-to-end speech translation, showing that it alleviates overfitting to some extent and leads to significant improvements across various data conditions, irrespective of the amount of training data.
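A minimal sketch of SpecAugment-style masking on a log-mel spectrogram (NumPy; the mask counts and widths are assumed values, and time warping is omitted):

```python
import numpy as np


def spec_augment(spectrogram, num_freq_masks=2, max_freq_width=8,
                 num_time_masks=2, max_time_width=40, rng=None):
    """spectrogram: array of shape (time, freq); returns an augmented copy."""
    rng = rng or np.random.default_rng()
    spec = spectrogram.copy()
    time_len, freq_len = spec.shape
    for _ in range(num_freq_masks):            # mask random frequency bands
        width = int(rng.integers(0, max_freq_width + 1))
        start = int(rng.integers(0, max(1, freq_len - width)))
        spec[:, start:start + width] = 0.0
    for _ in range(num_time_masks):            # mask random spans of time frames
        width = int(rng.integers(0, max_time_width + 1))
        start = int(rng.integers(0, max(1, time_len - width)))
        spec[start:start + width, :] = 0.0
    return spec


augmented = spec_augment(np.random.randn(300, 80))  # e.g. 300 frames, 80 log-mel bins
```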
A Comparison of Transformer and LSTM Encoder Decoder Models for ASR
- Albert Zeyer, Parnia Bahar, Kazuki Irie, R. Schlüter, H. Ney
- Computer Science · Automatic Speech Recognition & Understanding
- 1 December 2019
We present competitive results using a Transformer encoder-decoder-attention model for end-to-end speech recognition that needs less training time than a similarly performing LSTM model. We…
A comprehensive study of deep bidirectional LSTM RNNs for acoustic modeling in speech recognition
- Albert Zeyer, P. Doetsch, P. Voigtlaender, R. Schlüter, H. Ney
- Computer Science · IEEE International Conference on Acoustics…
- 22 June 2016
A pretraining scheme for LSTMs with layer-wise construction of the network is introduced, showing good improvements especially for deep networks, and computation time is compared against recognition performance.
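A minimal sketch (PyTorch, illustrative sizes) of layer-wise construction: the bidirectional LSTM stack is grown by one layer per pretraining stage, training the partial network before each new layer is added.

```python
import torch
import torch.nn as nn


class GrowableBLSTM(nn.Module):
    """Deep bidirectional LSTM stack that can be grown layer by layer."""

    def __init__(self, input_dim=40, hidden_dim=512):
        super().__init__()
        self.input_dim, self.hidden_dim = input_dim, hidden_dim
        self.layers = nn.ModuleList()

    def add_layer(self):
        in_dim = self.input_dim if not self.layers else 2 * self.hidden_dim
        self.layers.append(nn.LSTM(in_dim, self.hidden_dim,
                                   bidirectional=True, batch_first=True))

    def forward(self, x):  # x: (batch, time, features)
        for lstm in self.layers:
            x, _ = lstm(x)
        return x


model = GrowableBLSTM()
for stage in range(5):      # grow from 1 to 5 layers over the pretraining stages
    model.add_layer()
    out = model(torch.randn(2, 100, 40))
    # ... train the current (partial) network before adding the next layer
```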
Towards Online-Recognition with Deep Bidirectional LSTM Acoustic Models
- Albert Zeyer, R. Schlüter, H. Ney
- Computer Science · Interspeech
- 8 September 2016
This work applies a modification to bidirectional RNNs to enable online recognition by moving a window over the input stream and performing one forward pass through the RNN for each window, and shows in experiments that this online-enabled bidirectional LSTM performs as well as the offline bidirectional LSTM and much better than the unidirectional LSTM.
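A minimal sketch of the windowing idea (PyTorch; window and hop sizes are assumptions, and for simplicity it keeps the leading frames of each window rather than reproducing the paper's exact overlap handling):

```python
import torch
import torch.nn as nn


def windowed_forward(model, features, window=200, hop=100):
    """features: (time, feat). Runs the bidirectional model once per window and
    keeps the leading `hop` frames of each window's output (this assumes the
    model is frame-synchronous, i.e. one output per input frame)."""
    outputs = []
    for start in range(0, features.size(0), hop):
        chunk = features[start:start + window].unsqueeze(0)  # (1, win, feat)
        with torch.no_grad():
            out, _ = model(chunk)                            # (1, win, 2*hidden)
        outputs.append(out[:, :hop])
    return torch.cat(outputs, dim=1)


blstm = nn.LSTM(40, 512, num_layers=3, bidirectional=True, batch_first=True)
stream = torch.randn(1000, 40)                 # incoming feature stream
online_out = windowed_forward(blstm, stream)   # (1, 1000, 1024)
```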
RETURNN as a Generic Flexible Neural Toolkit with Application to Translation and Speech Recognition
- Albert Zeyer, Tamer Alkhouli, H. Ney
- Computer Science · Annual Meeting of the Association for…
- 1 May 2018
It is shown that a layer-wise pretraining scheme for recurrent attention models gives an absolute BLEU improvement of over 1% and makes it possible to train deeper recurrent encoder networks.
The RWTH/UPB/FORTH System Combination for the 4th CHiME Challenge Evaluation
- T. Menne, Jahn Heymann, A. Mouchtaris
- Computer Science
- 2016
This paper describes automatic speech recognition systems developed jointly by RWTH, UPB and FORTH for the 1ch, 2ch and 6ch tracks of the 4th CHiME Challenge and compares the ASR performance of different beamforming approaches.
Training Language Models for Long-Span Cross-Sentence Evaluation
- Kazuki Irie, Albert Zeyer, R. Schlüter, H. Ney
- Computer Science · Automatic Speech Recognition & Understanding
- 1 December 2019
This work trains language models based on long short-term memory (LSTM) recurrent neural networks and Transformers on various types of training sequences and studies their robustness under different evaluation modes, showing that models trained with back-propagation over sequences formed by concatenating multiple sentences, with state carry-over across sequences, effectively outperform those trained at the sentence level.
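A minimal sketch (PyTorch, illustrative sizes and placeholder data) of training an LSTM language model over segments of a concatenated sentence stream with state carry-over: gradients stop at segment boundaries, but the hidden state carries cross-sentence context forward.

```python
import torch
import torch.nn as nn


class LSTMLanguageModel(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=128, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, state=None):
        hidden, state = self.lstm(self.embed(tokens), state)
        return self.out(hidden), state


model = LSTMLanguageModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Placeholder data: a long stream of token ids standing in for the
# concatenation of consecutive sentences, cut into fixed-length segments.
stream = torch.randint(0, 1000, (1, 1 + 8 * 64))
segments = [(stream[:, i:i + 64], stream[:, i + 1:i + 65])
            for i in range(0, 8 * 64, 64)]

state = None                      # hidden state carried across segments
for inputs, targets in segments:
    logits, state = model(inputs, state)
    loss = loss_fn(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Detach so back-propagation stops at the segment boundary, while the
    # hidden state still carries cross-sentence context into the next segment.
    state = tuple(s.detach() for s in state)
```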
...