Corpus ID: 49563781

Neuronale Netze in der automatischen Spracherkennung - ein Paradigmenwechsel? (Neural Networks in Automatic Speech Recognition - a Paradigm Change?)

@inproceedings{Schlter2018NetzeID,
  title={Neuronale Netze in der automatischen Spracherkennung - ein Paradigmenwechsel? (Neural Networks in Automatic Speech Recognition - a Paradigm Change?)},
  author={Ralf Schl{\"u}ter and Patrick Doetsch and Pavel Golik and Markus Kitza and Tobias Menne and Kazuki Irie and Zolt{\'a}n T{\"u}ske and Albert Zeyer},
  year={2018}
}
In automatic speech recognition, as in machine learning in general, the structures of the underlying stochastic models are increasingly being converted to various forms of artificial neural networks. This renewal process, which began almost 30 years ago, has led to substantial improvements in recognition accuracy over the past 10 years. Both in the acoustic modeling of speech and in the a-priori modeling of language on…

Sequence Modeling and Alignment for LVCSR-Systems
TLDR
Two novel approaches to DNN-based ASR are discussed and analyzed: the attention-based encoder-decoder approach and the (segmental) inverted HMM approach, with specific focus on the sequence alignment behavior of the different approaches.
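The two approaches differ mainly in how they align output labels to input frames. Below is a minimal sketch of the attention-based encoder-decoder variant, assuming PyTorch; the class name, layer sizes, and the plain dot-product attention are illustrative choices, not the exact model analyzed in that work. The soft alignment returned per decoder step is the quantity whose behavior such analyses compare against the explicit, monotonic alignment of an (inverted) HMM.

```python
# Illustrative attention-based encoder-decoder for ASR (sketch only; sizes,
# names, and dot-product attention are assumptions, not the paper's model).
import torch
import torch.nn as nn

class AttentionEncoderDecoder(nn.Module):
    def __init__(self, feat_dim=40, hidden_dim=256, vocab_size=1000):
        super().__init__()
        # Encoder: reads the acoustic feature sequence.
        self.encoder = nn.LSTM(feat_dim, hidden_dim, num_layers=2, batch_first=True)
        # Decoder: emits one output label per step, attending over encoder states.
        self.embed = nn.Embedding(vocab_size, hidden_dim)
        self.decoder_cell = nn.LSTMCell(2 * hidden_dim, hidden_dim)
        self.output = nn.Linear(hidden_dim, vocab_size)

    def decode_step(self, enc_out, prev_label, state):
        h, c = state
        # Dot-product attention: score every encoder frame against the decoder
        # state, giving a soft alignment of this output step to the input frames.
        scores = torch.bmm(enc_out, h.unsqueeze(2)).squeeze(2)       # (B, T)
        align = torch.softmax(scores, dim=1)
        context = torch.bmm(align.unsqueeze(1), enc_out).squeeze(1)  # (B, hidden_dim)
        h, c = self.decoder_cell(
            torch.cat([context, self.embed(prev_label)], dim=1), (h, c))
        return self.output(h), (h, c), align

    def forward(self, features, prev_label, state):
        # In practice the encoder runs once and decode_step is called per label.
        enc_out, _ = self.encoder(features)   # (B, T, hidden_dim)
        return self.decode_step(enc_out, prev_label, state)
```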

References

Showing 1-10 of 94 references
Field Guide to Dynamical Recurrent Networks
TLDR
This book presents the range of dynamical recurrent network (DRN) architectures used throughout, and ties the contributed chapters together so that the text reads as a coherent book rather than a collection of papers.
From Feedforward to Recurrent LSTM Neural Networks for Language Modeling
TLDR
This paper compares count models to feedforward, recurrent, and long short-term memory (LSTM) neural network variants on two large-vocabulary speech recognition tasks, and analyzes the potential improvements that can be obtained when applying advanced algorithms to the rescoring of word lattices on large-scale setups.
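Such comparisons typically combine the neural model with the count-based model by linear interpolation of their word probabilities. The formulation below is the standard one, not something specific to this reference; the weight is tuned on held-out data.

```latex
% Linear interpolation of a neural LM and a count-based LM for word w given
% history h; \lambda is typically optimized on a held-out set.
P(w \mid h) \;=\; \lambda\, P_{\mathrm{NN}}(w \mid h) \;+\; (1-\lambda)\, P_{\mathrm{count}}(w \mid h),
\qquad 0 \le \lambda \le 1.
```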
LSTM, GRU, Highway and a Bit of Attention: An Empirical Overview for Language Modeling in Speech Recognition
TLDR
It is found that highway connections enable both standalone feedforward and recurrent neural language models to benefit more from deep structure, and that they provide a slight improvement in recognition accuracy after interpolation with count models.
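The highway connection referred to here follows the standard formulation: a learned transform gate blends a layer's nonlinear output with its unmodified input, which eases optimization of deep stacks. The equations below give that standard form; the exact parametrization used in the reference may differ.

```latex
% Standard highway layer: transform gate t blends the transformed output H(x)
% with the untransformed input x (the carry gate is 1 - t).
\begin{align*}
t    &= \sigma(W_T x + b_T) \\
H(x) &= \tanh(W_H x + b_H) \\
y    &= t \odot H(x) + (1 - t) \odot x
\end{align*}
```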
Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)
TLDR
The "exponential linear unit" (ELU) which speeds up learning in deep neural networks and leads to higher classification accuracies and significantly better generalization performance than ReLUs and LReLUs on networks with more than 5 layers.
Parallel Neural Network Features for Improved Tandem Acoustic Modeling
TLDR
This paper investigates the concatenation of different bottleneck (BN) neural network outputs for tandem acoustic modeling via Gaussian mixture models (GMM), and shows that a 2-5% relative improvement can be achieved over the single best BN feature set.
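As a minimal illustration of the tandem feature construction described here, the sketch below concatenates two frame-synchronous bottleneck feature streams before GMM training; the function name, dimensions, and NumPy usage are illustrative assumptions, not the reference's pipeline.

```python
# Frame-wise concatenation of two bottleneck (BN) feature streams for tandem
# GMM/HMM acoustic modeling (names and dimensions are illustrative only).
import numpy as np

def concat_bn_features(bn_a: np.ndarray, bn_b: np.ndarray) -> np.ndarray:
    """Concatenate two per-frame BN feature matrices of shape (num_frames, dim)."""
    assert bn_a.shape[0] == bn_b.shape[0], "streams must be frame-synchronous"
    return np.hstack([bn_a, bn_b])

# Example: a 60-dim and a 40-dim BN stream over 500 frames -> 100-dim tandem features.
features = concat_bn_features(np.random.randn(500, 60), np.random.randn(500, 40))
print(features.shape)  # (500, 100)
```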
LSTM Neural Networks for Language Modeling
TLDR
This work analyzes the Long Short-Term Memory neural network architecture on an English and a large French language modeling task and gains considerable improvements in WER on top of a state-of-the-art speech recognition system.
Lattice decoding and rescoring with long-Span neural network language models
TLDR
This paper combines previous work on lattice decoding with long short-term memory (LSTM) neural network language models and adds refined pruning techniques to reduce the search effort by a factor of three, and introduces two novel approximations for full lattice rescoring.
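The reference rescoring full lattices with refined pruning; as a much simpler illustration of the underlying score combination (base decoder score plus a scaled neural LM log-probability), here is an n-best rescoring sketch. The function names, the LM interface, and the lm_scale value are assumptions for illustration only.

```python
# Simplified n-best rescoring sketch (the referenced work rescoring full
# lattices; this only illustrates the score combination). The LM interface
# and the lm_scale default are assumptions, not taken from the paper.
from typing import Callable, List, Tuple

def rescore_nbest(
    nbest: List[Tuple[List[str], float]],      # (word sequence, base log score from the decoder)
    lm_logprob: Callable[[List[str]], float],  # neural LM log-probability of a word sequence
    lm_scale: float = 10.0,
) -> List[Tuple[List[str], float]]:
    """Re-rank hypotheses by combined score: base score + lm_scale * LM log-prob."""
    rescored = [(words, base + lm_scale * lm_logprob(words)) for words, base in nbest]
    return sorted(rescored, key=lambda x: x[1], reverse=True)
```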
Speech recognition with deep recurrent neural networks
TLDR
This paper investigates deep recurrent neural networks, which combine the multiple levels of representation that have proved so effective in deep networks with the flexible use of long range context that empowers RNNs.
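A minimal sketch of such a stacked recurrent network, assuming PyTorch; the number of layers, the layer width, and the use of bidirectionality are illustrative, not the paper's configuration.

```python
# Illustrative deep (stacked) bidirectional LSTM: each of the three layers
# re-represents the hidden sequence of the layer below. Sizes are assumptions.
import torch.nn as nn

deep_birnn = nn.LSTM(input_size=40, hidden_size=512, num_layers=3,
                     bidirectional=True, batch_first=True)
```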
Long Short-Term Memory
TLDR
A novel, efficient, gradient-based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete time steps by enforcing constant error flow through constant error carousels within special units.
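In the now-standard formulation (the forget gate was added in later work), the LSTM computes the gate and state updates below; the additive cell-state recursion is the constant error carousel through which the error signal can flow without vanishing.

```latex
% Standard LSTM cell equations; \odot is element-wise multiplication.
\begin{align*}
i_t &= \sigma(W_i x_t + R_i h_{t-1} + b_i) && \text{(input gate)} \\
f_t &= \sigma(W_f x_t + R_f h_{t-1} + b_f) && \text{(forget gate)} \\
o_t &= \sigma(W_o x_t + R_o h_{t-1} + b_o) && \text{(output gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + R_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(constant error carousel)} \\
h_t &= o_t \odot \tanh(c_t)
\end{align*}
```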
...