Inverted HMM - a Proof of Concept
@inproceedings{Doetsch2016InvertedH,
  title     = {Inverted HMM - a Proof of Concept},
  author    = {Patrick Doetsch and Hermann Ney and Stefan Hegselmann and Ralf Schl{\"u}ter},
  booktitle = {NIPS 2016},
  year      = {2016}
}
In this work, we propose an inverted hidden Markov model (HMM) approach to automatic speech and handwriting recognition that naturally incorporates discriminative, artificial neural network based label distributions. The approach does not assume the usual decomposition into a separate (generative) acoustic model and a language model, and allows for a variety of model assumptions, including statistical variants of attention.
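The contrast between the conventional decomposition and the "inverted" direction can be sketched as follows. This is a hedged reading of the abstract: the notation is ours, and the exact factorization is the one defined in the paper.

```latex
% Conventional HMM-based decoding: a generative acoustic model combined with a
% language model, where a hidden state sequence s_1^T aligns each time frame t
% to a state.
\hat{w}_1^N = \operatorname*{argmax}_{w_1^N} \;
    p(w_1^N) \, p(x_1^T \mid w_1^N),
\qquad
p(x_1^T \mid w_1^N) = \sum_{s_1^T} \prod_{t=1}^{T}
    p(x_t, s_t \mid s_{t-1}, w_1^N)

% Inverted direction: model the label posterior directly, aligning each label
% position n to a time frame t_n, with discriminative neural-network label
% distributions in each factor.
p(w_1^N \mid x_1^T) = \sum_{t_1^N} \prod_{n=1}^{N}
    p(w_n, t_n \mid t_{n-1}, w_1^{n-1}, x_1^T)
```

In the first form the alignment runs from time frames to states; in the inverted form it runs from label positions to time frames, which is what lets the model absorb the language-model role and attention-like alignment variants into a single posterior.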
6 Citations
Inverted Alignments for End-to-End Automatic Speech Recognition
- Computer Science, IEEE Journal of Selected Topics in Signal Processing
- 2017
An inverted alignment approach for sequence classification systems like automatic speech recognition (ASR) that naturally incorporates discriminative, artificial-neural-network-based label distributions and allows for a variety of model assumptions, including statistical variants of attention.
Sequence Modeling and Alignment for LVCSR-Systems
- Computer Science, ITG Symposium on Speech Communication
- 2018
Two novel approaches to DNN-based ASR are discussed and analyzed, the attention-based encoder–decoder approach, and the (segmental) inverted HMM approach, with specific focus on the sequence alignment behavior of the different approaches.
Segmental Encoder-Decoder Models for Large Vocabulary Automatic Speech Recognition
- Computer Science, INTERSPEECH
- 2018
Different length modeling approaches for segmental models and their relation to attention-based systems are explored, with experimental results shown on a handwriting recognition task and on the Switchboard 300h speech recognition corpus.
End-to-end neural segmental models for speech recognition
- Computer Science
- 2017
This work reviews neural segmental models, which can be viewed as consisting of a neural network-based acoustic encoder and a finite-state transducer decoder and explores training approaches, including multi-stage vs. end-to-end training and multitask training that combines segmental and frame-level losses.
End-to-End Neural Segmental Models for Speech Recognition
- Computer Science, IEEE Journal of Selected Topics in Signal Processing
- 2017
This work reviews neural segmental models, which can be viewed as consisting of a neural network-based acoustic encoder and a finite-state transducer decoder and explores training approaches, including multistage versus end-to-end training and multitask training that combines segmental and frame-level losses.
Neuronale Netze in der automatischen Spracherkennung - ein Paradigmenwechsel? (Neural Networks in Automatic Speech Recognition - a Paradigm Change?)
- 2018
In automatic speech recognition, as in machine learning in general, the structures of the associated stochastic modeling are today more and more being…
References
Showing 1-10 of 21 references
End-to-end attention-based large vocabulary speech recognition
- Computer Science, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2016
This work investigates an alternative method for sequence modelling based on an attention mechanism that allows a Recurrent Neural Network (RNN) to learn alignments between sequences of input frames and output labels.
GMM-Free Flat Start Sequence-Discriminative DNN Training
- Computer Science, INTERSPEECH
- 2016
The sequence-discriminative flat start training method is not only significantly faster than the straightforward approach of iterative retraining and realignment, but the word error rates attained are slightly better as well.
Tandem connectionist feature extraction for conventional HMM systems
- Computer Science, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100)
- 2000
A large improvement in word recognition performance is shown by combining neural-net discriminative feature processing with Gaussian-mixture distribution modeling.
Sequence to Sequence Learning with Neural Networks
- Computer Science, NIPS
- 2014
This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions about the sequence structure. It finds that reversing the order of the words in all source sentences markedly improved the LSTM's performance, because doing so introduced many short-term dependencies between the source and the target sentence, which made the optimization problem easier.
Connectionist Speech Recognition: A Hybrid Approach
- Computer Science
- 1993
From the Publisher: Connectionist Speech Recognition: A Hybrid Approach describes the theory and implementation of a method to incorporate neural network approaches into state-of-the-art continuous…
Listen, attend and spell: A neural network for large vocabulary conversational speech recognition
- Computer Science, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2016
We present Listen, Attend and Spell (LAS), a neural speech recognizer that transcribes speech utterances directly to characters without pronunciation models, HMMs or other components of traditional…
End-to-end Continuous Speech Recognition using Attention-based Recurrent NN: First Results
- Computer Science, arXiv
- 2014
Initial results demonstrate that this new approach achieves phoneme error rates that are comparable to the state-of-the-art HMM-based decoders, on the TIMIT dataset.
Semi-supervised maximum mutual information training of deep neural network acoustic models
- Computer Science, INTERSPEECH
- 2015
It is shown that if the supervision transcripts are not known, the natural analogue of MMI is to minimize the conditional entropy of the lattice of possible transcripts of the data. This is equivalent to the weighted average of the MMI criterion over different reference transcripts, taking those reference transcripts and their weighting from the lattice itself.
Purely Sequence-Trained Neural Networks for ASR Based on Lattice-Free MMI
- Computer Science, INTERSPEECH
- 2016
A method to perform sequence-discriminative training of neural network acoustic models without the need for frame-level cross-entropy pre-training is described, using the lattice-free version of the maximum mutual information (MMI) criterion: LF-MMI.
Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks
- Computer Science, ICML
- 2006
This paper presents a novel method for training RNNs to label unsegmented sequences directly, thereby solving both problems of sequence learning and post-processing.