Corpus ID: 28558623

Inverted HMM - a Proof of Concept

Patrick Doetsch, Hermann Ney, Stefan Hegselmann, Ralf Schlüter
NIPS 2016
In this work, we propose an inverted hidden Markov model (HMM) approach to automatic speech and handwriting recognition that naturally incorporates discriminative, artificial-neural-network-based label distributions. [...] The approach does not assume the usual decomposition into a separate (generative) acoustic model and a language model, and allows for a variety of model assumptions, including statistical variants of attention.
Inverted Alignments for End-to-End Automatic Speech Recognition
An inverted alignment approach for sequence classification systems like automatic speech recognition (ASR) that naturally incorporates discriminative, artificial-neural-network-based label distributions and allows for a variety of model assumptions, including statistical variants of attention.
Sequence Modeling and Alignment for LVCSR-Systems
Two novel approaches to DNN-based ASR are discussed and analyzed: the attention-based encoder-decoder approach and the (segmental) inverted HMM approach, with a specific focus on the sequence alignment behavior of the different approaches.
Segmental Encoder-Decoder Models for Large Vocabulary Automatic Speech Recognition
Different length modeling approaches for segmental models, their relation to attention-based systems, and the first reported results on the Switchboard 300h speech recognition corpus using this approach are explored.
End-to-end neural segmental models for speech recognition
Segmental models are an alternative to frame-based models for sequence prediction, where hypothesized path weights are based on entire segment scores rather than a single frame at a time. Neural [...]
End-to-End Neural Segmental Models for Speech Recognition
This work reviews neural segmental models, which can be viewed as consisting of a neural network-based acoustic encoder and a finite-state transducer decoder, and explores training approaches, including multistage versus end-to-end training and multitask training that combines segmental and frame-level losses.
Netze in der automatischen Spracherkennung - ein Paradigmenwechsel? / Neural Networks in Automatic Speech Recognition - a Paradigm Change?
In automatic speech recognition, as in machine learning in general, the structures of the associated stochastic modeling are today increasingly [...] different forms [...]


End-to-end attention-based large vocabulary speech recognition
This work investigates an alternative method for sequence modelling based on an attention mechanism that allows a Recurrent Neural Network (RNN) to learn alignments between sequences of input frames and output labels.
GMM-Free Flat Start Sequence-Discriminative DNN Training
The sequence-discriminative flat start training method is not only significantly faster than the straightforward approach of iterative retraining and realignment, but the word error rates attained are slightly better as well.
Tandem connectionist feature extraction for conventional HMM systems
A large improvement in word recognition performance is shown by combining neural-net discriminative feature processing with Gaussian-mixture distribution modeling.
Sequence to Sequence Learning with Neural Networks
This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure, and finds that reversing the order of the words in all source sentences improved the LSTM's performance markedly, because doing so introduced many short-term dependencies between the source and the target sentence, which made the optimization problem easier.
Connectionist Speech Recognition: A Hybrid Approach
From the Publisher: Connectionist Speech Recognition: A Hybrid Approach describes the theory and implementation of a method to incorporate neural network approaches into state-of-the-art continuous [...]
Listen, attend and spell: A neural network for large vocabulary conversational speech recognition
We present Listen, Attend and Spell (LAS), a neural speech recognizer that transcribes speech utterances directly to characters without pronunciation models, HMMs or other components of traditional [...]
End-to-end Continuous Speech Recognition using Attention-based Recurrent NN: First Results
Initial results demonstrate that this new approach achieves phoneme error rates that are comparable to the state-of-the-art HMM-based decoders on the TIMIT dataset.
Semi-supervised maximum mutual information training of deep neural network acoustic models
It is shown that if the supervision transcripts are not known, the natural analogue of MMI is to minimize the conditional entropy of the lattice of possible transcripts of the data, equivalent to the weighted average of the MMI criterion over different reference transcripts, taking those reference transcripts and their weighting from the lattice itself.
Purely Sequence-Trained Neural Networks for ASR Based on Lattice-Free MMI
A method to perform sequence-discriminative training of neural network acoustic models without the need for frame-level cross-entropy pre-training is described, using the lattice-free version of the maximum mutual information (MMI) criterion: LF-MMI.
Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks
This paper presents a novel method for training RNNs to label unsegmented sequences directly, thereby solving both problems of sequence learning and post-processing.