Corpus ID: 28558623

Inverted HMM - a Proof of Concept

@inproceedings{Doetsch2016InvertedH,
  title={Inverted HMM - a Proof of Concept},
  author={Patrick Doetsch and Hermann Ney and Stefan Hegselmann and Ralf Schl{\"u}ter},
  booktitle={NIPS 2016},
  year={2016}
}
In this work, we propose an inverted hidden Markov model (HMM) approach to automatic speech and handwriting recognition that naturally incorporates discriminative, artificial neural network based label distributions. The approach does not assume the usual decomposition into a separate (generative) acoustic model and a language model, and it allows for a variety of model assumptions, including statistical variants of attention.
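As a rough sketch of what "inverted" means here (notation assumed for illustration, not taken verbatim from the paper): instead of aligning every input frame to a label state as in a standard HMM, each label position s = 1, ..., S is aligned to an input frame t_s, the model sums over monotone alignments, and both factors can be realized by neural networks:

$$ p(w_1^S \mid x_1^T) \;=\; \sum_{t_1 \le t_2 \le \dots \le t_S} \;\prod_{s=1}^{S} \underbrace{p(w_s \mid t_s, x_1^T)}_{\text{discriminative label posterior}} \;\cdot\; \underbrace{p(t_s \mid t_{s-1}, x_1^T)}_{\text{transition / statistical attention}} $$

The exact dependency structure and normalization used in the paper may differ from this sketch.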

Inverted Alignments for End-to-End Automatic Speech Recognition

TLDR
An inverted alignment approach for sequence classification systems like automatic speech recognition (ASR) that naturally incorporates discriminative, artificial-neural-network-based label distributions and allows for a variety of model assumptions, including statistical variants of attention.

Sequence Modeling and Alignment for LVCSR-Systems

TLDR
Two novel approaches to DNN-based ASR are discussed and analyzed, the attention-based encoder–decoder approach, and the (segmental) inverted HMM approach, with specific focus on the sequence alignment behavior of the different approaches.

Segmental Encoder-Decoder Models for Large Vocabulary Automatic Speech Recognition

TLDR
Different length modeling approaches for segmental models, their relation to attention-based systems and the first reported results on the Switchboard 300h speech recognition corpus using this approach are explored.

End-to-End Neural Segmental Models for Speech Recognition

TLDR
This work reviews neural segmental models, which can be viewed as consisting of a neural network-based acoustic encoder and a finite-state transducer decoder and explores training approaches, including multistage versus end-to-end training and multitask training that combines segmental and frame-level losses.

Neuronale Netze in der automatischen Spracherkennung – ein Paradigmenwechsel? (Neural Networks in Automatic Speech Recognition – a Paradigm Change?)

In automatic speech recognition, as in machine learning in general, the structures of the associated stochastic modelling are today increasingly being shifted towards different forms ...

References

End-to-end attention-based large vocabulary speech recognition

TLDR
This work investigates an alternative method for sequence modelling based on an attention mechanism that allows a Recurrent Neural Network (RNN) to learn alignments between sequences of input frames and output labels.
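For orientation, a minimal sketch of content-based attention in such encoder-decoder models (generic notation; the exact scoring function in the paper may include additional terms such as location awareness): with encoder states $h_1, \dots, h_T$ and previous decoder state $s_{n-1}$,

$$ e_{n,t} = v^\top \tanh(W s_{n-1} + V h_t), \qquad \alpha_{n,t} = \frac{\exp(e_{n,t})}{\sum_{t'} \exp(e_{n,t'})}, \qquad c_n = \sum_{t=1}^{T} \alpha_{n,t}\, h_t . $$

The weights $\alpha_{n,t}$ form a soft alignment between output position $n$ and input frame $t$, which is the sense in which the network "learns alignments".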

GMM-Free Flat Start Sequence-Discriminative DNN Training

TLDR
The sequence-discriminative flat start training method is not only significantly faster than the straightforward approach of iterative retraining and realignment, but the word error rates attained are slightly better as well.

Tandem connectionist feature extraction for conventional HMM systems

TLDR
A large improvement in word recognition performance is shown by combining neural-net discriminative feature processing with Gaussian-mixture distribution modeling.

Sequence to Sequence Learning with Neural Networks

TLDR
This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure, and finds that reversing the order of the words in all source sentences improved the LSTM's performance markedly, because doing so introduced many short-term dependencies between the source and the target sentence which made the optimization problem easier.

Connectionist Speech Recognition: A Hybrid Approach

From the Publisher: Connectionist Speech Recognition: A Hybrid Approach describes the theory and implementation of a method to incorporate neural network approaches into state-of-the-art continuous speech recognition systems.

Listen, attend and spell: A neural network for large vocabulary conversational speech recognition

We present Listen, Attend and Spell (LAS), a neural speech recognizer that transcribes speech utterances directly to characters without pronunciation models, HMMs, or other components of traditional speech recognizers.

End-to-end Continuous Speech Recognition using Attention-based Recurrent NN: First Results

TLDR
Initial results demonstrate that this new approach achieves phoneme error rates comparable to state-of-the-art HMM-based decoders on the TIMIT dataset.

Semi-supervised maximum mutual information training of deep neural network acoustic models

TLDR
It is shown that if the supervision transcripts are not known, the natural analogue of MMI is to minimize the conditional entropy of the lattice of possible transcripts of the data, equivalent to the weighted average of the MMI criterion over different reference transcripts, taking those reference transcripts and their weighting from the lattice itself.
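Written out under assumed notation, the stated equivalence is: with lattice posteriors $P(W \mid X)$ over candidate transcripts $W$, the negative conditional entropy is a posterior-weighted average of MMI objectives, each taking one lattice hypothesis as the reference:

$$ -H(W \mid X) \;=\; \sum_{W \in \text{lattice}} P(W \mid X)\, \log P(W \mid X), $$

where $\log P(W \mid X)$ is the MMI criterion with $W$ as the reference transcript.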

Purely Sequence-Trained Neural Networks for ASR Based on Lattice-Free MMI

TLDR
A method to perform sequence-discriminative training of neural network acoustic models without the need for frame-level cross-entropy pre-training is described, using the lattice-free version of the maximum mutual information (MMI) criterion: LF-MMI.

Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks

TLDR
This paper presents a novel method for training RNNs to label unsegmented sequences directly, thereby solving both problems of sequence learning and post-processing.
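As an illustrative, hedged sketch (not the setup of the original paper), CTC training of a recurrent network can be written with PyTorch's built-in CTC loss; all shapes, sizes and names below are invented for the example:

```python
# Minimal CTC training step (illustrative only; sizes are arbitrary).
import torch
import torch.nn as nn

num_classes = 30  # label inventory plus the CTC blank (index 0)
encoder = nn.LSTM(input_size=40, hidden_size=128, batch_first=True)
classifier = nn.Linear(128, num_classes)
ctc_loss = nn.CTCLoss(blank=0)

# Dummy batch: 4 utterances, 100 frames of 40-dim features each,
# labelled with an *unaligned* sequence of 12 symbols per utterance.
features = torch.randn(4, 100, 40)
targets = torch.randint(1, num_classes, (4, 12))           # no blanks in targets
input_lengths = torch.full((4,), 100, dtype=torch.long)
target_lengths = torch.full((4,), 12, dtype=torch.long)

hidden, _ = encoder(features)                               # (batch, time, hidden)
log_probs = classifier(hidden).log_softmax(dim=-1)          # (batch, time, classes)
# nn.CTCLoss expects (time, batch, classes); no frame-level alignment is given,
# the loss marginalizes over all monotone alignments of labels to frames.
loss = ctc_loss(log_probs.transpose(0, 1), targets, input_lengths, target_lengths)
loss.backward()
```

The key property matching the summary above is that the targets carry no segmentation: the forward-backward recursion inside the loss sums over every admissible alignment of the 12 labels to the 100 frames.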