Improving Factored Hybrid HMM Acoustic Modeling without State Tying

Tina Raissi, Eugen Beck, Ralf Schlüter, Hermann Ney
In this work, we show that a factored hybrid hidden Markov model (FH-HMM), defined without any phonetic state tying, outperforms a state-of-the-art hybrid HMM. The factored hybrid HMM provides a link to transducer models in the way it models phonetic (label) context, while preserving the strict separation of acoustic and language models of the hybrid HMM approach. Furthermore, we show that the factored hybrid model can be trained from scratch without using phonetic state-tying in any of…
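As a rough sketch of the idea (the exact decomposition is given in the paper and in the authors' earlier context-dependent modeling work listed below; the symbols here are illustrative), the context-dependent label posterior is factored over left context, center phoneme, and right context instead of being defined over a clustered (tied) triphone state inventory:

```latex
% Illustrative factorization of the triphone label posterior at frame x_t,
% with a_l, a_c, a_r the left-context, center, and right-context labels.
% Each factor is a separate network output over the phoneme inventory,
% so no state-tying decision tree is required.
p(a_l, a_c, a_r \mid x_t)
  \approx p(a_c \mid x_t)\;
          p(a_l \mid a_c, x_t)\;
          p(a_r \mid a_l, a_c, x_t)
```

Because each factor ranges only over the phoneme inventory rather than the full cross-product of triphones, the full context-dependent label set can be modeled explicitly without any clustering step.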



Context-Dependent Acoustic Modeling without Explicit Phone Clustering
This work addresses direct phonetic context modeling for the hybrid deep neural network (DNN)/HMM that does not rely on any phone clustering algorithm to determine the HMM state inventory, obtaining a factorized network whose components are trained jointly.
Flat-Start Single-Stage Discriminatively Trained HMM-Based Models for ASR
This study investigates flat-start, single-stage training of neural networks with HMMs using the lattice-free maximum mutual information (LF-MMI) objective function for large-vocabulary continuous speech recognition, and proposes a standalone system that achieves word error rates comparable to state-of-the-art multi-stage systems while being much faster to prepare.
Towards Consistent Hybrid HMM Acoustic Modeling
This work proposes a flat-start factored hybrid model trained by modeling the full set of triphone states explicitly without relying on clustering methods, which greatly simplifies the training of new models.
Towards Using Context-Dependent Symbols in CTC Without State-Tying Decision Trees
A CD symbol embedding network is trained together with the rest of the acoustic model and removes one of the last cases in which neural systems have to be bootstrapped from GMM-HMM ones.
Phoneme Based Neural Transducer for Large Vocabulary Speech Recognition
A simple, novel, and competitive approach to phoneme-based neural transducer modeling is presented, using a simplified neural network structure and a straightforward integration with an external word-level language model to preserve the consistency of sequence-to-sequence modeling.
The RWTH ASR System for TED-LIUM Release 2: Improving Hybrid HMM with SpecAugment
A complete training pipeline to build a state-of-the-art hybrid HMM-based ASR system on the 2nd release of the TED-LIUM corpus is presented; the best system achieves a 5.6% WER on the test set, outperforming the previous state of the art by 27% relative.
RWTH ASR Systems for LibriSpeech: Hybrid vs Attention - w/o Data Augmentation
We present state-of-the-art automatic speech recognition (ASR) systems employing a standard hybrid DNN/HMM architecture compared to an attention-based encoder-decoder design for the LibriSpeech task.
Speaker adaptive joint training of Gaussian mixture models and bottleneck features
Experiments show that deeper backpropagation through the speaker-dependent layer is necessary for improved recognition performance; the speaker-adaptively and jointly trained BN-GMM yields a 5% relative improvement over a very strong speaker-independent hybrid baseline on the Quaero English broadcast news and conversations task and on the 300-hour Switchboard task.
End-to-end Speech Recognition Using Lattice-free MMI
The work on end-to-end training of acoustic models using the lattice-free maximum mutual information (LF-MMI) objective function in the context of hidden Markov models shows that this approach can achieve results comparable to regular LF-MMI on well-known large vocabulary tasks.
Connectionist Speech Recognition: A Hybrid Approach
From the Publisher: Connectionist Speech Recognition: A Hybrid Approach describes the theory and implementation of a method to incorporate neural network approaches into state-of-the-art continuous