Articulatory Information and Multiview Features for Large Vocabulary Continuous Speech Recognition

@inproceedings{Mitra2018ArticulatoryIA,
  title={Articulatory Information and Multiview Features for Large Vocabulary Continuous Speech Recognition},
  author={Vikramjit Mitra and Weiqi Wang and Chris Bartels and Horacio Franco and Dimitra Vergyri},
  booktitle={2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2018},
  pages={5634--5638}
}
This paper explores the use of multi-view features and their discriminative transforms in a convolutional deep neural network (CNN) architecture for a large vocabulary continuous speech recognition task. Mel-filterbank energies (MFB) and perceptually motivated forced damped oscillator coefficient (DOC) features are each passed through a feature-space maximum-likelihood linear regression (fMLLR) transform, then combined and fed as a single multi-view feature to one CNN acoustic model. Use of multi-view…
Figures, Tables, Results, and Topics from this paper.

Key Quantitative Results

  • Adding articulatory features, in the form of TVs, to the MFB+fMLLR and DOC+fMLLR features in an f-CNN-DNN model gave the best performance from a single acoustic model: a relative WER reduction of 3% on the SWB evaluation set and 2% on the CH evaluation set, compared to the CNN acoustic model trained with MFB+fMLLR and DOC+fMLLR features.
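
The result above is stated as a relative WER reduction, which is easy to confuse with an absolute one. The sketch below shows how a relative reduction is usually computed from two word error rates. The function name `relative_wer_reduction` and the WER values are hypothetical, chosen only for illustration; they are not taken from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction, in percent, of new_wer over baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical WERs (in percent), NOT the paper's numbers.
baseline = 10.0   # assumed baseline system WER
improved = 9.7    # assumed WER after adding extra features

reduction = relative_wer_reduction(baseline, improved)
print(f"{reduction:.1f}% relative WER reduction")
```

With these assumed values, a 0.3 point absolute drop corresponds to about a 3% relative reduction, which is the kind of figure quoted in the bullet.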
