No Need for a Lexicon? Evaluating the Value of the Pronunciation Lexica in End-to-End Models

@inproceedings{Sainath2018NoNF,
  title={No Need for a Lexicon? Evaluating the Value of the Pronunciation Lexica in End-to-End Models},
  author={T. Sainath and Rohit Prabhavalkar and Shankar Kumar and S. Lee and A. Kannan and David Rybach and Vlad Schogol and P. Nguyen and Bo Li and Y. Wu and Z. Chen and Chung-Cheng Chiu},
  booktitle={2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2018},
  pages={5859--5863}
}
  • T. Sainath, Rohit Prabhavalkar, +9 authors Chung-Cheng Chiu
  • Published 2018
  • Computer Science, Engineering, Mathematics
  • 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • For decades, context-dependent phonemes have been the dominant sub-word unit for conventional acoustic modeling systems. This status quo has begun to be challenged recently by end-to-end models which seek to combine acoustic, pronunciation, and language model components into a single neural network. Such systems, which typically predict graphemes or words, simplify the recognition process since they remove the need for a separate expert-curated pronunciation lexicon to map from phoneme-based…
    30 Citations
    Phoebe: Pronunciation-aware Contextualization for End-to-end Speech Recognition
    • 12
    On the Choice of Modeling Unit for Sequence-to-Sequence Speech Recognition
    • 36
    Phoneme-Based Contextualization for Cross-Lingual Speech Recognition in End-to-End Models
    • 8
    Investigating the Downstream Impact of Grapheme-Based Acoustic Modeling on Spoken Utterance Classification
    • 1
    An Exploration of Directly Using Word as Acoustic Modeling Unit for Speech Recognition
    • 4
    A Comparison of Modeling Units in Sequence-to-Sequence Speech Recognition with the Transformer on Mandarin Chinese
    • 39
    From Senones to Chenones: Tied Context-Dependent Graphemes for Hybrid Speech Recognition
    • 28
    G2G: TTS-Driven Pronunciation Learning for Graphemic Hybrid ASR
    • 1