Corpus ID: 208857751

Semantic Mask for Transformer based End-to-End Speech Recognition

@article{Wang2019SemanticMF,
  title={Semantic Mask for Transformer based End-to-End Speech Recognition},
  author={Chengyi Wang and Yunzhao Wu and Yujiao Du and Jinyu Li and Shujie Liu and Liang Lu and Shuo Ren and Guoli Ye and Sheng Zhao and Ming Zhou},
  journal={ArXiv},
  year={2019},
  volume={abs/1912.03010}
}
  • Chengyi Wang, Yunzhao Wu, +7 authors Ming Zhou
  • Published 2019
  • Computer Science, Engineering
  • ArXiv
  • Attention-based encoder-decoder models have achieved impressive results for both automatic speech recognition (ASR) and text-to-speech (TTS) tasks. This approach takes advantage of the memorization capacity of neural networks to learn the mapping from the input sequence to the output sequence from scratch, without assuming prior knowledge such as alignments. However, such models are prone to overfitting, especially when the amount of training data is limited. Inspired by SpecAugment…

    Citations

    Publications citing this paper.

    High-Accuracy and Low-Latency Speech Recognition with Two-Head Contextual Layer Trajectory LSTM Model

    CITES BACKGROUND

    Online Hybrid CTC/Attention End-to-End Automatic Speech Recognition Architecture

    ASR free End-to-End SLU using the Transformer

    • 2020
    CITES METHODS

    On the Comparison of Popular End-to-End Models for Large Scale Speech Recognition

    CITES METHODS

    References

    Publications referenced by this paper.

    Speech-Transformer: A No-Recurrence Sequence-to-Sequence Model for Speech Recognition

    • Linhao Dong, Shuang Xu, Bo Xu
    • Computer Science
    • 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
    • 2018

    A Comparison of Techniques for Language Model Integration in Encoder-Decoder Speech Recognition

    A Comparative Study on Transformer vs RNN in Speech Applications

    SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

    HIGHLY INFLUENTIAL

    Transformer-based Acoustic Modeling for Hybrid Speech Recognition

    Attention is All you Need

    State-of-the-Art Speech Recognition Using Multi-Stream Self-Attention with Dilated 1D Convolutions