Multi-Modal Data Augmentation for End-to-end ASR

Adithya Renduchintala, Shuoyang Ding, Matthew Wiesner, Shinji Watanabe
We present a new end-to-end architecture for automatic speech recognition (ASR) that can be trained on symbolic input in addition to the traditional acoustic input. The architecture uses two separate encoders, one for acoustic input and one for symbolic input, which share the attention and decoder parameters. We call this architecture a multi-modal data augmentation network (MMDA), as it supports multi-modal (acoustic and symbolic) input and enables seamless mixing of large text…
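The two-encoder layout described above can be illustrated with a toy sketch. This is not the authors' implementation: the class name `MMDASketch`, the layer sizes, the single linear acoustic layer, the embedding-table symbolic encoder, the one-step dot-product attention, and the random weights are all assumptions chosen to keep the example small. It only shows the parameter-sharing idea: each modality has its own encoder, while the attention and decoder weights are the same objects for both.

```python
import math
import random


def matvec(W, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class MMDASketch:
    """Toy model: two modality-specific encoders, one shared attention/decoder."""

    def __init__(self, feat_dim, n_symbols, hidden, vocab, seed=0):
        rng = random.Random(seed)
        # Modality-specific parameters (hypothetical shapes).
        self.acoustic_enc = rand_matrix(hidden, feat_dim, rng)  # frames -> hidden
        self.symbol_emb = rand_matrix(n_symbols, hidden, rng)   # symbol id -> hidden
        # Shared parameters, used regardless of input modality.
        self.query = [rng.uniform(-0.1, 0.1) for _ in range(hidden)]
        self.decoder = rand_matrix(vocab, hidden, rng)

    def encode_acoustic(self, frames):
        # One tanh-activated linear layer per acoustic frame.
        return [[math.tanh(v) for v in matvec(self.acoustic_enc, f)] for f in frames]

    def encode_symbolic(self, symbol_ids):
        # Embedding lookup per input symbol.
        return [self.symbol_emb[i] for i in symbol_ids]

    def attend_and_decode(self, states):
        # Shared dot-product attention over encoder states, then a shared
        # output projection; this computes a single decoding step only.
        weights = softmax([sum(q * h for q, h in zip(self.query, s)) for s in states])
        context = [
            sum(w * s[d] for w, s in zip(weights, states))
            for d in range(len(self.query))
        ]
        return softmax(matvec(self.decoder, context))

    def forward(self, batch, modality):
        if modality == "acoustic":
            states = self.encode_acoustic(batch)
        elif modality == "symbolic":
            states = self.encode_symbolic(batch)
        else:
            raise ValueError(f"unknown modality: {modality}")
        return self.attend_and_decode(states)


model = MMDASketch(feat_dim=4, n_symbols=10, hidden=8, vocab=5)
speech = [[0.1, 0.2, 0.3, 0.4], [0.0, -0.1, 0.2, 0.1]]  # two made-up feature frames
text = [3, 1, 4]                                         # three made-up symbol ids
p_speech = model.forward(speech, "acoustic")
p_text = model.forward(text, "symbolic")
```

Both calls return a distribution over the same output vocabulary, because only the encoder differs between them. In a trainable version, gradients from either kind of batch would update the same attention and decoder parameters.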

