Low Resource Multi-modal Data Augmentation for End-to-end ASR

@inproceedings{Renduchintala2018LowRM,
  title={Low Resource Multi-modal Data Augmentation for End-to-end ASR},
  author={Adithya Renduchintala and Shuoyang Ding and Matthew Wiesner and Shinji Watanabe},
  booktitle={INTERSPEECH},
  year={2018}
}
We present a new end-to-end architecture for automatic speech recognition (ASR) that can be trained using \emph{symbolic} input in addition to the traditional acoustic input. This architecture utilizes two separate encoders: one for acoustic input and another for symbolic input, both sharing the attention and decoder parameters. We call this architecture a multi-modal data augmentation network (MMDA), as it can support multi-modal (acoustic and symbolic) input and enables seamless mixing of…
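The two-encoder, shared-decoder design described in the abstract can be caricatured in a short sketch. This is a toy illustration in plain Python, not the authors' implementation; every class and function name here is invented, and the "encoders" and "attention" are deterministic placeholders standing in for real neural components:

```python
class Encoder:
    """Toy encoder: maps an input sequence to one pseudo hidden vector per step."""

    def __init__(self, dim):
        self.dim = dim

    def encode(self, seq):
        # Deterministic stand-in for a real acoustic or symbolic encoder.
        return [[(ord(str(tok)[0]) * (d + 1)) % 7 / 7.0 for d in range(self.dim)]
                for tok in seq]


class SharedAttentionDecoder:
    """A single set of attention/decoder parameters reused for both modalities."""

    def __init__(self, dim, vocab):
        self.vocab = vocab
        self.weights = [0.1 * (d + 1) for d in range(dim)]  # shared parameters

    def decode(self, hiddens):
        # Uniform "attention" over encoder states, then a trivial projection.
        ctx = [sum(h[d] for h in hiddens) / len(hiddens)
               for d in range(len(self.weights))]
        score = sum(w * c for w, c in zip(self.weights, ctx))
        return self.vocab[int(score * 100) % len(self.vocab)]


dim, vocab = 4, ["a", "b", "c"]
acoustic_enc = Encoder(dim)    # would consume acoustic features in a real model
symbolic_enc = Encoder(dim)    # would consume characters/phonemes
decoder = SharedAttentionDecoder(dim, vocab)  # one shared decoder instance

# Both modalities flow through the *same* decoder object, i.e. the same
# attention and decoder parameters -- the core of the MMDA idea.
out_acoustic = decoder.decode(acoustic_enc.encode([0.1, 0.2, 0.3]))
out_symbolic = decoder.decode(symbolic_enc.encode(["h", "i"]))
print(out_acoustic, out_symbolic)
```

The point of the sketch is only the wiring: separate per-modality encoders feed a single decoder whose parameters are updated by both acoustic and symbolic training examples.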


Key Quantitative Results

  • Our best MMDA setup obtains small improvements in character error rate (CER), and as much as a 7–10% relative word error rate (WER) improvement over the baseline, both with and without an external language model.

Citations

Publications citing this paper.
Showing 1–10 of 11 citations.

Pretraining by Backtranslation for End-to-end ASR in Low-Resource Settings.


Semi-supervised Sequence-to-sequence ASR using Unpaired Speech and Text

Murali Karthick Baskar, Shinji Watanabe, +3 authors Jan Černocký
  • 2019

Multi-speaker Sequence-to-sequence Speech Synthesis for Data Augmentation in Acoustic-to-word Speech Recognition

  • ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2019

Phoebe: Pronunciation-aware Contextualization for End-to-end Speech Recognition

  • ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2019

Back-Translation-Style Data Augmentation for end-to-end ASR

  • 2018 IEEE Spoken Language Technology Workshop (SLT)
  • 2018

Cycle-consistency Training for End-to-end Speech Recognition

  • ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2018

References

Publications referenced by this paper.
Showing 1–10 of 30 references.

Multilingual Speech Recognition with a Single End-to-End Model

  • 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2017

State-of-the-Art Speech Recognition with Sequence-to-Sequence Models

  • 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2017

Joint CTC-attention based end-to-end speech recognition using multi-task learning

  • 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2016

Listen, attend and spell: A neural network for large vocabulary conversational speech recognition

  • 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2016