Joint CTC-attention based end-to-end speech recognition using multi-task learning

@article{Kim2017JointCB,
  title={Joint CTC-attention based end-to-end speech recognition using multi-task learning},
  author={Suyoun Kim and Takaaki Hori and Shinji Watanabe},
  journal={2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2017},
  pages={4835--4839}
}
Recently, there has been an increasing interest in end-to-end speech recognition that directly transcribes speech to text without any predefined alignments. One approach is the attention-based encoder-decoder framework that learns a mapping between variable-length input and output sequences in one step using a purely data-driven method. The attention model has often been shown to improve the performance over another end-to-end approach, the Connectionist Temporal Classification (CTC), mainly…

Citations

Publications citing this paper.

Sequence Training of Encoder-decoder Model Using Policy Gradient for End-to-end Speech Recognition

2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 2018

An Investigation of a Knowledge Distillation Method for CTC Acoustic Models

2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 2018

Semantic Scholar estimates that this publication has 129 citations based on the available data.

References

Publications referenced by this paper.

End-to-end attention-based large vocabulary speech recognition

2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 2016

On training the recurrent neural network encoder-decoder for large vocabulary end-to-end speech recognition

2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 2016

EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding

2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) • 2015
