Direct speech-to-speech translation with a sequence-to-sequence model

Ye Jia, Ron J. Weiss, Fadi Biadsy, Wolfgang Macherey, Melvin Johnson, Z. Chen, Yonghui Wu
We present an attention-based sequence-to-sequence neural network that can directly translate speech from one language into speech in another language, without relying on an intermediate text representation. We conduct experiments on two Spanish-to-English speech translation datasets and find that the proposed model slightly underperforms a baseline cascade of a direct speech-to-text translation model and a text-to-speech synthesis model, demonstrating the feasibility of the approach on this…

