Cascaded Models with Cyclic Feedback for Direct Speech Translation

Direct speech translation describes a scenario in which only speech inputs and their corresponding translations are available. Such data are notoriously scarce. We present a technique that allows cascades of automatic speech recognition (ASR) and machine translation (MT) to exploit in-domain direct speech translation data in addition to out-of-domain MT and ASR data. After pre-training the MT and ASR systems, we use a feedback cycle in which the downstream performance of the MT system is used as a signal to improve…

