CoCA-MDD: A Coupled Cross-Attention based Framework for Streaming Mispronunciation Detection and Diagnosis

  author={Nianzu Zheng and Liqun Deng and Wen-Chin Huang and Yu Ting Yeung and Baohua Xu and Yuanyuan Guo and Yasheng Wang and Xiao Chen and Xin Jiang and Qun Liu},
  journal={Interspeech 2022},
Mispronunciation detection and diagnosis (MDD) is a popular research focus in computer-aided pronunciation training (CAPT) systems. End-to-end (e2e) approaches are becoming dominant in MDD. However, an e2e MDD model usually requires entire speech utterances as input context, which leads to significant latency, especially for long paragraphs. We propose a streaming e2e MDD model called CoCA-MDD. We use a conv-transformer structure to encode input speech in a streaming manner. A coupled… 

