• Corpus ID: 246411610

Reducing language context confusion for end-to-end code-switching automatic speech recognition

Shuai Zhang, Jiangyan Yi, Zhengkun Tian, Jianhua Tao, Yu Ting Yeung, Liqun Deng
Code-switching is the alternation between languages during communication. Training end-to-end (E2E) automatic speech recognition (ASR) systems for code-switching is challenging because data is scarce and because the presence of more than one language increases language context confusion. In this paper, we propose a language-related attention mechanism to reduce multilingual context confusion for the E2E code-switching ASR model based on the…

LAE: Language-Aware Encoder for Monolingual and Multilingual ASR

Experiments conducted on Mandarin-English code-switched speech suggest that the proposed LAE can discriminate between languages at the frame level and shows superior performance on both monolingual and multilingual ASR tasks.

Internal Language Model Estimation based Language Model Fusion for Cross-Domain Code-Switching Speech Recognition

This paper trains an end-to-end speech recognition model by merging two monolingual data sets and observes the effectiveness of the proposed ILME-based LM fusion for code-switching speech recognition (CSSR).

Code-Switching without Switching: Language Agnostic End-to-End Speech Translation

We propose a) a Language Agnostic end-to-end Speech Translation model (LAST), and b) a data augmentation strategy to increase code-switching (CS) performance. With increasing globalization, multiple



Decoupling Pronunciation and Language for End-to-End Code-Switching Automatic Speech Recognition

A decoupled transformer model is proposed that uses monolingual paired data and unpaired text data to alleviate the shortage of code-switching data; it is evaluated on the public Mandarin-English code-switching dataset.

Towards Language-Universal End-to-End Speech Recognition

  • Suyoun Kim, M. Seltzer
  • Computer Science
    2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2018
This work exploits recent progress in end-to-end speech recognition to create a single multilingual speech recognition system capable of recognizing any of the languages seen in training, and proposes the use of a universal character set that is shared among all languages.

Code-Switch Language Model with Inversion Constraints for Mixed Language Speech Recognition

This work proposes a first ever code-switch language model for mixed language speech recognition that incorporates syntactic constraints by a code-switch boundary prediction model, a code-switch translation model, and a reconstruction model that is more robust than previous approaches.

Learning to Recognize Code-switched Speech Without Forgetting Monolingual Speech Recognition

It is shown that fine-tuning ASR models on code-switched speech harms performance on monolingual speech, and the Learning Without Forgetting (LWF) framework is proposed for code-switched ASR when the authors only have access to a monolingual model and do not have the data it was trained on.

Improved mixed language speech recognition using asymmetric acoustic model and language model with code-switch inversion constraints

  • Ying Li, Pascale Fung
  • Computer Science
    2013 IEEE International Conference on Acoustics, Speech and Signal Processing
  • 2013
We propose an integrated framework for large vocabulary continuous mixed language speech recognition that handles the accent effect in the bilingual acoustic model and the inversion constraint well

Acoustic and Textual Data Augmentation for Improved ASR of Code-Switching Speech

This work trains state-of-the-art AMs, which were ineffective due to lack of training data, on a significantly increased amount of CS speech and monolingual Dutch speech and improves the language model (LM) by creating code-switching text, which is in practice almost non-existent.

The ASRU 2019 Mandarin-English Code-Switching Speech Recognition Challenge: Open Datasets, Tracks, Methods and Results

It turns out that traditional ASR systems benefit from a pronunciation lexicon, CS text generation, and data augmentation; in the E2E track, however, the results highlight the importance of using language identification, building a rational set of modeling units, and spec-augment.

Bytes Are All You Need: End-to-end Multilingual Speech Recognition and Synthesis with Bytes

Bytes allow us to avoid large softmaxes in languages with large vocabularies, and share representations in multilingual models, and it is shown that bytes are superior to grapheme characters over a wide variety of languages in monolingual end-to-end speech recognition.
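
To make the vocabulary argument concrete, here is a minimal Python sketch. It is not code from the paper: it assumes the output units are raw UTF-8 bytes, and the helper names `to_byte_ids` and `from_byte_ids` and the sample sentence are hypothetical. It shows that a mixed Mandarin-English string maps to integer labels below 256, so a softmax over 256 classes would cover both languages, at the cost of a longer label sequence.

```python
from typing import List


def to_byte_ids(text: str) -> List[int]:
    """Encode text as a sequence of UTF-8 byte values (each in 0..255)."""
    return list(text.encode("utf-8"))


def from_byte_ids(ids: List[int]) -> str:
    """Decode a sequence of UTF-8 byte values back into text."""
    return bytes(ids).decode("utf-8")


# Hypothetical code-switched sentence: three Mandarin characters plus an English word.
mixed = "我想要 coffee"
byte_ids = to_byte_ids(mixed)

# Every label fits in a 256-way output layer, regardless of the language mix.
vocab_upper_bound = 256
all_in_range = all(0 <= b < vocab_upper_bound for b in byte_ids)

# The price is sequence length: each of these Mandarin characters takes three bytes.
num_chars = len(mixed)
num_bytes = len(byte_ids)
```

Here `num_bytes` exceeds `num_chars` because the Mandarin characters expand to several bytes each, which is the usual trade-off of byte-level units: a small, shared label set in exchange for longer targets.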

Language Modeling with Functional Head Constraint for Code Switching Speech Recognition

This paper proposes to learn the code mixing language model from bilingual data with this constraint in a weighted finite state transducer (WFST) framework and obtains a constrained code switch language model by first expanding the search network with a translation model, and then using parsing to restrict paths to those permissible under the constraint.

Learn Spelling from Teachers: Transferring Knowledge from Language Models to Sequence-to-Sequence Speech Recognition

This paper proposes a knowledge-distillation-based training approach to integrating external language models into a sequence-to-sequence model, which achieves a character error rate of 9.3%, a relative reduction of 18.42% compared with the vanilla sequence-to-sequence model.
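
The relative-reduction figure can be unpacked with a short calculation. The sketch below is illustrative only: it assumes that "relative reduction" means (baseline - new) / baseline, the function name `implied_baseline` is hypothetical, and the baseline it computes is derived from the two numbers in the summary rather than quoted from the paper.

```python
def implied_baseline(new_rate: float, relative_reduction: float) -> float:
    """Return the baseline error rate implied by a new rate and a relative reduction.

    Assumes relative_reduction = (baseline - new_rate) / baseline.
    """
    return new_rate / (1.0 - relative_reduction)


cer_new = 9.3           # character error rate in percent, from the summary
rel_reduction = 0.1842  # 18.42% relative reduction, from the summary

# Derived value, not a figure reported in the text above.
cer_baseline = implied_baseline(cer_new, rel_reduction)
absolute_gain = cer_baseline - cer_new
```

Under that assumption the implied baseline is roughly 11.4%, so the 18.42% relative reduction corresponds to an absolute improvement of about 2.1 percentage points.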