• Corpus ID: 239009801

Multilingual Speech Recognition using Knowledge Transfer across Learning Processes

Rimita Lahiri, Ken'ichi Kumatani, Eric Sun, Yao Qian
Multilingual end-to-end (E2E) models have shown great potential for expanding language coverage in automatic speech recognition (ASR). In this paper, we aim to enhance multilingual ASR performance in two ways: 1) studying the impact of feeding a one-hot vector identifying the language, and 2) formulating the task with a meta-learning objective combined with self-supervised learning (SSL). We associate every language with a distinct task manifold and attempt to improve… 
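The first idea in the abstract, conditioning the model on a one-hot language vector, is commonly realized by concatenating that vector to each acoustic feature frame. A minimal sketch of that conditioning step (a hypothetical illustration, not the paper's code; the language inventory and feature dimensions are assumed):

```python
import numpy as np

# Assumed language inventory for illustration only.
LANGUAGES = ["en", "de", "es", "fr"]

def add_language_id(features: np.ndarray, lang: str) -> np.ndarray:
    """Concatenate a one-hot language vector to every acoustic frame.

    features: (num_frames, feat_dim) features, e.g. 80-dim log-mel.
    Returns:  (num_frames, feat_dim + num_languages)
    """
    one_hot = np.zeros(len(LANGUAGES), dtype=features.dtype)
    one_hot[LANGUAGES.index(lang)] = 1.0
    # Repeat the same language vector for every frame of the utterance.
    tiled = np.tile(one_hot, (features.shape[0], 1))
    return np.concatenate([features, tiled], axis=1)

frames = np.random.randn(100, 80).astype(np.float32)  # 100 frames of 80-dim features
conditioned = add_language_id(frames, "de")
print(conditioned.shape)  # (100, 84)
```

The encoder then sees the language identity at every time step, at the cost of requiring the language to be known (or estimated) at inference time.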


Building a great multi-lingual teacher with sparsely-gated mixture of experts for speech recognition

This work investigates how multi-lingual Automatic Speech Recognition networks can be scaled up with a simple routing algorithm in order to achieve better accuracy.

Efficient Self-Supervised Learning Representations for Spoken Language Identification

This paper investigates efficient methods to compute reliable representations and discard redundant information for language identification (LID) using a pre-trained multilingual wav2vec 2.0 model and proposes to employ two mechanisms to reduce irrelevant information of the representations in LID.

Leveraging Language ID in Multilingual End-to-End Speech Recognition

This paper introduces a novel technique for inferring the language ID in a streaming fashion using RNN-T, and a novel loss function that pressures the model to identify the language after as few frames as possible.

Large-Scale Multilingual Speech Recognition with a Streaming End-to-End Model

This work presents an E2E multilingual system which is equipped to operate in low-latency interactive applications, as well as handle a key challenge of real world data: the imbalance in training data across languages.

Network architectures for multilingual speech representation learning

  • Tom Sercu, G. Saon, A. Sethy
  • Computer Science
    2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2017
It is demonstrated that ML features extracted from both models show significant improvement over the baseline CNN-DNN based ML representations, in terms of both speech recognition and keyword search performance and the comparison between the LSTM model itself and the ML representations derived from it on Georgian, the surprise language for the OpenKWS evaluation.

Mixture of Informed Experts for Multilingual Speech Recognition

  • Neeraj Gaur, B. Farris, Yun Zhu
  • Computer Science
    ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2021
A novel variant of this approach, ‘informed experts’, is introduced, which attempts to tackle inter-task conflicts by eliminating gradients from other tasks in the task-specific parameters.

Multilingual Speech Recognition with a Single End-to-End Model

This model, which is not explicitly given any information about language identity, improves recognition performance by 21% relative compared to analogous sequence-to-sequence models trained on each language individually; when additionally provided with the language identity, it improves performance by a further 7% relative and eliminates confusion between different languages.

Multilingual representations for low resource speech recognition and keyword search

This paper examines the impact of multilingual acoustic representations on Automatic Speech Recognition (ASR) and keyword search (KWS) for low resource languages in the context of the OpenKWS15 evaluation of the IARPA Babel program and shows that these multilingual representations significantly improve ASR and KWS performance.

Language independent end-to-end architecture for joint language identification and speech recognition

This paper presents a model that can recognize speech in 10 different languages, by directly performing grapheme (character/chunked-character) based speech recognition, based on the hybrid attention/connectionist temporal classification (CTC) architecture.
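The joint scheme described above is often realized by prepending a language token to the grapheme target sequence, so a single model predicts the language ID before emitting the transcript. A minimal sketch of that target construction (hypothetical helper, not the paper's code):

```python
# Hypothetical sketch: prepend a language token to the character-level
# target sequence so one end-to-end model jointly predicts language ID
# and transcription, as the abstract describes.
def make_joint_target(lang: str, text: str) -> list:
    """Build the training target: language token first, then graphemes."""
    return [f"[{lang}]"] + list(text)

print(make_joint_target("en", "hello"))  # ['[en]', 'h', 'e', 'l', 'l', 'o']
```

Because the language token is just another output symbol, the model needs no separate language classifier; the ID falls out of the same decoding pass as the transcript.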

Multitask Learning of Deep Neural Networks for Low-Resource Speech Recognition

  • Dongpeng Chen, B. Mak
  • Computer Science
    IEEE/ACM Transactions on Audio, Speech, and Language Processing
  • 2015
It is demonstrated that the performance of the phone models of a single low-resource language can be improved by training its grapheme models in parallel under the MTL framework, and the proposed MTL methods obtain significant word recognition gains.

Deploying self-supervised learning in the wild for hybrid automatic speech recognition

The experimental results show that SSL pre-training with in-domain uncurated data can achieve better performance in comparison to all the alternative out-of-domain pre-training strategies.

Neural Language Codes for Multilingual Acoustic Models

The results show that during recognition multilingual Meta-Pi networks quickly adapt to the proper language coloring without retraining or new data, and perform better than monolingually trained networks.