Improved Meta Learning for Low Resource Speech Recognition

@article{Singh2022ImprovedML,
  title={Improved Meta Learning for Low Resource Speech Recognition},
  author={Satwinder Singh and Ruili Wang and Feng Hou},
  journal={ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2022},
  pages={4798-4802}
}
  • Satwinder Singh, Ruili Wang, Feng Hou
  • Published 11 May 2022
  • Computer Science
  • ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
We propose a new meta-learning-based framework for low-resource speech recognition that improves on the previous model-agnostic meta-learning (MAML) approach. MAML is a simple yet powerful meta-learning approach. However, MAML has some core deficiencies, such as training instabilities and slow convergence. To address these issues, we adopt multi-step loss (MSL). MSL calculates the loss at every step of the inner loop of MAML and then combines them with a weighted… 
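
The multi-step loss idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the scalar model `y = w * x`, the function names, the finite-difference outer gradient, and the step weights are all assumptions made for illustration. The abstract is truncated before it specifies the weighting scheme, so the weights are left as a caller-supplied list here.

```python
def mse(w, data):
    """Mean squared error of the toy scalar model y = w * x (assumed model)."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def mse_grad(w, data):
    """Derivative of mse with respect to w."""
    return sum(2 * x * (w * x - y) for x, y in data) / len(data)


def multi_step_loss(w0, support, query, inner_lr, step_weights):
    """Weighted sum of query losses measured after every inner-loop step.

    Plain MAML would use only the query loss after the final step;
    here each step contributes, scaled by its (assumed) weight.
    """
    w = w0
    total = 0.0
    for weight in step_weights:
        w = w - inner_lr * mse_grad(w, support)  # one inner-loop adaptation step
        total += weight * mse(w, query)          # query loss after this step
    return total


def outer_step(w0, tasks, inner_lr, outer_lr, step_weights, eps=1e-5):
    """One meta-update of the initialization w0.

    The meta-gradient is approximated with a central finite difference,
    which keeps the sketch free of autodiff dependencies.
    """
    def meta_loss(w):
        losses = [multi_step_loss(w, s, q, inner_lr, step_weights) for s, q in tasks]
        return sum(losses) / len(losses)

    grad = (meta_loss(w0 + eps) - meta_loss(w0 - eps)) / (2 * eps)
    return w0 - outer_lr * grad
```

Each task is a `(support, query)` pair of `(x, y)` lists. Calling `outer_step` repeatedly moves `w0` toward an initialization from which a few inner steps fit every task well; in a real speech model the finite difference would be replaced by backpropagation through the inner loop.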

References

Meta Learning for End-To-End Low-Resource Speech Recognition

TLDR
Preliminary results showed that the proposed method, MetaASR, significantly outperforms the state-of-the-art multitask pretraining approach on all target languages with different combinations of pretraining languages.

Meta-Adapter: Efficient Cross-Lingual Adaptation With Meta-Learning

TLDR
This paper proposes to combine the adapter module with meta-learning algorithms to achieve high recognition performance under low-resource settings and improve the parameter-efficiency of the model.

Learning to adapt: a meta-learning approach for speaker adaptation

TLDR
It is shown that the meta-learner can learn to perform supervised and unsupervised speaker adaptation and that it outperforms a strong baseline adapting LHUC parameters when adapting a DNN AM with 1.5M parameters.

How to train your MAML

TLDR
This paper proposes various modifications to MAML that not only stabilize the system, but also substantially improve the generalization performance, convergence speed and computational overhead of MAML; the resulting method is called MAML++.

Learning Fast Adaptation on Cross-Accented Speech Recognition

TLDR
This paper introduces a cross-accented English speech recognition task as a benchmark for measuring the ability of the model to adapt to unseen accents using the existing CommonVoice corpus and proposes an accent-agnostic approach that extends the model-agnostic meta-learning (MAML) algorithm for fast adaptation to unseen accents.

Optimization as a Model for Few-Shot Learning

Meta-Learning for Low-Resource Neural Machine Translation

TLDR
The model-agnostic meta-learning algorithm (MAML) is extended to low-resource neural machine translation (NMT); the proposed approach significantly outperforms the multilingual, transfer-learning-based approach and makes it possible to train a competitive NMT system with only a fraction of the training examples.

DeepF0: End-To-End Fundamental Frequency Estimation for Music and Speech Signals

TLDR
A novel pitch estimation technique called DeepF0 is proposed, which leverages the available annotated data to learn directly from the raw audio in a data-driven manner and outperforms the baselines in terms of raw pitch accuracy and raw chroma accuracy, even while using 77.4% fewer network parameters.

Multilingual End-to-End Speech Recognition with A Single Transformer on Low-Resource Languages

TLDR
This paper shows that a single multilingual ASR Transformer performs well on low-resource languages despite some language confusion, and looks at incorporating language information into the model by inserting the language symbol at the beginning or at the end of the original sub-word sequence, under the condition that language information is known during training.

Unsupervised Cross-lingual Representation Learning for Speech Recognition

TLDR
XLSR is presented, which learns cross-lingual speech representations by pretraining a single model on the raw waveform of speech in multiple languages, enabling a single multilingual speech recognition model that is competitive with strong individual models.