Improved Meta Learning for Low Resource Speech Recognition

Satwinder Singh, Ruili Wang, Feng Hou
We propose a new meta-learning-based framework for low-resource speech recognition that improves on the earlier model-agnostic meta-learning (MAML) approach. MAML is a simple yet powerful meta-learning approach; however, it suffers from some core deficiencies, such as training instabilities and slow convergence. To address these issues, we adopt a multi-step loss (MSL). The MSL computes a loss at every step of the inner loop of MAML and then combines these losses with a weighted…
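The multi-step loss idea in the abstract can be sketched on a toy linear-regression task: instead of evaluating the query loss only after the final inner-loop step (as plain MAML does), the query loss is recorded after every step and the per-step losses are combined with importance weights. The linear weight schedule below is a hypothetical illustrative choice, not the paper's exact annealing scheme.

```python
import numpy as np

def mse_loss(w, x, y):
    """Mean squared error of the scalar linear model y_hat = w * x."""
    return float(np.mean((w * x - y) ** 2))

def mse_grad(w, x, y):
    """Analytic gradient of the MSE with respect to the scalar weight w."""
    return float(np.mean(2.0 * (w * x - y) * x))

def maml_msl_meta_loss(w_meta, support, query, inner_lr=0.1, n_steps=5):
    """Multi-step loss (MSL) for one task: run n_steps of inner-loop SGD
    on the support set, record the query loss after *every* step, and
    return a weighted combination of the per-step losses."""
    x_s, y_s = support
    x_q, y_q = query
    w = w_meta
    step_losses = []
    for _ in range(n_steps):
        w = w - inner_lr * mse_grad(w, x_s, y_s)   # inner-loop SGD step
        step_losses.append(mse_loss(w, x_q, y_q))  # per-step query loss
    # Hypothetical schedule: later steps weighted more heavily.
    weights = np.arange(1, n_steps + 1, dtype=float)
    weights /= weights.sum()
    return float(np.dot(weights, step_losses))
```

In a full meta-training loop this weighted loss would replace the final-step query loss as the per-task meta-objective, which is what smooths the outer-loop gradient signal.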



Meta-Adapter: Efficient Cross-Lingual Adaptation With Meta-Learning
This paper proposes to combine the adapter module with meta-learning algorithms to achieve high recognition performance under low-resource settings and improve the parameter-efficiency of the model.
Learning to adapt: a meta-learning approach for speaker adaptation
It is shown that the meta-learner can learn to perform supervised and unsupervised speaker adaptation, and that it outperforms a strong LHUC-adaptation baseline when adapting a DNN acoustic model with 1.5M parameters.
How to train your MAML
This paper proposes various modifications to MAML that not only stabilize the system but also substantially improve the generalization performance and convergence speed of MAML while reducing its computational overhead; the resulting method is called MAML++.
Learning Fast Adaptation on Cross-Accented Speech Recognition
This paper introduces a cross-accented English speech recognition task as a benchmark for measuring a model's ability to adapt to unseen accents using the existing CommonVoice corpus, and proposes an accent-agnostic approach that extends the model-agnostic meta-learning (MAML) algorithm for fast adaptation to unseen accents.
Optimization as a Model for Few-Shot Learning
Meta-Learning for Low-Resource Neural Machine Translation
The model-agnostic meta-learning algorithm is extended to low-resource neural machine translation (NMT); the approach significantly outperforms the multilingual, transfer-learning-based approach and enables training a competitive NMT system with only a fraction of the training examples.
DeepF0: End-To-End Fundamental Frequency Estimation for Music and Speech Signals
A novel pitch estimation technique called DeepF0 is proposed, which leverages the available annotated data to learn directly from raw audio in a data-driven manner; it outperforms the baselines in terms of raw pitch accuracy and raw chroma accuracy while using 77.4% fewer network parameters.
Multilingual End-to-End Speech Recognition with A Single Transformer on Low-Resource Languages
This paper shows that a single multilingual ASR Transformer performs well on low-resource languages despite some language confusion, and explores incorporating language information into the model by inserting the language symbol at the beginning or at the end of the original sub-word sequence, under the condition that language information is known during training.
Unsupervised Cross-lingual Representation Learning for Speech Recognition
XLSR is presented, which learns cross-lingual speech representations by pretraining a single model on the raw waveform of speech in multiple languages, enabling a single multilingual speech recognition model that is competitive with strong individual models.
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
We propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning problems.
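The gradient-descent-based meta-learning described in this abstract can be sketched on the same toy linear-regression setup. The snippet below shows the first-order variant (FOMAML), which averages query-set gradients taken at the task-adapted parameters; full MAML additionally backpropagates through the inner adaptation step. Names and learning rates here are illustrative assumptions.

```python
import numpy as np

def grad(w, x, y):
    """Gradient of the MSE of y_hat = w * x with respect to w."""
    return float(np.mean(2.0 * (w * x - y) * x))

def fomaml_update(w_meta, tasks, inner_lr=0.1, outer_lr=0.05):
    """One first-order MAML meta-update: for each task, take a single
    inner gradient step on the support set, then accumulate the query-set
    gradient evaluated at the adapted parameters."""
    meta_grad = 0.0
    for (x_s, y_s), (x_q, y_q) in tasks:
        # Inner loop: adapt the meta-parameters to this task's support set.
        w_task = w_meta - inner_lr * grad(w_meta, x_s, y_s)
        # Outer gradient, evaluated at the adapted parameters
        # (first-order approximation: second derivatives are dropped).
        meta_grad += grad(w_task, x_q, y_q)
    return w_meta - outer_lr * meta_grad / len(tasks)
```

Iterating this update over a distribution of tasks drives the meta-parameters toward an initialization from which each task can be solved in a few gradient steps, which is the core claim of the MAML paper.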