Massively Multilingual Adversarial Speech Recognition

@inproceedings{Adams2019MassivelyMA,
  title={Massively Multilingual Adversarial Speech Recognition},
  author={Oliver Adams and Matthew Wiesner and Shinji Watanabe and David Yarowsky},
  booktitle={NAACL},
  year={2019}
}
We report on adaptation of multilingual end-to-end speech recognition models trained on as many as 100 languages. Our findings shed light on the relative importance of similarity between the target and pretraining languages along the dimensions of phonetics, phonology, language family, geographical location, and orthography. In this context, experiments demonstrate the effectiveness of two additional pretraining objectives in encouraging language-independent encoder representations: a context…
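The adversarial objective referred to in the title is, in broad strokes, a language-adversarial one: a language classifier is trained on encoder states through a gradient-reversal layer, so the gradients flowing back into the encoder discourage language-discriminative features. Below is a minimal PyTorch sketch of this general technique, not the paper's actual implementation; the names (GradReverse, AdversarialLangHead, lam) are illustrative.

import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    # Identity in the forward pass; negated, scaled gradient in the backward pass.
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

class AdversarialLangHead(nn.Module):
    # Predicts the language from encoder states through gradient reversal,
    # pushing the encoder toward language-independent representations.
    def __init__(self, enc_dim, n_langs, lam=0.5):
        super().__init__()
        self.lam = lam
        self.clf = nn.Linear(enc_dim, n_langs)

    def forward(self, enc_states):                 # (batch, time, enc_dim)
        pooled = enc_states.mean(dim=1)            # average over time
        return self.clf(GradReverse.apply(pooled, self.lam))

# Illustrative total objective:
#   loss = asr_loss + F.cross_entropy(lang_head(enc_states), lang_ids)
# Minimizing the classifier loss through the reversal layer maximizes the
# encoder's confusion about which language it is hearing.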
Unsupervised Pretraining Transfers Well Across Languages
A slight modification of CPC pretraining is shown to extract features that transfer well to other languages, on par with or even outperforming supervised pretraining, which demonstrates the potential of unsupervised methods for languages with few linguistic resources.
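For context, CPC (contrastive predictive coding) trains an encoder to distinguish the true future latent frame from negatives drawn elsewhere in the batch. A minimal sketch of the InfoNCE-style objective, with illustrative shapes and names, not the modified variant described above:

import torch
import torch.nn.functional as F

def cpc_infonce_loss(c_t, z_future, W_k):
    # c_t: (batch, dim) context at time t; z_future: (batch, dim) latent at t+k;
    # W_k: (dim, dim) step-specific prediction matrix. Other items in the
    # batch serve as the negative samples.
    pred = c_t @ W_k                           # predicted future latent
    logits = pred @ z_future.t()               # (batch, batch) similarities
    targets = torch.arange(c_t.size(0))        # positives on the diagonal
    return F.cross_entropy(logits, targets)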
Zero-Shot Cross-Lingual Phonetic Recognition with External Language Embedding
Many existing languages are too sparsely resourced for monolingual deep learning networks to achieve high accuracy. Multilingual phonetic recognition systems mitigate data sparsity issues by training…
Pseudo-Labeling for Massively Multilingual Speech Recognition
This work proposes a simple pseudo-labeling recipe that works well even for low-resource languages, yielding a model that performs better on many languages and also transfers well to LibriSpeech.
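As a rough illustration of the general pseudo-labeling pattern (not the paper's exact recipe): a seed model transcribes unlabeled audio, low-confidence hypotheses are filtered out, and the model is retrained on the union of gold and pseudo-labeled data. decode_with_score and train_asr below are hypothetical helpers, and the confidence threshold is a placeholder.

def pseudo_label_round(seed_model, unlabeled_audio, labeled_data,
                       min_confidence=0.9):
    pseudo = []
    for utt in unlabeled_audio:
        hyp, score = seed_model.decode_with_score(utt)   # hypothetical API
        if score >= min_confidence:                      # keep confident hypotheses
            pseudo.append((utt, hyp))
    return train_asr(labeled_data + pseudo)              # hypothetical trainer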
Massively Multilingual ASR: 50 Languages, 1 Model, 1 Billion Parameters
Multilingual training of ASR models on several languages is shown to improve recognition performance, in particular on low-resource languages.
A Study of Multilingual End-to-End Speech Recognition for Kazakh, Russian, and English
This work describes the development of multilingual E2E ASR based on Transformer networks, performs an extensive assessment on the aforementioned languages, and compares two variants of output grapheme set construction.
Multilingual Acoustic Word Embedding Models for Processing Zero-resource Languages
This work considers two multilingual recurrent neural network models, a discriminative classifier trained on the joint vocabularies of all training languages and a correspondence autoencoder trained to reconstruct word pairs, in order to train a single supervised embedding model on labelled data from multiple well-resourced languages and then apply it to unseen zero-resource languages.
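A correspondence autoencoder, in its general form, encodes one spoken instance of a word and is trained to reconstruct the acoustic features of a different instance of the same word, so the embedding sheds speaker and channel variation. A minimal PyTorch sketch under assumed feature and embedding dimensions, not the paper's exact model:

import torch.nn as nn

class CAE(nn.Module):
    def __init__(self, feat_dim=39, emb_dim=130, hid=400):
        super().__init__()
        self.enc = nn.GRU(feat_dim, hid, batch_first=True)
        self.to_emb = nn.Linear(hid, emb_dim)
        self.dec = nn.GRU(emb_dim, hid, batch_first=True)
        self.out = nn.Linear(hid, feat_dim)

    def forward(self, x, target_len):          # x: (batch, frames, feat_dim)
        _, h = self.enc(x)
        emb = self.to_emb(h[-1])               # the acoustic word embedding
        # feed the embedding to the decoder at every frame of the paired word
        dec_in = emb.unsqueeze(1).repeat(1, target_len, 1)
        dec_out, _ = self.dec(dec_in)
        return self.out(dec_out), emb

# Trained with, e.g., an MSE loss between the reconstruction and the features
# of the paired instance of the same word.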
Differentiable Allophone Graphs for Language-Universal Speech Recognition
This work presents a general framework to derive phone-level supervision from only phonemic transcriptions and phone-to-phoneme mappings, with learnable weights represented using weighted finite-state transducers, which the authors call differentiable allophone graphs.
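Stripped of the WFST machinery, the core idea can be caricatured as a learnable, inventory-masked phone-to-phoneme mapping applied to universal phone posteriors. The sketch below is a loose illustration under that simplification, with illustrative names, and elides the paper's actual transducer formulation.

import torch
import torch.nn as nn

class AllophoneLayer(nn.Module):
    def __init__(self, allophone_mask):        # (n_phonemes, n_phones), 0/1 float
        super().__init__()
        self.register_buffer("mask", allophone_mask)
        self.weight = nn.Parameter(torch.zeros_like(allophone_mask))

    def forward(self, phone_post):             # (batch, time, n_phones)
        # learnable mapping weights, permitted only where the language's
        # allophone inventory allows the phone-to-phoneme correspondence
        w = torch.softmax(self.weight.masked_fill(self.mask == 0, -1e9), dim=-1)
        return phone_post @ w.t()              # (batch, time, n_phonemes)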
Semi-supervised Domain Adaptation for Dependency Parsing via Improved Contextualized Word Representations
This work applies adversarial learning to three representative semi-supervised domain adaptation methods and uses large-scale target-domain unlabeled data to fine-tune BERT with only the language model loss, obtaining reliable contextualized word representations that benefit cross-domain dependency parsing.
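Fine-tuning BERT with only the (masked) language model loss on unlabeled target-domain text can be sketched with the Hugging Face transformers API; target_domain_dataset below is an assumed pre-tokenized corpus, and the model name and hyperparameters are placeholders rather than the paper's settings.

from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
# masks 15% of tokens and computes the MLM loss only on those positions
collator = DataCollatorForLanguageModeling(tok, mlm=True, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-target-domain"),
    data_collator=collator,
    train_dataset=target_domain_dataset,   # assumed: tokenized unlabeled corpus
)
trainer.train()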
Efficient Weight Factorization for Multilingual Speech Recognition
A novel multilingual architecture that targets the core operation in neural networks, the linear transformation, by decomposing each weight matrix into a shared component and a language-dependent component; it proves effective in two multilingual settings with 7 and 27 languages.
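A minimal sketch of the general shared-plus-language-dependent factorization, keeping the per-language component low-rank to stay parameter-efficient; the paper's exact parameterization may differ, and all names here are illustrative.

import torch
import torch.nn as nn

class FactorizedLinear(nn.Module):
    def __init__(self, d_in, d_out, n_langs, rank=1):
        super().__init__()
        self.shared = nn.Linear(d_in, d_out)
        # per-language rank-`rank` factors: W_lang = a @ b
        self.a = nn.Parameter(torch.randn(n_langs, d_out, rank) * 0.01)
        self.b = nn.Parameter(torch.randn(n_langs, rank, d_in) * 0.01)

    def forward(self, x, lang):                   # x: (batch, d_in)
        w_lang = self.a[lang] @ self.b[lang]      # (d_out, d_in)
        return self.shared(x) + x @ w_lang.t()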
Analyzing ASR Pretraining for Low-Resource Speech-to-Text Translation
The best predictor of final AST performance is found to be the word error rate of the pretrained ASR model, and differences in ASR/AST performance correlate with how phonetic information is encoded in the later RNN layers of the model.

References

Showing 1-10 of 52 references.
Adversarial Multilingual Training for Low-Resource Speech Recognition
An adversarial multilingual training scheme is proposed to train bottleneck (BN) networks for the target language, and a parallel shared-exclusive model is also proposed for training the BN network.
Domain Adversarial Training for Accented Speech Recognition
In experiments with three Mandarin accents, DAT is shown to yield up to 7.45% relative character error rate reduction when no transcriptions of the accented speech are available, compared with a baseline trained on standard-accent data only.
Adversarial Learning of Raw Speech Features for Domain Invariant Speech Recognition
Promising empirical results indicate the strength of adversarial training for unsupervised domain adaptation in ASR, emphasizing the ability of DANNs to learn domain-invariant features from raw speech.
CMU Wilderness Multilingual Speech Dataset (A. Black, ICASSP 2019)
This paper describes the CMU Wilderness Multilingual Speech Dataset, covering over 700 languages with audio, aligned text, and word pronunciations; it describes the multi-pass alignment techniques and evaluates the results by building speech synthesizers on the aligned data.
Experiments on cross-language acoustic modeling
This paper examines performance when only limited adaptation data is available for rapid transfer of LVCSR systems to other languages, particularly for very time-constrained tasks and minority languages.
On the use of a multilingual neural network front-end
This paper presents a front-end consisting of an artificial neural network architecture trained on multilingual corpora that produces discriminant features usable as observation vectors for language- or task-dependent recognizers.
Language independent end-to-end architecture for joint language identification and speech recognition
This paper presents a model that can recognize speech in 10 different languages by directly performing grapheme (character/chunked-character) based speech recognition, built on the hybrid attention/connectionist temporal classification (CTC) architecture.
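The hybrid attention/CTC architecture trains both objectives jointly and interpolates them with a weight lambda; in the joint model, the language ID is simply emitted as part of the output grapheme sequence. A minimal sketch of the multi-task loss under assumed tensor layouts, not the paper's exact code:

import torch.nn.functional as F

def hybrid_loss(ctc_log_probs, input_lens, target_lens, att_logits, targets,
                lam=0.3):
    # ctc_log_probs: (time, batch, vocab) log-softmax output of the CTC branch;
    # att_logits: (batch, out_len, vocab) from the attention decoder;
    # targets: (batch, out_len) grapheme IDs, with the language-ID token
    # included in the output sequence for joint LID+ASR.
    ctc = F.ctc_loss(ctc_log_probs, targets, input_lens, target_lens)
    att = F.cross_entropy(att_logits.transpose(1, 2), targets)
    return lam * ctc + (1.0 - lam) * att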
Multilingual Speech Recognition with a Single End-to-End Model
This model, which is not explicitly given any information about language identity, improves recognition performance by 21% relative compared to analogous sequence-to-sequence models trained on each language individually, and improves performance by an additional 7% relative while eliminating confusion between different languages.
Sequence-Based Multi-Lingual Low Resource Speech Recognition
End-to-end multilingual training of sequence models is shown to be effective for context-independent models trained using the Connectionist Temporal Classification (CTC) loss, and such models can be adapted cross-lingually to an unseen language using just 25% of the target data.
Multilingual acoustic models using distributed deep neural networks
Experimental results for cross- and multilingual network training of eleven Romance languages on 10k hours of data in total show average relative gains over the monolingual baselines, but the additional gain from jointly training the languages on all data comes at an increased training time of roughly four weeks.