• Corpus ID: 233204281

Grapheme-to-Phoneme Transformer Model for Transfer Learning Dialects

Eric Engelhart, Mahsa Elyasi, Gaurav Bharaj
Grapheme-to-Phoneme (G2P) models convert words to their phonetic pronunciations. Classic G2P methods include rule-based systems and pronunciation dictionaries, while modern G2P systems incorporate learning, such as LSTM- and Transformer-based attention models. Dictionary-based methods usually require significant manual effort to build and adapt poorly to unseen words, while Transformer-based models require significant training data and do not generalize well, especially for dialects… 
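The abstract's point about limited adaptivity can be illustrated with a minimal sketch of a dictionary-based G2P lookup. The entries and ARPAbet-style phoneme strings below are illustrative, not from any of the papers listed here:

```python
# Minimal sketch of dictionary-based G2P and its main weakness:
# exact lookup gives no coverage of out-of-vocabulary words.

# Tiny hand-built pronunciation dictionary (ARPAbet-style phonemes; illustrative only).
PRON_DICT = {
    "cat": ["K", "AE1", "T"],
    "dog": ["D", "AO1", "G"],
}

def g2p_lookup(word):
    """Dictionary-based G2P: returns the phoneme list, or None for unseen words."""
    return PRON_DICT.get(word.lower())

print(g2p_lookup("cat"))   # found: ['K', 'AE1', 'T']
print(g2p_lookup("cats"))  # None -- even a simple inflection is out-of-vocabulary
```

Learned models (LSTM or Transformer seq2seq) address exactly this gap by predicting pronunciations for words outside the dictionary, at the cost of needing substantial training data.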


Neural Grapheme-To-Phoneme Conversion with Pre-Trained Grapheme Models

Experimental results on the Dutch, Serbo-Croatian, Bulgarian and Korean datasets of the SIGMORPHON 2021 G2P task confirm the effectiveness of the GBERT-based G2P models under both medium-resource and low-resource data conditions.

Improved pronunciation prediction accuracy using morphology

This work explores how deep recurrent neural networks can be used to automatically learn and exploit this pattern of pronunciation to improve the pronunciation prediction quality of words related by morphological inflection, and proposes two novel approaches for supplying morphological information.



Massively Multilingual Neural Grapheme-to-Phoneme Conversion

A neural sequence-to-sequence approach to g2p which is trained on spelling–pronunciation pairs in hundreds of languages, allowing it to utilize the intrinsic similarities between different writing systems.

Multilingual Grapheme-To-Phoneme Conversion with Byte Representation

  • Mingzhi Yu, Hieu Duy Nguyen, S. Kunzmann
  • Computer Science, Linguistics
    ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2020
This work proposes a multilingual G2P model with byte-level input representation to accommodate different grapheme systems, along with an attention-based Transformer architecture, and shows that byte is an efficient representation for multilingual G2P with languages having large grapheme vocabularies.

Grapheme-to-phoneme conversion using Long Short-Term Memory recurrent neural networks

This work proposes a G2P model based on a Long Short-Term Memory (LSTM) recurrent neural network (RNN) that has the flexibility of taking the full context of graphemes into consideration and transforms the problem from a series of grapheme-to-phoneme conversions to a word-to-pronunciation conversion.

One Model to Pronounce Them All: Multilingual Grapheme-to-Phoneme Conversion With a Transformer Ensemble

A simple approach of exploiting model ensembles, based on multilingual Transformers and self-training, to develop a highly effective G2P solution for 15 languages, with a sizeable improvement over the shared task competitive baselines.

Transformer based Grapheme-to-Phoneme Conversion

The results show that Transformer-based G2P outperforms the convolutional approach in terms of word error rate and significantly exceeds previous recurrent approaches (without attention) in word and phoneme error rates on both datasets.

Joint-sequence models for grapheme-to-phoneme conversion

A multilingual approach towards improving the linguistic module of a TTS system: Case

A multilingual approach for the linguistic module of the system is proposed to improve the phonetic transcription of French words, and the Transformer architecture, a deep neural network, is used to train the multilingual G2P model.

Wiktionary as a source for automatic pronunciation extraction

This work analyzes whether dictionaries from the World Wide Web that contain phonetic notations may support the rapid creation of pronunciation dictionaries within the speech recognition and speech synthesis system building process.

Attention is All you Need

A new simple network architecture, the Transformer, based solely on attention mechanisms and dispensing with recurrence and convolutions entirely, is proposed; it generalizes well to other tasks, as shown by applying it successfully to English constituency parsing with both large and limited training data.

LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech

Experimental results show that neural end-to-end TTS models trained on the LibriTTS corpus achieved mean opinion scores above 4.0 for naturalness with five out of six evaluation speakers.