Dual Script E2E framework for Multilingual and Code-Switching ASR

Authors: Mari Ganesh Kumar, Jom Kuriakose, Anand Thyagachandran, Akhil Arunkumar, Ashish Seth, Lodagala V S V Durga Prasad, Saish Jaiswal, Anusha Prakash, Hema A. Murthy
India is home to multiple languages, and training automatic speech recognition (ASR) systems is challenging. Over time, each language has adopted words from other languages, such as English, leading to code-mixing. Most Indian languages also have their own unique scripts, which poses a major limitation in training multilingual and code-switching ASR systems. Inspired by results in text-to-speech synthesis, in this paper, we use an in-house rule-based phoneme-level common label set (CLS… 


DuDe: Dual-Decoder Multilingual ASR for Indian Languages using Common Label Set

This paper explores various approaches to build multilingual ASR models and proposes a novel architecture called Encoder-Decoder-decoder for building multilingual systems that use both CLS and native script labels.

An Overview of Indian Spoken Language Recognition from Machine Learning Perspective

This is one of the first attempts at a comprehensive review of Indian spoken language recognition research, intended to help active researchers and enthusiasts from related fields assess the current state of Indian LID research.



Building Multilingual End-to-End Speech Synthesisers for Indian Languages

Subjective evaluations indicate that reasonably good quality Indic TTSes can be developed using both approaches, which emphasises the need to incorporate multilingual text processing in the end-to-end framework.

Code-switching in Indic Speech Synthesisers

To obtain good quality text-to-speech (TTS) synthesisers that handle code-switching seamlessly, bilingual TTSes capable of handling phonotactic variations across languages are trained on combinations of monolingual data in a unified framework.

Multilingual and code-switching ASR challenges for low resource Indian languages

This challenge focuses on building multilingual and code-switching ASR systems through two subtasks covering a total of seven Indian languages, namely Hindi, Marathi, Odia, Tamil, Telugu, Gujarati and Bengali.

Language-Agnostic Multilingual Modeling

A new approach to building a language-agnostic multilingual ASR system which transforms all languages to one writing system through a many-to-one transliteration transducer, effectively separating the modeling and rendering problems.

Exploring the use of Common Label Set to Improve Speech Recognition of Low Resource Indian Languages

  • Vishwas M. Shetty, S. Umesh
  • Computer Science
    ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2021
The benefits of representing such similar target subword units from different languages through a Common Label Set (CLS) are explored, and models trained using CLS improve over monolingual baseline and a multilingual framework with separate symbols for each language.
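
As a rough illustration of why a shared label set shrinks the output vocabulary, the sketch below maps a few Devanagari and Tamil letters to common labels and compares symbol counts with and without the mapping. The table `TOY_CLS`, its label names, and the helper functions are hypothetical and written for this example only; they are not the label set or code used in the paper, and the labels ignore details such as vowel signs.

```python
# Hypothetical toy table: similar-sounding letters from two scripts share one label.
# This is NOT the Common Label Set from the paper, only an illustration.
TOY_CLS = {
    "क": "ka",  # Devanagari KA
    "க": "ka",  # Tamil KA
    "म": "ma",  # Devanagari MA
    "ம": "ma",  # Tamil MA
    "ल": "la",  # Devanagari LA
    "ல": "la",  # Tamil LA
}


def to_common_labels(text):
    """Map each known character to its shared label; pass unknown characters through."""
    return [TOY_CLS.get(ch, ch) for ch in text]


def vocab_size(transcripts, use_cls):
    """Count distinct output symbols with or without the shared label mapping."""
    symbols = set()
    for transcript in transcripts:
        symbols.update(to_common_labels(transcript) if use_cls else list(transcript))
    return len(symbols)


# The same three-letter sequence written in Devanagari and in Tamil script.
transcripts = ["कमल", "கமல"]
separate = vocab_size(transcripts, use_cls=False)  # one symbol per script-specific letter
shared = vocab_size(transcripts, use_cls=True)     # letters collapse onto shared labels
```

With separate symbols per language the model must learn a distinct output unit for each script, whereas the shared labels let both transcripts contribute training examples to the same units.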

Transliteration Based Data Augmentation for Training Multilingual ASR Acoustic Models in Low Resource Settings

A novel data augmentation technique that improves the performance of an end-to-end (E2E) multilingual acoustic model by transliterating data into the various languages that are part of the multilingual training set.
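
The idea of transliteration-based augmentation can be sketched as follows: each utterance keeps its audio, and extra training pairs are created by rewriting its transcript in the scripts of the other training languages. The character tables, language codes, and function names below are hypothetical placeholders for this example; a real system would use a proper transliteration model, and this is not the implementation from the paper.

```python
# Hypothetical character-level tables between Devanagari (hi) and Tamil (ta).
DEV_TO_TAM = {"क": "க", "म": "ம", "ल": "ல"}
TAM_TO_DEV = {tam: dev for dev, tam in DEV_TO_TAM.items()}
TABLES = {("hi", "ta"): DEV_TO_TAM, ("ta", "hi"): TAM_TO_DEV}


def transliterate(text, src, tgt):
    """Rewrite text from the src script into the tgt script, character by character."""
    table = TABLES[(src, tgt)]
    return "".join(table.get(ch, ch) for ch in text)


def augment(dataset):
    """Return the original (audio_id, transcript, lang) triples plus transliterated copies."""
    augmented = list(dataset)
    for audio_id, text, lang in dataset:
        for src, tgt in TABLES:
            if src == lang:
                # Same audio, transcript rendered in another language's script.
                augmented.append((audio_id, transliterate(text, src, tgt), tgt))
    return augmented


data = [("utt1", "कमल", "hi")]
augmented = augment(data)
```

Each original utterance here yields one additional pair per target script, so the amount of text supervision per audio clip grows with the number of languages in the training set.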

Generic Indic Text-to-Speech Synthesisers with Rapid Adaptation in an End-to-End Framework

Experiments indicate that good quality TTS systems can be built using only 7 minutes of adaptation data, and demonstrate the capability of generic TTSes to handle speaker and language switching seamlessly, along with the ease of adaptation to a new language.

A Unified Parser for Developing Indian Language Text to Speech Synthesizers

The design of a language-independent parser for text-to-speech synthesis in Indian languages is described, and the phoneme sequences generated by the proposed parser are more accurate than those of language-specific parsers.

A common attribute based unified HTS framework for speech synthesis in Indian languages

The common phoneset and common question set are used to build HTS based systems for six Indian languages, namely, Hindi, Marathi, Bengali, Tamil, Telugu and Malayalam, and a uniform HMM framework for building speech synthesisers is proposed.

Resources for Indian languages

A consortium effort to design a database for a high-quality corpus, primarily for building text-to-speech (TTS) synthesis systems for 13 major Indian languages, is discussed.