Bangla-Wave: Improving Bangla Automatic Speech Recognition Utilizing N-gram Language Models

@article{Rakib2022BanglaWaveIB,
  title={Bangla-Wave: Improving Bangla Automatic Speech Recognition Utilizing N-gram Language Models},
  author={Mohammed Rakib and Md. Ismail Hossain and Nabeel Mohammed and Fuad Rahman},
  journal={ArXiv},
  year={2022},
  volume={abs/2209.12650}
}
Although over 300M people around the world speak Bangla, scant work has been done on improving Bangla voice-to-text transcription because Bangla is a low-resource language. However, with the introduction of the Bengali Common Voice 9.0 speech dataset, Automatic Speech Recognition (ASR) models can now be significantly improved. With 399 hours of speech recordings, Bengali Common Voice is the largest and most diversified open-source Bengali speech corpus in the world. In this paper, we outperform the…

References

Bengali Common Voice Speech Dataset for Automatic Speech Recognition

The Bengali Common Voice Speech Dataset is a crowdsourced, sentence-level automatic speech recognition corpus that has more speaker, phoneme, and environmental diversity than the OpenSLR Bengali ASR dataset, the largest existing open-source speech dataset.

MLS: A Large-Scale Multilingual Dataset for Speech Research

This paper introduces the Multilingual LibriSpeech (MLS) dataset, a large multilingual corpus suitable for speech research; the authors believe such a large transcribed dataset will open new avenues in ASR and Text-To-Speech research.

VOXLINGUA107: A Dataset for Spoken Language Recognition

This paper generates semi-random search phrases from language-specific Wikipedia data that are then used to retrieve videos from YouTube for 107 languages and uses the data to build language recognition models for several spoken language identification tasks.

Unsupervised Cross-lingual Representation Learning for Speech Recognition

XLSR is presented, which learns cross-lingual speech representations by pretraining a single model from the raw waveform of speech in multiple languages, enabling a single multilingual speech recognition model that is competitive with strong individual models.

VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation

We introduce VoxPopuli, a large-scale multilingual corpus providing 400K hours of unlabeled speech data in 23 languages. It is the largest open data to date for unsupervised representation learning

iNLPSuite: Monolingual Corpora, Evaluation Benchmarks and Pre-trained Multilingual Language Models for Indian Languages

This paper introduces NLP resources for 11 major Indian languages from two major language families, and creates datasets for the following tasks: Article Genre Classification, Headline Prediction, Wikipedia Section-Title Prediction, Cloze-style Multiple choice QA, Winograd NLI and COPA.

wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations

We show for the first time that learning powerful representations from speech audio alone followed by fine-tuning on transcribed speech can outperform the best semi-supervised methods while being

A Large Multi-target Dataset of Common Bengali Handwritten Graphemes

This work proposes a labeling scheme based on graphemes (linguistic segments of word formation) that makes segmentation inside alpha-syllabary words linear and presents the first dataset of Bengali handwritten graphemes that are commonly used in everyday contexts.

Class-Based n-gram Models of Natural Language

This work addresses the problem of predicting a word from previous words in a sample of text and discusses n-gram models based on classes of words, finding that these models are able to extract classes that have the flavor of either syntactically based groupings or semantically based groupings, depending on the nature of the underlying statistics.

BABEL: Bodies, Action and Behavior with English Labels

BABEL is presented, a large dataset with language labels describing the actions being performed in mocap sequences, and can serve as a useful benchmark for progress in 3D action recognition.