Bangla-Wave: Improving Bangla Automatic Speech Recognition Utilizing N-gram Language Models
@article{Rakib2022BanglaWaveIB,
  title={Bangla-Wave: Improving Bangla Automatic Speech Recognition Utilizing N-gram Language Models},
  author={Mohammed Rakib and Md. Ismail Hossain and Nabeel Mohammed and Fuad Rahman},
  journal={ArXiv},
  year={2022},
  volume={abs/2209.12650}
}
Although over 300M people around the world speak Bangla, scant work has been done on improving Bangla voice-to-text transcription because Bangla is a low-resource language. However, with the introduction of the Bengali Common Voice 9.0 speech dataset, Automatic Speech Recognition (ASR) models can now be significantly improved. With 399 hours of speech recordings, Bengali Common Voice is the largest and most diversified open-source Bengali speech corpus in the world. In this paper, we outperform the…
References
Bengali Common Voice Speech Dataset for Automatic Speech Recognition
- Computer Science
- ArXiv
- 2022
The Bengali Common Voice Speech Dataset is a crowdsourced, sentence-level automatic speech recognition corpus that has more speaker, phoneme, and environmental diversity than the OpenSLR Bengali ASR dataset, the largest existing open-source speech dataset.
MLS: A Large-Scale Multilingual Dataset for Speech Research
- Computer Science, Linguistics
- INTERSPEECH
- 2020
This paper introduces the Multilingual LibriSpeech (MLS) dataset, a large multilingual corpus suitable for speech research; the authors believe such a large transcribed dataset will open new avenues in ASR and Text-To-Speech research.
VOXLINGUA107: A Dataset for Spoken Language Recognition
- Computer Science
- 2021 IEEE Spoken Language Technology Workshop (SLT)
- 2021
This paper generates semi-random search phrases from language-specific Wikipedia data that are then used to retrieve videos from YouTube for 107 languages and uses the data to build language recognition models for several spoken language identification tasks.
Unsupervised Cross-lingual Representation Learning for Speech Recognition
- Computer Science
- Interspeech
- 2021
XLSR is presented, which learns cross-lingual speech representations by pretraining a single model from the raw waveform of speech in multiple languages, enabling a single multilingual speech recognition model that is competitive with strong individual models.
VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation
- Computer Science
- ACL
- 2021
We introduce VoxPopuli, a large-scale multilingual corpus providing 400K hours of unlabeled speech data in 23 languages. It is the largest open data to date for unsupervised representation learning…
iNLPSuite: Monolingual Corpora, Evaluation Benchmarks and Pre-trained Multilingual Language Models for Indian Languages
- Computer Science, Linguistics
- FINDINGS
- 2020
This paper introduces NLP resources for 11 major Indian languages from two major language families, and creates datasets for the following tasks: Article Genre Classification, Headline Prediction, Wikipedia Section-Title Prediction, Cloze-style Multiple choice QA, Winograd NLI and COPA.
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
- Computer Science
- NeurIPS
- 2020
We show for the first time that learning powerful representations from speech audio alone followed by fine-tuning on transcribed speech can outperform the best semi-supervised methods while being…
A Large Multi-target Dataset of Common Bengali Handwritten Graphemes
- Computer Science
- ICDAR
- 2021
This work proposes a labeling scheme based on graphemes (linguistic segments of word formation) that makes segmentation inside alpha-syllabary words linear, and presents the first dataset of Bengali handwritten graphemes that are commonly used in everyday contexts.
Class-Based n-gram Models of Natural Language
- Computer Science
- CL
- 1992
This work addresses the problem of predicting a word from previous words in a sample of text and discusses n-gram models based on classes of words, finding that these models are able to extract classes that have the flavor of either syntactically based groupings or semantically based groupings, depending on the nature of the underlying statistics.
BABEL: Bodies, Action and Behavior with English Labels
- Computer Science
- 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2021
BABEL, a large dataset with language labels describing the actions being performed in mocap sequences, is presented; it can serve as a useful benchmark for progress in 3D action recognition.