Quantifying Language Variation Acoustically with Few Resources
@article{Bartelds2022QuantifyingLV,
  title={Quantifying Language Variation Acoustically with Few Resources},
  author={Martijn Bartelds and Martijn Wieling},
  journal={ArXiv},
  year={2022},
  volume={abs/2205.02694}
}
Deep acoustic models represent linguistic information based on massive amounts of data. Unfortunately, for regional languages and dialects such resources are mostly not available. However, deep acoustic models might have learned linguistic information that transfers to low-resource languages. In this study, we evaluate whether this is the case through the task of distinguishing low-resource (Dutch) regional varieties. By extracting embeddings from the hidden layers of various wav2vec 2.0 models…
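The abstract describes extracting frame-level embeddings from the hidden layers of wav2vec 2.0 models and comparing pronunciations across regional varieties; the cited Müller (2007) reference suggests dynamic time warping (DTW) as the alignment method for such variable-length sequences. A minimal sketch of DTW over embedding sequences, assuming embeddings are already available as NumPy arrays (the function name, cost metric, and length normalization are illustrative assumptions, not the paper's exact procedure):

```python
import numpy as np

def dtw_distance(x, y):
    """Dynamic time warping distance between two embedding sequences.

    x: (n, d) array and y: (m, d) array of frame-level embeddings.
    Returns the accumulated alignment cost, normalized by n + m so that
    longer word pairs do not automatically receive larger distances.
    """
    n, m = len(x), len(y)
    # Pairwise Euclidean distances between every frame of x and y.
    cost = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
    # Accumulated-cost matrix with a padded border of infinities.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j],      # skip a frame of x
                acc[i, j - 1],      # skip a frame of y
                acc[i - 1, j - 1],  # align the two frames
            )
    return acc[n, m] / (n + m)
```

Averaging such distances over a word list for each pair of locations would yield a matrix of acoustic dialect distances; identical pronunciations yield a distance of zero.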
References
Showing 1-10 of 44 references
Neural representations for modeling variation in speech
- Linguistics, Journal of Phonetics
- 2022
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
- Computer Science, NeurIPS
- 2020
We show for the first time that learning powerful representations from speech audio alone followed by fine-tuning on transcribed speech can outperform the best semi-supervised methods while being…
vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations
- Computer Science, ICLR
- 2020
Experiments show that BERT pre-training achieves a new state of the art on TIMIT phoneme classification and WSJ speech recognition. The algorithm uses a Gumbel-Softmax or online k-means clustering to quantize the dense representations.
Layer-Wise Analysis of a Self-Supervised Speech Representation Model
- Computer Science, 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
- 2021
This work examines one recent and successful pre-trained model (wav2vec 2.0) via its intermediate representation vectors, using a suite of analysis tools to characterize how information evolves across model layers and how fine-tuning the model for automatic speech recognition (ASR) affects these observations.
Unsupervised Cross-lingual Representation Learning for Speech Recognition
- Computer Science, Interspeech
- 2021
XLSR is presented, which learns cross-lingual speech representations by pretraining a single model on the raw waveform of speech in multiple languages, enabling a single multilingual speech recognition model that is competitive with strong individual models.
Common Voice: A Massively-Multilingual Speech Corpus
- Computer Science, LREC
- 2020
This work presents speech recognition experiments using Mozilla's DeepSpeech Speech-to-Text toolkit and finds an average character error rate improvement for twelve target languages; for most of these languages, these are the first published results on end-to-end automatic speech recognition.
Information retrieval for music and motion
- Computer Science
- 2007
Covers analysis and retrieval techniques for music data, the SyncPlayer advanced audio player, and relational features with adaptive segmentation.
Adapting Monolingual Models: Data can be Scarce when Language Similarity is High
- Computer Science, Findings
- 2021
This work retrains the lexical layers of four BERT-based models using data from two low-resource target language varieties, while the Transformer layers are independently fine-tuned on a POS-tagging task in the model's source language.