wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations

@article{Baevski2020wav2vec2A,
  title={wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations},
  author={Alexei Baevski and Henry Zhou and Abdel-rahman Mohamed and Michael Auli},
  journal={ArXiv},
  year={2020},
  volume={abs/2006.11477}
}
We show for the first time that learning powerful representations from speech audio alone followed by fine-tuning on transcribed speech can outperform the best semi-supervised methods while being conceptually simpler. wav2vec 2.0 masks the speech input in the latent space and solves a contrastive task defined over a quantization of the latent representations which are jointly learned. Experiments using all labeled data of Librispeech achieve 1.8/3.3 WER on the clean/other test sets. When lowering the amount of labeled data to one hour, wav2vec 2.0 outperforms the previous state of the art on the 100 hour subset while using 100 times less labeled data. Using just ten minutes of labeled data and pre-training on 53k hours of unlabeled data still achieves 4.8/8.2 WER. This demonstrates the feasibility of speech recognition with limited amounts of labeled data.
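
The abstract compresses the pre-training objective into one sentence. Concretely, for each masked time step t the model is trained to pick the true quantized latent q_t out of a candidate set Q_t via L_t = -log( exp(sim(c_t, q_t)/κ) / Σ_{q' ∈ Q_t} exp(sim(c_t, q')/κ) ), where c_t is the Transformer context output, sim is cosine similarity, κ is a temperature, and Q_t contains q_t plus distractors sampled from other masked steps of the same utterance. The PyTorch sketch below is a minimal illustration of that objective, not the authors' fairseq implementation: the function name `contrastive_loss`, the tensor shapes, and the sampling details are assumptions made for clarity (the paper uses 100 distractors and κ = 0.1).

```python
import torch
import torch.nn.functional as F

def contrastive_loss(context, quantized, mask, num_distractors=100, temperature=0.1):
    """Illustrative sketch of the masked contrastive objective (not the fairseq code).

    context:   (T, D) Transformer outputs c_t
    quantized: (T, D) quantized latent targets q_t from the jointly learned codebook
    mask:      (T,)   boolean, True at the masked time steps
    """
    idx = mask.nonzero(as_tuple=True)[0]       # indices of masked steps
    c = context[idx]                           # (M, D) contexts to score
    q_pos = quantized[idx]                     # (M, D) true targets
    m = idx.numel()

    # Uniformly sample distractors from the *other* masked steps of the same
    # utterance, shifting indices so a step never draws its own positive.
    rand = torch.randint(0, m - 1, (m, num_distractors), device=context.device)
    rand = rand + (rand >= torch.arange(m, device=context.device).unsqueeze(1)).long()
    q_neg = q_pos[rand]                        # (M, K, D) distractors

    # Candidate set Q_t = {q_t} ∪ distractors; the positive sits at index 0.
    candidates = torch.cat([q_pos.unsqueeze(1), q_neg], dim=1)        # (M, 1+K, D)
    logits = F.cosine_similarity(c.unsqueeze(1), candidates, dim=-1) / temperature
    targets = torch.zeros(m, dtype=torch.long, device=context.device)
    return F.cross_entropy(logits, targets)    # -log softmax score of the positive

# Example call (random tensors standing in for real model outputs;
# ~half the steps masked, roughly matching the paper's ~49%):
T, D = 400, 256
loss = contrastive_loss(torch.randn(T, D), torch.randn(T, D), torch.rand(T) < 0.5)
```

Note that the paper's full pre-training loss also adds a codebook diversity penalty that encourages all quantizer entries to be used; that term is omitted from the sketch.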
