Corpus ID: 235458551

Scaling Laws for Acoustic Models

Authors: J. Droppo and Oguz H. Elibol
There is a recent trend in machine learning to increase model quality by growing models to sizes previously thought to be unreasonable. Recent work has shown that autoregressive generative models with cross-entropy objective functions exhibit smooth power-law relationships, or scaling laws, that predict model quality from model size, training set size, and the available compute budget. These scaling laws allow one to choose nearly optimal hyper-parameters given constraints on available training…
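As an illustration of what a "smooth power-law relationship" between model size and loss means in practice, the sketch below fits a curve of the form loss ≈ a · size^(−alpha) by linear least squares in log-log space. The functional form, the function names, and the synthetic data points are assumptions chosen for illustration; they are not the paper's fitted law or its measurements.

```python
import math

def fit_power_law(sizes, losses):
    """Fit loss ~= a * size**(-alpha) by least squares on (log size, log loss)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(v) for v in losses]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                     # slope of the line in log-log space
    intercept = mean_y - slope * mean_x   # log(a)
    return math.exp(intercept), -slope    # (a, alpha)

def predict_loss(a, alpha, size):
    """Extrapolate the fitted power law to a new model size."""
    return a * size ** (-alpha)

# Synthetic, noise-free data generated from an assumed law (hypothetical values).
true_a, true_alpha = 5.0, 0.08
sizes = [1e6, 1e7, 1e8, 1e9]
losses = [true_a * n ** (-true_alpha) for n in sizes]

a, alpha = fit_power_law(sizes, losses)
extrapolated = predict_loss(a, alpha, 1e10)
print(f"a={a:.3f}, alpha={alpha:.3f}, predicted loss at 1e10 params={extrapolated:.3f}")
```

On a log-log plot such a law is a straight line, which is why a simple linear fit recovers the exponent; real measurements would add noise and may need extra terms (for example an irreducible-loss offset) that this sketch omits.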

Scaling Laws for Autoregressive Generative Modeling
Empirical scaling laws for the cross-entropy loss are identified, strengthening the case that scaling laws have important implications for neural network performance, including on downstream tasks.
Scaling Laws for Neural Language Models
Larger models are significantly more sample-efficient, such that optimally compute-efficient training involves training very large models on a relatively modest amount of data and stopping significantly before convergence.
Generative Pre-Training for Speech with Autoregressive Predictive Coding
  • Yu-An Chung, James R. Glass
  • Computer Science, Engineering
  • ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2020
This paper proposes to use autoregressive predictive coding (APC), a recently proposed self-supervised objective, as a generative pre-training approach for learning meaningful, non-specific, and transferable speech representations.
Vector-Quantized Autoregressive Predictive Coding
This work proposes Vector-Quantized Autoregressive Predictive Coding (VQ-APC), a novel model that produces quantized representations, allowing us to explicitly control the amount of information encoded in the representations, and finds that there exists a point where phonetic and speaker information are amplified to maximize a self-supervised objective.
DeCoAR 2.0: Deep Contextualized Acoustic Representations with Vector Quantization
This work proposes DeCoAR 2.0, a Deep Contextualized Acoustic Representation with vector quantization, which uses Transformers in the encoding module instead of LSTMs and proposes an objective that combines the reconstructive loss with a vector quantization diversity loss to train speech representations.
Representation Learning with Contrastive Predictive Coding
This work proposes a universal unsupervised learning approach to extract useful representations from high-dimensional data, which it calls Contrastive Predictive Coding, and demonstrates that the approach is able to learn useful representations achieving strong performance on four distinct domains: speech, images, text and reinforcement learning in 3D environments.
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
This work simplifies the MoE routing algorithm and designs intuitive improved models with reduced communication and computational costs, and shows that large sparse models may be trained, for the first time, with lower precision formats.
Deep Contextualized Acoustic Representations for Semi-Supervised Speech Recognition
This work first exploits a large amount of unlabeled audio data via representation learning, where it reconstructs a temporal slice of filterbank features from past and future context frames; the resulting representations are then used to train a CTC-based end-to-end ASR system using a smaller amount of labeled audio data.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
This work introduces BERT, a new language representation model designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; the pre-trained model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Self-Training and Pre-Training are Complementary for Speech Recognition
  • Qiantong Xu, Alexei Baevski, +5 authors Michael Auli
  • Computer Science, Engineering
  • ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2021
Pseudo-labeling and pre-training with wav2vec 2.0 are complementary in a variety of labeled data setups to improve speech recognition systems using unlabeled data.