Sentence-Select: Large-Scale Language Model Data Selection for Rare-Word Speech Recognition

W. Ronny Huang, Cal Peyser, Tara N. Sainath, Ruoming Pang, Trevor Strohman, Shankar Kumar
Language model fusion helps smart assistants recognize words which are rare in acoustic data but abundant in text-only corpora (typed search logs). However, such corpora have properties that hinder downstream performance, including being (1) too large, (2) beset with domain-mismatched content, and (3) heavy-headed rather than heavy-tailed (excessively many duplicate search queries such as “weather”). We show that three simple strategies for selecting language modeling data can dramatically…
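The duplication problem named in point (3) can be illustrated with a simple frequency cap on exact queries. This is a minimal sketch of the idea, not the paper's actual selection strategies, which are more refined:

```python
from collections import Counter

def cap_duplicates(corpus, max_copies=1):
    """Keep at most `max_copies` of each exact query, so that a
    heavy-headed text corpus (millions of copies of "weather")
    cannot dominate LM training. Illustrative sketch only."""
    counts = Counter()
    kept = []
    for sentence in corpus:
        key = sentence.strip().lower()
        if counts[key] < max_copies:
            counts[key] += 1
            kept.append(sentence)
    return kept

queries = ["weather", "weather", "weather", "set a timer", "call mom"]
print(cap_duplicates(queries))  # ['weather', 'set a timer', 'call mom']
```

Capping rather than fully deduplicating (e.g. `max_copies > 1`) would let selection preserve some of the head distribution while still flattening it toward the tail.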


Improving Rare Word Recognition with LM-aware MWER Training

This work introduces LMs into the training of hybrid autoregressive transducer (HAT) models within a discriminative training framework, to mitigate the gap between training and inference in the use of LMs.

Unsupervised Data Selection via Discrete Speech Representation for ASR

A simple and effective unsupervised data selection method is proposed that selects speech acoustically similar to a target domain: it takes the discrete speech representation available in common self-supervised learning frameworks as input and applies a contrastive data selection method on the discrete tokens.

Minimum Word Error Rate Training with Language Model Fusion for End-to-End Speech Recognition

This work performs LM fusion in the minimum word error rate (MWER) training of an E2E model to obviate the need for LM weight tuning during inference, and proposes a novel MWER training with ILME (MWER-ILME) in which ILME-based fusion generates the N-best hypotheses and their posteriors.

Improving Tail Performance of a Deliberation E2E ASR Model Using a Large Text Corpus

This work applies shallow fusion to incorporate a very large text corpus into a state-of-the-art E2E ASR model and explores the impact of model size, showing that intelligent pruning of the training set can be more effective than increasing the parameter count.

Recognizing Long-Form Speech Using Streaming End-to-End Models

This work examines the ability of E2E models to generalize to unseen domains, and proposes two complementary solutions to address this: training on diverse acoustic data, and LSTM state manipulation to simulate long-form audio when training using short utterances.

Lookup-Table Recurrent Language Models for Long Tail Speech Recognition

Lookup-Table Language Models (LookupLM), a method for scaling up RNN language models with only a constant increase in floating-point operations, is introduced; it increases the expressivity of the embedding table by scaling n-gram embedding tables up to nearly a billion parameters.
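The constant-cost lookup at the heart of LookupLM can be sketched with a hashed n-gram embedding table. The hashing scheme below is my own simplification for illustration, not the paper's exact construction:

```python
import random

def ngram_embedding(tokens, table, n=2):
    """Hash the most recent n-gram to a row of a (possibly huge)
    embedding table. Growing the table adds parameters, but the
    lookup itself stays one hash plus one row read -- a constant
    number of floating-point operations per step."""
    row = hash(tuple(tokens[-n:])) % len(table)
    return table[row]

random.seed(0)
dim, rows = 8, 10_000  # real LookupLM tables reach nearly 1B parameters
table = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(rows)]

emb = ngram_embedding(["turn", "on", "the", "lights"], table)
print(len(emb))  # 8
```

The returned n-gram embedding would then be combined with the RNN's regular input embedding; only the table size grows, not the per-token compute.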

A Spelling Correction Model for End-to-end Speech Recognition

This paper proposes a novel approach to utilizing text-only data, by training a spelling correction (SC) model to explicitly correct errors made by the end-to-end model.

Scalable Multi Corpora Neural Language Models for ASR

Overall, this paper shows a 6.2% relative WER reduction using a neural LM in a second-pass n-best rescoring framework, with a minimal increase in latency.

Intelligent Selection of Language Model Training Data

We address the problem of selecting non-domain-specific language model training data to build auxiliary language models for use in tasks such as machine translation. Our approach is based on comparing the cross-entropy of each candidate sentence under an in-domain language model against its cross-entropy under a language model trained on the general corpus, and selecting sentences whose difference falls below a threshold.
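The cross-entropy-difference criterion of Moore and Lewis can be sketched with toy unigram LMs. The `UnigramLM` class and its add-one smoothing are my own illustration, not the paper's models:

```python
import math
from collections import Counter

class UnigramLM:
    """Toy unigram LM with add-one smoothing (illustration only)."""
    def __init__(self, corpus):
        self.counts = Counter(w for s in corpus for w in s.split())
        self.total = sum(self.counts.values())
        self.vocab = len(self.counts) + 1  # +1 for unseen words
    def word_logprob(self, w):
        return math.log((self.counts[w] + 1) / (self.total + self.vocab))

def cross_entropy(lm, sentence):
    """Per-word negative log-probability of `sentence` under `lm`."""
    words = sentence.split()
    return -sum(lm.word_logprob(w) for w in words) / max(len(words), 1)

def moore_lewis_score(in_lm, gen_lm, sentence):
    """H_in(s) - H_gen(s): lower means more in-domain-looking
    relative to the general corpus; keep sentences below a threshold."""
    return cross_entropy(in_lm, sentence) - cross_entropy(gen_lm, sentence)

in_lm = UnigramLM(["play some jazz", "play a song", "play music"])
gen_lm = UnigramLM(["the cat sat", "stock prices rose", "play music", "rain tomorrow"])

# An in-domain-looking sentence scores lower than an out-of-domain one.
print(moore_lewis_score(in_lm, gen_lm, "play a song") <
      moore_lewis_score(in_lm, gen_lm, "stock prices rose"))  # True
```

Subtracting the general-corpus cross-entropy is what distinguishes this from naive in-domain perplexity filtering: it avoids simply favoring short or generically frequent sentences.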

Conformer: Convolution-augmented Transformer for Speech Recognition

This work proposes the convolution-augmented transformer for speech recognition, named Conformer, which significantly outperforms previous Transformer- and CNN-based models, achieving state-of-the-art accuracy.

The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing

One such approach is presented, the Dataflow Model, along with a detailed examination of the semantics it enables, an overview of the core principles that guided its design, and a validation of the model itself via the real-world experiences that led to its development.