Context-aware RNNLM Rescoring for Conversational Speech Recognition

  • Kun Wei, Pengcheng Guo, Hang Lv, Zhen Tu, Lei Xie
  • Published 18 November 2020
  • Computer Science
  • 2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP)
Conversational speech recognition is regarded as a challenging task due to its free speaking style and long-term contextual dependencies. Prior work has explored modeling long-range context through RNNLM rescoring, with improved performance. To further exploit the persistent nature of a conversation, such as topic or speaker turn, we extend the rescoring procedure in a new context-aware manner. For RNNLM training, we capture the contextual dependencies by concatenating…
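The abstract describes rescoring n-best hypotheses with an RNNLM whose input concatenates preceding utterances with the current one, so cross-sentence context (topic, speaker turn) influences the language model score. As a rough illustration only, not the paper's implementation, the sketch below uses a toy stand-in `rnnlm_score`; the function names, the scoring heuristic, and the interpolation weight are all invented for this example.

```python
import math

def rnnlm_score(tokens):
    # Toy stand-in for a trained RNNLM (illustrative only): assigns a
    # higher log-probability to words already seen earlier in the
    # sequence, a crude proxy for the topical coherence that a real
    # RNNLM trained on concatenated utterances would capture.
    score, seen = 0.0, set()
    for w in tokens:
        score += math.log(0.5 if w in seen else 0.1)
        seen.add(w)
    return score

def rescore_nbest(nbest, context, lm_weight=0.5):
    # Context-aware rescoring: evaluate the LM on [context ; hypothesis]
    # rather than the hypothesis alone, so the LM score reflects
    # cross-sentence dependencies. nbest is a list of
    # (acoustic_score, word_list) pairs.
    best_total, best_hyp = float("-inf"), None
    for am_score, hyp in nbest:
        lm = rnnlm_score(context + hyp) - rnnlm_score(context)
        total = am_score + lm_weight * lm
        if total > best_total:
            best_total, best_hyp = total, hyp
    return best_hyp

# With the preceding utterance as context, the topically consistent
# hypothesis wins despite a lower acoustic score.
context = ["the", "game", "was", "close"]
nbest = [(0.0, ["the", "gain"]), (-0.1, ["the", "game"])]
print(rescore_nbest(nbest, context))  # ['the', 'game']
```

Without the context, the first hypothesis would win on its acoustic score alone; conditioning the LM on the previous utterance is what flips the decision.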


The RoyalFlush System of Speech Recognition for M2MeT Challenge
A standard Conformer-based joint CTC/attention model and a U2++ ASR model with a bidirectional attention decoder, a modification of the Conformer, are trained to make full use of the complementary performance of different model architectures.


Bringing Contextual Information to Google Speech Recognition
This paper utilizes an on-the-fly rescoring mechanism to adjust the LM weights of a small set of n-grams relevant to the particular context during speech decoding, which also handles out-of-vocabulary words.
Recurrent Neural Network Language Model Adaptation for Conversational Speech Recognition
Two adaptation models for recurrent neural network language models (RNNLMs) are proposed to capture topic effects and long-distance triggers for conversational automatic speech recognition (ASR), showing modest WER and perplexity reductions.
Two Efficient Lattice Rescoring Methods Using Recurrent Neural Network Language Models
Two efficient lattice rescoring methods for RNNLMs are proposed, producing 1-best performance comparable to a 10k-best rescoring baseline RNNLM system on two large-vocabulary conversational telephone speech recognition tasks for US English and Mandarin Chinese.
The CAPIO 2017 Conversational Speech Recognition System
This paper shows how the state-of-the-art performance on the industry-standard NIST 2000 Hub5 English evaluation set is achieved, and proposes an acoustic model adaptation scheme that simply averages the parameters of a seed neural network acoustic model and its adapted version.
Future word contexts in neural network language models
A novel neural network structure, the succeeding-word RNNLM (su-RNNLM), uses a feedforward unit to model a finite number of succeeding (future) words; it can be trained much more efficiently and used for lattice rescoring.
HKUST/MTS: A Very Large Scale Mandarin Telephone Speech Corpus
The paper describes the design, collection, transcription and analysis of 200 hours of HKUST Mandarin Telephone Speech Corpus (HKUST/MTS), the largest and first of its kind for Mandarin conversational telephone speech, providing abundant and diversified samples for Mandarin speech recognition and other application-dependent tasks.
The Microsoft 2017 Conversational Speech Recognition System
We describe the latest version of Microsoft's conversational speech recognition system for the Switchboard and CallHome domains. The system adds a CNN-BLSTM acoustic model to the set of model architectures…
Lattice Decoding and Rescoring with Long-Span Neural Network Language Models
This paper combines previous work on lattice decoding with long short-term memory (LSTM) neural network language models, adding refined pruning techniques that reduce the search effort by a factor of three; it also introduces two novel approximations for full lattice rescoring.
Training Language Models for Long-Span Cross-Sentence Evaluation
This work trains language models based on long short-term memory recurrent neural networks and Transformers on various types of training sequences and studies their robustness across different evaluation modes, showing that models trained with back-propagation over sequences formed by concatenating multiple sentences, with state carry-over across sequences, outperform those trained at the sentence level.
Gaussian Process Lstm Recurrent Neural Network Language Models for Speech Recognition
Gaussian process (GP) LSTM RNNLMs are introduced, which allow parameter uncertainty to be modeled under a Bayesian framework and the optimal forms of gates to be learned automatically for individual LSTM cells.