Context-aware RNNLM Rescoring for Conversational Speech Recognition
@article{Wei2021ContextawareRR,
  title   = {Context-aware RNNLM Rescoring for Conversational Speech Recognition},
  author  = {Kun Wei and Pengcheng Guo and Hang Lv and Zhen Tu and Lei Xie},
  journal = {2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP)},
  year    = {2021},
  pages   = {1-5}
}
Conversational speech recognition is regarded as a challenging task due to its free speaking style and long-range contextual dependencies. Prior work has explored modeling long-range context through RNNLM rescoring, with improved performance. To further take advantage of the persistent nature of a conversation, such as topic or speaker turn, we extend the rescoring procedure in a new, context-aware manner. For RNNLM training, we capture the contextual dependencies by concatenating…
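The truncated abstract sketches the core idea: the RNNLM is trained on concatenated consecutive utterances so that second-pass rescoring can condition on conversational context. Below is a minimal illustrative sketch of such context-carrying n-best rescoring; the `LSTMLM` scorer, the `rescore_session` loop, `lm_weight`, and the 64-token context cap are all assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of context-aware n-best rescoring: the previous
# utterances' best hypotheses are prepended as context before the RNNLM
# scores each candidate. Toy model; not the paper's code.
import torch
import torch.nn as nn

class LSTMLM(nn.Module):
    def __init__(self, vocab_size, dim=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.lstm = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab_size)

    @torch.no_grad()
    def log_prob(self, ids):
        # ids: 1-D LongTensor of token ids; returns total log P(ids[1:] | ids[:-1])
        x = self.emb(ids[:-1]).unsqueeze(0)
        h, _ = self.lstm(x)
        logp = torch.log_softmax(self.out(h).squeeze(0), dim=-1)
        return logp.gather(1, ids[1:].unsqueeze(1)).sum().item()

def rescore_session(nbest_lists, am_scores, lm, lm_weight=0.5):
    """nbest_lists: per-utterance lists of token-id tensors (each >= 2 tokens);
    am_scores: matching first-pass scores. Returns the best hypothesis per utterance."""
    context = torch.empty(0, dtype=torch.long)   # grows with the conversation
    best_hyps = []
    for hyps, ams in zip(nbest_lists, am_scores):
        scored = []
        for hyp, am in zip(hyps, ams):
            seq = torch.cat([context, hyp])       # concatenate context + hypothesis
            # conditional LM score: log P(context + hyp) - log P(context)
            ctx_lp = lm.log_prob(seq) - (lm.log_prob(context) if len(context) > 1 else 0.0)
            scored.append((am + lm_weight * ctx_lp, hyp))
        best = max(scored, key=lambda t: t[0])[1]
        context = torch.cat([context, best])[-64:]  # keep a bounded context window
        best_hyps.append(best)
    return best_hyps
```

In this sketch the score of each hypothesis is its first-pass score plus a weighted LM log-probability conditioned on the running conversation history, and the winning hypothesis of each utterance is appended to that history before the next utterance is rescored.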
One Citation
The RoyalFlush System of Speech Recognition for M2MeT Challenge
- Computer Science · ICASSP
- 2022
A standard Conformer-based joint CTC/attention model and a U2++ ASR model (a modification of the Conformer with a bidirectional attention decoder) are trained to make full use of the complementarity of different model architectures.
References
Bringing contextual information to google speech recognition
- Computer Science · INTERSPEECH
- 2015
This paper utilizes an on-the-fly rescoring mechanism to adjust the LM weights of a small set of n-grams relevant to the particular context during speech decoding, which also handles out-of-vocabulary words.
Recurrent Neural Network Language Model Adaptation for Conversational Speech Recognition
- Computer Science · INTERSPEECH
- 2018
Two adaptation models for recurrent neural network language models (RNNLMs) are proposed to capture topic effects and long-distance triggers for conversational automatic speech recognition (ASR), showing modest WER and perplexity reductions.
Two Efficient Lattice Rescoring Methods Using Recurrent Neural Network Language Models
- Computer Science · IEEE/ACM Transactions on Audio, Speech, and Language Processing
- 2016
Two efficient lattice rescoring methods for RNNLMs are proposed, producing 1-best performance comparable to a 10k-best rescoring baseline RNNLM system on two large-vocabulary conversational telephone speech recognition tasks for US English and Mandarin Chinese.
The CAPIO 2017 Conversational Speech Recognition System
- Computer Science · ArXiv
- 2018
This paper shows how the state-of-the-art performance on the industry-standard NIST 2000 Hub5 English evaluation set is achieved, and proposes an acoustic model adaptation scheme that simply averages the parameters of a seed neural network acoustic model and its adapted version.
Future word contexts in neural network language models
- Computer Science · 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
- 2017
A novel neural network structure, succeeding-word RNNLMs (suRNNLMs), where a feedforward unit models a finite number of succeeding (future) words; it can be trained much more efficiently and used for lattice rescoring.
HKUST/MTS: A Very Large Scale Mandarin Telephone Speech Corpus
- Physics, Computer Science · ISCSLP
- 2006
The paper describes the design, collection, transcription, and analysis of 200 hours of the HKUST Mandarin Telephone Speech Corpus (HKUST/MTS), the largest and first of its kind for Mandarin conversational telephone speech, providing abundant and diversified samples for Mandarin speech recognition and other application-dependent tasks.
The Microsoft 2017 Conversational Speech Recognition System
- Computer Science · 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2018
We describe the latest version of Microsoft's conversational speech recognition system for the Switchboard and CallHome domains. The system adds a CNN-BLSTM acoustic model to the set of model…
Lattice decoding and rescoring with long-span neural network language models
- Computer Science · INTERSPEECH
- 2014
This paper combines previous work on lattice decoding with long short-term memory (LSTM) neural network language models and adds refined pruning techniques to reduce the search effort by a factor of three, and introduces two novel approximations for full lattice rescoring.
Training Language Models for Long-Span Cross-Sentence Evaluation
- Computer Science · 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
- 2019
This work trains language models based on long short-term memory recurrent neural networks and Transformers using various types of training sequences and studies their robustness under different evaluation modes, showing that models trained with back-propagation over sequences formed by concatenating multiple sentences, with state carry-over across sequences, clearly outperform those trained at the sentence level.
Gaussian Process Lstm Recurrent Neural Network Language Models for Speech Recognition
- Computer Science · ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2019
Gaussian process (GP) LSTM RNNLMs are introduced, which model parameter uncertainty under a Bayesian framework and allow the optimal forms of gates to be learned automatically for individual LSTM cells.