Corpus ID: 237532455

Utterance-level neural confidence measure for end-to-end children speech recognition

  • Wei Liu, Tan Lee
  • Published 16 September 2021
  • Computer Science, Engineering
  • ArXiv
Confidence measure is a performance index of particular importance for automatic speech recognition (ASR) systems deployed in real-world scenarios. In the present study, utterance-level neural confidence measure (NCM) in end-to-end automatic speech recognition (E2E ASR) is investigated. The E2E system adopts the joint CTC-attention Transformer architecture. The prediction of NCM is formulated as a task of binary classification, i.e., accept/reject the input utterance, based on a set of predictor…
1 Citation


Improving Confidence Estimation on Out-of-Domain Data for End-to-End Speech Recognition
  • Qiujia Li, Yu Zhang, David Qiu, Yanzhang He, Liangliang Cao, P. Woodland
  • Computer Science, Engineering
  • ArXiv
  • 2021
Two approaches to improve the model-based confidence estimators on OOD data are proposed: using pseudo transcriptions and an additional OOD language model, which can significantly improve the confidence metrics on TEDLIUM and Switchboard datasets while preserving in-domain performance.


Utterance Confidence Measure for End-to-End Speech Recognition with Applications to Distributed Speech Recognition Scenarios
The proposed neural confidence measure (NCM) is trained as a binary classification task to accept or reject an end-to-end speech recognition result and incorporates features from an encoder, a decoder, and an attention block of the attention-based end-to-end speech recognition model to improve NCM significantly.
Confidence Measures in Encoder-Decoder Models for Speech Recognition
This work presents a novel method which uses internal neural features of a frozen ASR model to train an independent neural network to predict a softmax temperature value, computed in each decoder time step and multiplied by the logits in order to redistribute the output probabilities.
Confidence Estimation for Attention-Based Sequence-to-Sequence Models for Speech Recognition
  • Qiujia Li, David Qiu, +5 authors Trevor Strohman
  • Computer Science, Engineering
  • ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2021
A lightweight and effective approach named confidence estimation module (CEM) on top of an existing end-to-end ASR model is proposed that can mitigate the overconfidence problem and can produce more reliable confidence scores with and without shallow fusion of a language model.
End-to-End Neural Systems for Automatic Children Speech Recognition: An Empirical Study
This study provides a critical assessment of automatic children speech recognition through an empirical study of contemporary state-of-the-art end-to-end speech recognition systems.
Word-based confidence measures as a guide for stack search in speech recognition
  • C. Neti, S. Roukos, E. Eide
  • Computer Science
  • 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing
  • 1997
This paper explores the use of word-based confidence measures to adaptively modify the hypothesis score during searches in continuous speech recognition: specifically, based on the confidence of the current sequence of hypothesized words during the search, the weight of its prediction is changed as a function of the confidence.
Hybrid CTC/Attention Architecture for End-to-End Speech Recognition
The proposed hybrid CTC/attention end-to-end ASR is applied to two large-scale ASR benchmarks, and exhibits performance that is comparable to conventional DNN/HMM ASR systems based on the advantages of both multiobjective learning and joint decoding without linguistic resources.
Transfer Learning from Adult to Children for Speech Recognition: Evaluation, Analysis and Recommendations
This work attempts to address the key challenges using transfer learning from adult's models to children's models in a Deep Neural Network (DNN) framework for children's Automatic Speech Recognition (ASR) task evaluating on multiple children's speech corpora with a large vocabulary.
The CUHK-TUDELFT System for The SLT 2021 Children Speech Recognition Challenge
This technical report describes the approach, which combines the use of a joint CTC-attention end-to-end (E2E) speech recognition framework, transfer learning, data augmentation and development of various language models for speech recognition in SLT children.
Joint CTC-attention based end-to-end speech recognition using multi-task learning
A novel method for end-to-end speech recognition to improve robustness and achieve fast convergence by using a joint CTC-attention model within the multi-task learning framework, thereby mitigating the alignment issue.
The SLT 2021 Children Speech Recognition Challenge: Open Datasets, Rules and Baselines
The Children Speech Recognition Challenge (CSRC) is launched, as a flagship satellite event of IEEE SLT 2021 workshop, and the datasets, rules, evaluation method as well as baselines are introduced.