Say What? A Dataset for Exploring the Error Patterns That Two ASR Engines Make

Meredith Moore, Michael Stephen Saxon, Hemanth Venkateswara, Visar Berisha, Sethuraman Panchanathan
We present a new metadataset which provides insight into where and how two ASR systems make errors on several different speech datasets. By making this data readily available to researchers, we hope to stimulate research in the area of WER estimation models, in order to gain a deeper understanding of how intelligibility is encoded in speech. Using this dataset, we attempt to estimate intelligibility using a state-of-the-art model for speech quality estimation and found that this model did not…
End-to-End Spoken Language Understanding for Generalized Voice Assistants
This work proposes a fully differentiable, transformer-based, hierarchical system that can be pretrained at both the ASR and NLU levels, yielding an SLU system that achieves significant improvements over baselines on a complex internal generalized VA dataset, including a 43% improvement in accuracy.
CEASR: A Corpus for Evaluating Automatic Speech Recognition
CEASR is a dataset based on public speech corpora, containing transcripts generated by several modern state-of-the-art ASR systems along with normalised transcript texts and metadata, enabling researchers to perform ASR-related evaluations and various in-depth analyses with noticeably reduced effort.
UncommonVoice: A Crowdsourced Dataset of Dysphonic Speech
To facilitate more accessible spoken language technologies and advance the study of dysphonic speech, this paper presents UncommonVoice, a freely-available, crowd-sourced speech corpus consisting of…
Dysarthric Speech Recognition with Lattice-Free MMI
  • Enno Hermann, M. Magimai-Doss
  • Computer Science
  • ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2020
This paper focuses on the use of state-of-the-art sequence-discriminative training, in particular lattice-free maximum mutual information (LF-MMI), for improving dysarthric speech recognition.
Predicción del Composite Requerido en el Diseño de un Recipiente Toroidal Mediante una Red Neuronal Artificial
Context: In the design of toroidal vessels, minimizing the amount of material is very important for reducing production costs; the conventional methods used to…


Investigating the role of L1 in automatic pronunciation evaluation of L2 speech
A new utterance-level feature extraction scheme is used to convert two sets of measurements that can be extracted from two acoustic models given accented speech into a fixed-dimension vector which is used as an input to a statistical model to predict the accentedness of a speaker.
Achieving Human Parity in Conversational Speech Recognition
The human error rate on the widely used NIST 2000 test set is measured, and the latest automated speech recognition system is found to have reached human parity, establishing a new state of the art that edges past the human benchmark.
ASR error detection using recurrent neural network language model and complementary ASR
This work proposes two approaches to improve ASR error detection: using recurrent neural network language models to capture long-distance word context within and across previous utterances, and using a complementary ASR system to train a neural network predictor of errors using a variety of features.
Word Error Rate Estimation for Speech Recognition: e-WER
This paper proposes a novel approach to estimate WER, or e-WER, which does not require a gold-standard transcription of the test set, and uses a comprehensive set of features: ASR recognised text, character recognition results to complement recognition output, and internal decoder features.
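For reference, the quantity that e-WER estimates is standard word error rate: the word-level Levenshtein distance between hypothesis and reference, normalised by reference length. A minimal sketch of the conventional metric (the `wer` function name and interface are illustrative, not taken from the paper):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # substitution/match
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)
```

For example, `wer("the cat sat", "the bat sat")` gives 1/3, since one of three reference words is substituted. e-WER's contribution is predicting this number without access to `reference`.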
Whistle-blowing ASRs: Evaluating the Need for More Inclusive Speech Recognition Systems
Evaluating the accuracy of state-of-the-art automatic speech recognition systems on two dysarthric speech datasets and comparing the results to ASR performance on control speech, this work finds that future studies should focus not only on making ASRs robust to environmental noise, but also on making them robust to different voices.
Quality-Net: An End-to-End Non-intrusive Speech Quality Assessment Model based on BLSTM
This study proposes a novel end-to-end, non-intrusive speech quality evaluation model, termed Quality-Net, based on bidirectional long short-term memory, which has potential to be used in a wide variety of applications of speech signal processing.
A physical method for measuring speech-transmission quality.
The resulting index, the Speech-Transmission Index (STI), has been correlated with subjective intelligibility scores obtained on 167 different transmission channels with a wide variety of disturbances, and the relative predictive power of the STI appeared to be 5%.
Methods for the Calculation and Use of the Articulation Index
Speech-intelligibility testing is an expensive and time-consuming operation that requires laboratory test conditions. In an attempt to short-cut or make unnecessary this type of testing, a procedure…
Phone-level pronunciation scoring and assessment for interactive language learning
The experimental results suggest that a likelihood-based pronunciation scoring metric can achieve usable performance, especially after applying the various enhancements.
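Likelihood-based pronunciation scoring of this kind is commonly formulated as a goodness-of-pronunciation style measure: the acoustic log-likelihood of the intended phone over its aligned frames, normalised against the best competing phone and the phone's duration. A minimal sketch under that assumption (the `phone_score` interface and the per-frame log-likelihood dictionaries are hypothetical, not the paper's actual implementation):

```python
def phone_score(frame_loglikes, target_phone):
    """Duration-normalised log-likelihood ratio for one aligned phone.

    frame_loglikes: list of dicts mapping phone label -> log-likelihood,
        one dict per frame of the phone's forced-aligned segment
        (hypothetical interface; real systems take alignments from an ASR).
    target_phone: the phone the speaker was supposed to produce.
    """
    target = sum(frame[target_phone] for frame in frame_loglikes)
    best = sum(max(frame.values()) for frame in frame_loglikes)
    # Score is <= 0; values near 0 mean the intended phone was also the
    # acoustically best-matching phone, i.e. good pronunciation.
    return (target - best) / len(frame_loglikes)
```

A downstream scorer would threshold or regress on these per-phone values to produce the kind of phone-level assessment the abstract describes.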
The TORGO database of acoustic and articulatory speech from speakers with dysarthria
This paper describes the acquisition of a new database of dysarthric speech in terms of aligned acoustics and articulatory data from seven individuals with speech impediments caused by cerebral palsy or amyotrophic lateral sclerosis and age- and gender-matched control subjects.