Corpus ID: 17171422

2000 NIST EVALUATION OF CONVERSATIONAL SPEECH RECOGNITION OVER THE TELEPHONE: ENGLISH AND MANDARIN PERFORMANCE RESULTS

@inproceedings{Fiscus20002000NE,
  title={2000 NIST EVALUATION OF CONVERSATIONAL SPEECH RECOGNITION OVER THE TELEPHONE: ENGLISH AND MANDARIN PERFORMANCE RESULTS},
  author={J. Fiscus and W. Fisher and Alvin F. Martin and Mark A. Przybocki and D. Pallett},
  year={2000}
}
This paper documents the use of conversational telephone speech test materials in the NIST-coordinated evaluation conducted early in 2000. The primary evaluation was of General American English speech, but a subsidiary evaluation of Mandarin speech was also offered. The primary test data consisted of twenty conversations collected for the original Switchboard Corpus but not released with the published corpus, and twenty conversations from the CallHome English Corpus. The lowest English word…
English Conversational Telephone Speech Recognition by Humans and Machines
TLDR
An independent set of human performance measurements on two conversational tasks is performed, and it is found that human performance may be considerably better than what was earlier reported, giving the community a significantly harder goal to achieve.
Nomadic Speech-Based Text Entry: A Decision Model Strategy for Improved Speech to Text Processing
TLDR
A decision model is developed to minimize recognition error rates regardless of the conditions experienced while completing dictation tasks, and error rates were reduced significantly when applying the model to existing data.
The NIST Speaker Recognition Evaluations: 1996-2001
We discuss the history and purposes of the NIST evaluations of speaker recognition performance. We cover the sites that have participated, the performance measures used, and the formats used to…
Densely Connected Networks for Conversational Speech Recognition
TLDR
It is shown that the proposed dense LSTMs provide more reliable performance than conventional residual LSTMs as more LSTM layers are stacked in neural networks.
Error back propagation for sequence training of Context-Dependent Deep Networks for conversational speech transcription
TLDR
This work investigates back-propagation based sequence training of Context-Dependent Deep-Neural-Network HMMs, or CD-DNN-HMMs, for conversational speech transcription and finds that to get reasonable results, heuristics are needed that point to a problem with lattice sparseness.
Learning Hidden Unit Contributions for Unsupervised Acoustic Model Adaptation
This work presents a broad study on the adaptation of neural network acoustic models by means of learning hidden unit contributions (LHUC), a method that linearly re-combines hidden units in a…
Multi-Stride Self-Attention for Speech Recognition
TLDR
The average WER improvement of 7.5% obtained by the TDNNs with the multi-stride self-attention layer, compared to the baseline TDNN model, shows the effectiveness of the proposed multi-stride self-attention mechanism.
Nexus DNN for Speech and Speaker Recognition
Over the years, many efforts have been made to improve recognition accuracy in automatic speech recognition (ASR) and speaker recognition (SRE), and many different technologies have been…
A review of speech recognition with Sphinx engine in language detection
TLDR
The Sphinx approach is applied to integrate the advantages of its sequential modeling structure and its pattern classification in speech recognition, to assist the next phase of the research, which focuses on building an Arabic-language speech recognizer using the Sphinx4 engine.

References

AN INTRODUCTION TO THE DIAGNOSTIC EVALUATION OF SWITCHBOARD-CORPUS AUTOMATIC SPEECH RECOGNITION SYSTEMS
TLDR
It is suggested that future-generation recognition systems would benefit from improving the acoustic models used for phonetic classification, as well as the pronunciation models involved in lexical matching.
A post-processing system to yield reduced word error rates: Recognizer Output Voting Error Reduction (ROVER)
  • J. Fiscus
  • Computer Science
  • 1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings
  • 1997
TLDR
A post-recognition process that models the output generated by multiple ASR systems as independent knowledge sources, which can be combined and used to generate an output with a reduced error rate.
Matched Pairs Sentence-Segment Word Error (MAPSSWE) Test, URL http