Transcribing broadcast news with the 1997 Abbot System

@article{Cook1998TranscribingBN,
  title={Transcribing broadcast news with the 1997 Abbot System},
  author={Gary D. Cook and Tony Robinson},
  journal={Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181)},
  year={1998},
  volume={2},
  pages={917--920}
}
  • G. Cook, T. Robinson
  • Published 12 May 1998
  • Computer Science
  • Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181)
Previous DARPA CSR evaluations have focused on the transcription of broadcast news from both television and radio programmes. This is a challenging task because the data includes a variety of speaking styles and channel conditions. This paper describes the development of a connectionist-hidden Markov model (HMM) system, and the enhancements designed to improve the performance on broadcast news data. Both multilayer perceptron (MLP) and recurrent neural network acoustic models have been… 
Connectionist speech recognition of Broadcast News
TLDR
This paper describes connectionist techniques for recognition of Broadcast News, and investigates a new feature extraction technique based on the modulation-filtered spectrogram (MSG), and methods for combining multiple information sources.
Syllable onset detection applied to the Portuguese language
TLDR
These methods of syllable segmentation in continuous speech are developed based on perceptually oriented feature extraction techniques, achieving results of 93% detection of onsets with insertion rates of only 15%.
Incorporating information from syllable-length time scales into automatic speech recognition
TLDR
This work compares the performance of three ASR systems: a baseline system that uses phone-scale representations and units, an experimental system that uses a syllable-oriented front-end representation and syllabic units for recognition, and a third system that combines the phone-scale and syllable-scale recognizers by merging and rescoring N-best lists.
Applying dynamic context into MLP/HMM speech recognition system
  • P. Salmela
  • Computer Science
    Comput. Speech Lang.
  • 2001
TLDR
When the dynamic context was included in the MLP/HMM recognition system, the string recognition accuracy of the test set increased from 92.9% to 93.8% on average and the signal-to-noise ratio (SNR) of this test set decreased.
A Study on Selecting and Optimizing Perceptually Relevant Features for Automatic Speech Recognition
The performance of an automatic speech recognition (ASR) system strongly depends on the representation used for the front-end. If the extracted features do not include all relevant information, the
Perceptually motivated speech recognition and mispronunciation detection
This doctoral thesis is the result of a research effort performed in two fields of speech technology, i.e., speech recognition and mispronunciation detection. Although the two areas are clearly dis
Perceptually inspired signal processing strategies for robust speech recognition in reverberant environments
TLDR
This work presents perceptually inspired signal-processing strategies for robust speech recognition in reverberant environments, a novel approach to signal processing that automates the very labor-intensive and therefore time-heavy and expensive process of recognizing speech.

References

SHOWING 1-10 OF 29 REFERENCES
Transcription of broadcast television and radio news: the 1996 ABBOT system
TLDR
The CU-CON system is described, a hybrid connectionist-HMM large vocabulary continuous speech recognition system developed at the Cambridge University Engineering Department which participated in the 1996 ARPA Hub 4 Evaluations.
Efficient evaluation of the LVCSR search space using the NOWAY decoder
  • S. Renals, M. Hochberg
  • Computer Science
    1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings
  • 1996
TLDR
The posterior-based phone deactivation pruning approach has been extended to include phone-dependent thresholds and an improved estimate of the least upper bound on the utterance log-probability has been developed, reducing the computational cost of the recognition process performed by the NOWAY decoder.
Decoder technology for connectionist large vocabulary speech recognition
TLDR
An efficient search procedure is described, along with its software embodiment in a decoder, NOWAY, which has been incorporated in ABBOT, a hybrid connectionist/hidden Markov model (HMM) LVCSR system. Results indicate that phone deactivation pruning increased the search speed by an order of magnitude while incurring 2% or less relative search error.
Continuous speech recognition
TLDR
The authors focus on a tutorial description of the hybrid HMM/ANN method, which provides a mechanism for incorporating a range of sources of evidence without strong assumptions about their joint statistics, and may have applicability to much more complex systems that can incorporate deep acoustic and linguistic context.
Integrating syllable boundary information into speech recognition
TLDR
It is shown that for a small, continuous speech recognition task the addition of artificial syllabic onset information (derived from advance knowledge of the word transcriptions) lowers the word error rate by 38%.
The use of recurrent neural networks in continuous speech recognition
TLDR
This chapter describes a use of recurrent neural networks (i.e., feedback is incorporated in the computation) as an acoustic model for continuous speech recognition as well as an appropriate parameter estimation procedure.
Context-Dependent Classes in a Hybrid Recurrent Network-HMM Speech Recognition System
TLDR
A method for incorporating context-dependent phone classes in a connectionist-HMM hybrid speech recognition system is introduced, where single-layer networks discriminate between different context classes given the phone class and the acoustic data.
Experiments in syllable-based recognition of continuous speech
TLDR
An exploratory implementation of a syllable-based recognizer using a hierarchical transition network and a method of scaling the distance measures used in the syllable matching is described, which takes into account variability in syllable production.
Phonetic Context-Dependency In a Hybrid ANN/HMM Speech Recognition System
TLDR
This dissertation details the development and implementation of a phonetic context-dependent system specifically for a recurrent neural network HMM hybrid speech recognition system and describes the research behind finding a successful implementation.
An application of recurrent nets to phone probability estimation
  • A. J. Robinson
  • Computer Science, Medicine
    IEEE Trans. Neural Networks
  • 1994
TLDR
Recognition results are presented for the DARPA TIMIT and Resource Management tasks, and it is concluded that recurrent nets are competitive with traditional means for performing phone probability estimation.