Boosting the performance of connectionist large vocabulary speech recognition

@article{Cook1996BoostingTP,
  title={Boosting the performance of connectionist large vocabulary speech recognition},
  author={Gary D. Cook and Anthony J. Robinson},
  journal={Proceedings of the Fourth International Conference on Spoken Language Processing (ICSLP '96)},
  year={1996},
  volume={3},
  pages={1305--1308}
}
  • G. Cook, A. J. Robinson
  • Published 3 October 1996
  • Computer Science
  • Proceedings of the Fourth International Conference on Spoken Language Processing (ICSLP '96)
Hybrid connectionist-hidden Markov model large vocabulary speech recognition has been shown to be competitive with more traditional HMM systems. Connectionist acoustic models generally use considerably fewer parameters than HMMs, allowing real-time operation without significant degradation of performance. However, the small number of parameters in connectionist acoustic models also poses a problem: how do we make the best use of large amounts of training data? This paper proposes a solution to…
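The truncated abstract stops before the proposed solution, but the technique named in the title is boosting: training a sequence of networks on reweighted data and combining their outputs. The sketch below is a minimal illustration of AdaBoost.M1-style boosting by resampling; scikit-learn's MLPClassifier and the synthetic data are stand-ins for the paper's recurrent networks and acoustic frames, not the authors' implementation.

# Illustrative sketch only: the paper boosts recurrent phone-posterior
# estimators; an MLP on synthetic "frames" stands in here.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 13))          # stand-in for acoustic feature frames
y = rng.integers(0, 4, size=2000)        # stand-in for phone labels

n_rounds = 3
weights = np.full(len(X), 1.0 / len(X))  # frame-level sampling distribution
models, alphas = [], []

for _ in range(n_rounds):
    # Draw a training set that over-represents currently hard frames.
    idx = rng.choice(len(X), size=len(X), p=weights)
    net = MLPClassifier(hidden_layer_sizes=(64,), max_iter=200,
                        random_state=0).fit(X[idx], y[idx])

    miss = net.predict(X) != y
    # Weighted error on all frames; real AdaBoost.M1 stops when it reaches
    # 0.5, clipped here only to keep the toy example running on random labels.
    err = np.clip(weights[miss].sum(), 1e-10, 0.499)
    beta = err / (1.0 - err)

    # Down-weight frames the new network already gets right.
    weights[~miss] *= beta
    weights /= weights.sum()

    models.append(net)
    alphas.append(np.log(1.0 / beta))

# Combine the ensemble's phone posteriors; a hybrid decoder would consume these.
posteriors = sum(a * m.predict_proba(X) for a, m in zip(alphas, models))
posteriors /= posteriors.sum(axis=1, keepdims=True)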
Utterance-level boosting of HMM speech recognizers
  • C. Meyer
  • Computer Science
    2002 IEEE International Conference on Acoustics, Speech, and Signal Processing
  • 2002
TLDR
An utterance-level approach to boosting HMM speech recognizers is presented that outperforms combining scores from different maximum-likelihood baseline models, evaluated on a large vocabulary isolated word recognition task.
Boosting HMM acoustic models in large vocabulary speech recognition
TLDR
An approach is suggested that applies a popular boosting algorithm ("AdaBoost.M2") to hidden Markov model based speech recognizers at the level of utterances, showing that boosting significantly improves on the best test error rates obtained with standard maximum likelihood training.
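As a rough illustration of what utterance-level AdaBoost.M2 bookkeeping looks like, the sketch below maintains the algorithm's mislabel distribution and pseudo-loss over utterances. The recognizer itself is abstracted to random confidence scores; every name and size here is a hedged stand-in, not the cited system.

# Hedged sketch of AdaBoost.M2 at the utterance level. `h` stands for the
# current model's normalized confidence h_t(x_i, y) for utterance i, label y.
import numpy as np

rng = np.random.default_rng(1)
n_utt, n_labels, n_rounds = 50, 10, 3
true_y = rng.integers(0, n_labels, size=n_utt)

# Mislabel distribution D over (utterance, wrong label) pairs.
D = np.ones((n_utt, n_labels))
D[np.arange(n_utt), true_y] = 0.0
D /= D.sum()

betas, all_scores = [], []
for t in range(n_rounds):
    # In the real system an HMM set is retrained on data weighted by D;
    # random confidences stand in for its per-utterance scores here.
    h = rng.random((n_utt, n_labels))
    h /= h.sum(axis=1, keepdims=True)
    all_scores.append(h)

    h_true = h[np.arange(n_utt), true_y][:, None]
    # Pseudo-loss; AdaBoost.M2 assumes the weak learner beats chance,
    # so it is clipped below 0.5 only to keep this toy data running.
    eps = np.clip(0.5 * np.sum(D * (1.0 - h_true + h)), 1e-10, 0.499)
    beta = eps / (1.0 - eps)
    betas.append(beta)

    # Concentrate weight on utterances the current model confuses.
    D *= beta ** (0.5 * (1.0 + h_true - h))
    D[np.arange(n_utt), true_y] = 0.0
    D /= D.sum()

# Final hypothesis: log(1/beta)-weighted vote over the boosted models.
combined = sum(np.log(1.0 / b) * h for b, h in zip(betas, all_scores))
predictions = combined.argmax(axis=1)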
Combination of acoustic models in continuous speech recognition hybrid systems
TLDR
This work developed a method for combining phoneme probabilities generated by different acoustic models trained on distinct feature extraction processes, making it possible to obtain relative improvements in word error rate larger than 20% on a large vocabulary speaker independent continuous speech recognition task.
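A hedged sketch of this frame-level combination idea: phone posteriors from acoustic models trained on distinct feature front ends are merged before decoding. The merging rule shown, a geometric mean in the log domain, is one common choice for hybrid systems and is not necessarily the cited paper's exact rule; all arrays are illustrative stand-ins for network outputs.

# Merge per-frame phone posteriors from two feature streams (e.g. two
# different front ends); entirely synthetic stand-in data.
import numpy as np

rng = np.random.default_rng(2)
n_frames, n_phones = 100, 40

def fake_posteriors():
    p = rng.random((n_frames, n_phones))
    return p / p.sum(axis=1, keepdims=True)

streams = [fake_posteriors(), fake_posteriors()]  # one per acoustic model

# Geometric mean in the log domain; an arithmetic mean is the other
# usual choice.
log_merged = np.mean([np.log(p + 1e-12) for p in streams], axis=0)
merged = np.exp(log_merged)
merged /= merged.sum(axis=1, keepdims=True)  # renormalize before decoding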
The efficient incorporation of MLP features into automatic speech recognition systems
TLDR
This paper examines how MLP features, and the associated acoustic models, can be trained efficiently on large training corpora using discriminative training techniques; an approach that combines multiple individual MLPs is also proposed, reducing the time needed to train MLPs on large amounts of data.
Unsupervised discovery and training of maximally dissimilar cluster models
TLDR
A technique is described that allows acoustic modeling at scale on large amounts of data by learning a tree-structured partition of the acoustic space, and it is demonstrated that it can significantly improve recognition accuracy in various conditions through unsupervised Maximum Mutual Information (MMI) training.
Ensemble Learning Approaches in Speech Recognition
TLDR
Ensemble learning for speech recognition has been largely fruitful, and it is expected to continue to progress along with advances in machine learning, speech and language modeling, and computing technology.
Constructing ensembles of dissimilar acoustic models using hidden attributes of training data
TLDR
A method is proposed for partitioning the training data used to construct ensembles of acoustic models via a binary tree over metadata attributes such as SNR, speaking rate, and duration, using a cosine-similarity-based metric proposed in this paper.
Ensemble Methods for Phoneme Classification
TLDR
This paper investigates a number of ensemble methods for improving the performance of phoneme classification for use in a speech recognition system and shows that principled ensemble methods such as boosting and mixtures provide superior performance to more naive ensemble methods.
Matching training and testing criteria in hybrid speech recognition systems
TLDR
An approach to addressing consistency between the training and testing criteria of the hybrid artificial neural network and hidden Markov model (ANN/HMM) approach to speech recognition, by modifying the feedforward neural network training paradigm.

References

Showing 1-10 of 25 references
Decoder technology for connectionist large vocabulary speech recognition
TLDR
An efficient search procedure and its software embodiment, the NOWAY decoder, are described; NOWAY has been incorporated in ABBOT, a hybrid connectionist/hidden Markov model (HMM) LVCSR system, and results indicate that phone deactivation pruning increased the search speed by an order of magnitude while incurring 2% or less relative search error.
The use of recurrent neural networks in continuous speech recognition
TLDR
This chapter describes the use of recurrent neural networks (networks in which feedback is incorporated in the computation) as acoustic models for continuous speech recognition, together with an appropriate parameter estimation procedure.
The 1994 Abbot hybrid connectionist-HMM large vocabulary recognition system.
TLDR
The emphasis of the paper is on the differences between the 1993 and 1994 versions of the ABBOT system, which includes the utilization of a larger training corpus, the extension of the lexicon, the application of a trigram language model, and the development of a near-realtime single-pass decoder well suited for the hybrid approach.
The 1995 ABBOT LVCSR system for multiple unknown microphones
TLDR
The emphasis of the paper is on the changes made to the 1994 ABBOT system, specifically to accommodate the H3 task, which includes improved acoustic modelling using limited word-internal context-dependent models, training on the Wall Street Journal secondary channel database, and using the linear input network for speaker and environmental adaptation.
An application of recurrent nets to phone probability estimation
  • A. J. Robinson
  • Computer Science, Medicine
    IEEE Trans. Neural Networks
  • 1994
TLDR
Recognition results are presented for the DARPA TIMIT and Resource Management tasks, and it is concluded that recurrent nets are competitive with traditional means for performing phone probability estimation.
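For readers unfamiliar with this setup, the sketch below shows the shape of such a recurrent phone-probability estimator: a state vector carries context across frames via feedback, and a softmax layer emits per-frame phone posteriors. Weights and dimensions are random illustrative stand-ins, not the cited network.

# Toy forward pass of a recurrent phone-posterior estimator; all
# weights, sizes, and inputs are illustrative.
import numpy as np

rng = np.random.default_rng(3)
n_frames, n_feats, n_state, n_phones = 20, 13, 32, 40

frames = rng.normal(size=(n_frames, n_feats))
W_in = rng.normal(scale=0.1, size=(n_feats, n_state))
W_rec = rng.normal(scale=0.1, size=(n_state, n_state))
W_out = rng.normal(scale=0.1, size=(n_state, n_phones))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

state = np.zeros(n_state)
posteriors = []
for x in frames:
    # Feedback: the new state depends on the previous state and the input.
    state = np.tanh(x @ W_in + state @ W_rec)
    posteriors.append(softmax(state @ W_out))

posteriors = np.array(posteriors)  # (n_frames, n_phones), rows sum to 1
# In a hybrid system these posteriors, scaled by phone priors, replace
# the HMM's generative acoustic likelihoods during decoding.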
Improving Performance in Neural Networks Using a Boosting Algorithm
TLDR
The effect of boosting is reported on four databases: 12,000 digits from segmented ZIP codes from the United States Postal Service, and further sets from the National Institute of Standards and Technology (NIST).
Boosting and Other Ensemble Methods
TLDR
A surprising result is shown for the original boosting algorithm: namely, that as the training set size increases, the training error decreases until it asymptotes to the test error rate.
Benchmark Tests for the DARPA Spoken Language Program
TLDR
These tests were reported on and discussed in detail at the Spoken Language Systems Technology Workshop held at the Massachusetts Institute of Technology, January 20-22, 1993.
1993 Benchmark Tests for the ARPA Spoken Language Program
TLDR
This paper reports results obtained in benchmark tests conducted within the ARPA Spoken Language program in November and December of 1993, including foreign participants from Canada, France, Germany, and the United Kingdom.
Neural Network Ensembles
TLDR
It is shown that residual generalization error can be reduced by invoking ensembles of similar networks, improving the performance and training of neural networks for classification.