Learn More
1 The Fundamentals of HTK 2 1.1 General Principles of HMMs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.2 Isolated Word Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.3 Output Probability Specification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.4 Baum-Welch Re-Estimation . . . . .(More)
The key problem to be faced when building a HMM-based continuous speech recogniser is maintaining the balance between model complexity and available training data. For large vocabulary systems requiring cross-word context dependent modelling, this is particularly acute since many mmh contexts will never occur in the training data. This paper describes a(More)
This paper describes a framework for optimising the structure and parameters of a continuous density HMM-based large Ž . vocabulary recognition system using the Maximum Mutual Information Estimation MMIE criterion. To reduce the computational complexity of the MMIE training algorithm, confusable segments of speech are identified and stored as word lattices(More)
To achieve reasonable accuracy in large vocabulary speech recognition systems, it is important to use detailed acoustic models together with good long span language models. For example, in the Wall Street Journal (WSJ) task both cross-word triphones and a trigram language model are necessary to achieve state-of-the-art performance. However, when using these(More)
For many automatic speech recognition (ASR) applications, it is useful to predict the likelihood that the recognized string contains an error. This paper explores two modifications of a classic design. First, it replaces the standard maximum likelihood classifier with a maximum entropy classifier. The maximum entropy framework carries the dual advantages(More)
Live search for mobile is a cellphone application that allows users to interact with Web-based information portals. Currently the implementation is focused on information related to local businesses: their phone numbers and addresses, directions, reviews, maps of the surrounding area, and traffic. This paper describes a speech-recognition interface which(More)