Large-Vocabulary Speech Recognition Algorithms

Mukund Padmanabhan and Michael Picheny

By making the advances necessary to implement next-generation speech recognition applications, researchers could develop systems within a decade that match human performance levels. 

Automatic speech recognition

  • D. O'Shaughnessy
  • Computer Science
    2015 CHILEAN Conference on Electrical, Electronics Engineering, Information and Communication Technologies (CHILECON)
  • 2015
This Plenary presents automatic speech recognition (ASR) as a task of artificial intelligence. The basis, the methodology, spectral processing, distance measures for speech, segmentation speech,

Wavelet based speech recognition

This application illustrates how wavelets can be used for better accuracy in speech recognition by using a template of pre-recorded, wavelet-transformed phonemes as its basis for comparison.

Wavelet transform based features vector extraction in isolated words speech recognition system

An algorithm is developed that uses the wavelet transform and energy to extract and represent features of the acquired speech signals, as the basis for an accurate method of identifying and classifying speech signals according to their features.

Towards Superhuman Speech Recognition

After over 40 years of research, human speech recognition performance still substantially outstrips machine performance. Although enormous progress has been made, the ultimate goal of achieving or

Automatic Speech Recognition

The chapter presents the stages of the speech recognition process, the resources of ASR, the role and functions of a speech engine such as the Julius speech recognition engine, voice-over web resources, ASR algorithms, and language and acoustic models such as HMMs (hidden Markov models).

Speech enabled operating system control

This work proposes a way to operate an operating system with voice commands, which has the flexibility to work with the speech of any user and removes the problem of 'hesitation', which is very effective for spontaneous speech recognition.

Automatic speech recognition, with large vocabulary, robustness, independence of speaker and multilingual processing

This work aims to provide automatic cognitive assistance, via a speech interface, to elderly people who live alone in at-risk situations; the large-vocabulary continuous speech recognition system Julius is used in conjunction with the Hidden Markov Model Toolkit (HTK).

Endpoint Detection Enhancement for Speaker Dependent Recognition

This study focuses on improving the training time and robustness of the MLP neural network for the Malay isolated digit recognition system by proposing variance endpoint detection to accelerate the convergence time of the NN and to produce the highest recognition accuracy.

Integration of multiple acoustic and language models for improved Hindi speech recognition system

A novel approach is proposed that uses the best characteristics of conventional, hybrid and segmental HMMs by integrating them with the ROVER system combination technique; experimental results show that the word error rate can be reduced by about 4% using the proposed technique as compared to conventional methods.

Design and Development of Marathi Speech Interface System

The objective of this research is the design and development of the Marathi Speech Activated Talking Calculator (MSAC) as an interface system; the authors recommend WDCC as a more robust and dynamic technique than MFCC, LDA, PCA, and LPC.



Speech recognition by machines and humans

Statistical methods for speech recognition

The speech recognition problem; hidden Markov models; the acoustic model; basic language modelling; the Viterbi search; hypothesis search on a tree and the fast match; elements of information theory; the

Speaker normalization on conversational telephone speech

A new system for warp scale selection which uses a simple generic voiced speech model to rapidly select appropriate frequency scales and is sufficiently streamlined that it can be moved completely into the front-end processing.

A post-processing system to yield reduced word error rates: Recognizer Output Voting Error Reduction (ROVER)

  • J. Fiscus
  • Computer Science
    1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings
  • 1997
A post-recognition process which models the output generated by multiple ASR systems as independent knowledge sources that can be combined and used to generate an output with reduced error rate.

Maximum likelihood linear transformations for HMM-based speech recognition

  • M. Gales
  • Computer Science
    Comput. Speech Lang.
  • 1998
The paper compares the two possible forms of model-based transforms: unconstrained, where any combination of mean and variance transform may be used, and constrained, which requires the variance transform to have the same form as the mean transform.

Perceptual linear predictive (PLP) analysis of speech.

  • H. Hermansky
  • Physics
    The Journal of the Acoustical Society of America
  • 1990
A new technique for the analysis of speech, the perceptual linear predictive (PLP) technique, which uses three concepts from the psychophysics of hearing to derive an estimate of the auditory spectrum, and yields a low-dimensional representation of speech.

Finding consensus among words: lattice-based word error minimization

A new algorithm for finding the hypothesis in a recognition lattice that is expected to minimize the word error rate (WER) is described, which overcomes the mismatch between the word-based performance metric and the standard MAP scoring paradigm that is sentence-based.

Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models

An important feature of the method is that arbitrary adaptation data can be used—no special enrolment sentences are needed and that as more data is used the adaptation performance improves.

Maximum likelihood discriminant feature spaces

A new approach to HDA is presented by defining an objective function which maximizes the class discrimination in the projected subspace while ignoring the rejected dimensions, and it is shown that, under diagonal covariance Gaussian modeling constraints, applying a diagonalizing linear transformation to the HDA space results in increased classification accuracy even though HDA alone actually degrades the recognition performance.

Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains

A framework for maximum a posteriori (MAP) estimation of hidden Markov models (HMM) is presented, and Bayesian learning is shown to serve as a unified approach for a wide range of speech recognition applications.