Modular Construction of Time-Delay Neural Networks for Speech Recognition

  • Alexander H. Waibel
  • Published 1 March 1989
  • Computer Science
  • Neural Computation
Several strategies are described that overcome limitations of basic network models as steps towards the design of large connectionist speech recognition systems. […] Key result: Using these techniques, phoneme recognition networks of increasing complexity can be constructed that all achieve superior recognition performance.
A Survey of Temporal Techniques Applied Toward Neural Network Based Continuous Speech Recognition
Neural network architectures for the recognition of continuous speech are reviewed; hierarchic structures that recognize events of increasing temporal scale seem to provide the most promising path toward effective recognition of continuous speech.
Modularity and Neural Integration in Large-Vocabulary Continuous Speech Recognition
This thesis presents a modular deep neural network for acoustic unit classification that can combine multiple well-trained feature extraction networks into its topology, as well as a word prediction deep network that functions at the lower subword level.
Combining Neural Networks and Hidden Markov Models for Speech Recognition
Combining ANN and HMM within a unifying framework is a suitable approach to ASR, one that benefits from both techniques and overcomes their respective limitations.
Modular combination of deep neural networks for acoustic modeling
It is shown that bottleneck features improve the recognition performance of DBN/HMM hybrids, and that the modular combination enables the acoustic model to benefit from a larger temporal context.
Neural Networks and the Time-Sliced Paradigm for Speech Recognition
The time-slicing paradigm and the training of the recurrent neural network are described, together with details about the training samples; the concept of natural connectionist glue and the recurrent neural network architecture used for this purpose are also introduced.
Continuous Speech Recognition Using the Time-Sliced Paradigm
This paper presents the latest recognition results obtained with the Parallel-RCC and describes some of the attempts to further analyze the network's output with an error-recovery method to obtain the final result.
Low latency modeling of temporal contexts for speech recognition
A sub-sampled variant of these temporal convolution neural networks, termed time-delay neural networks (TDNNs), is proposed, which reduces the computational complexity by ∼5x, compared to TDNNs, during frame ii.
Parallel system design for time-delay neural networks
The authors develop a parallel structure for the time-delay neural network used in some speech recognition applications that shows a greatly reduced complexity while maintaining a high throughput rate.
The Time-Sliced Paradigm - A Connectionist Method for Continuous Speech Recognition
Signal processing and training by a neural network for phoneme recognition
An integrated user interface guides a user unfamiliar with the details of speech recognition or neural networks to quickly develop and test a neural network for phoneme recognition.


Phoneme recognition using time-delay neural networks
The authors present a time-delay neural network (TDNN) approach to phoneme recognition which is characterized by two important properties: (1) using a three-layer arrangement of simple computing
An introduction to computing with neural nets
This paper provides an introduction to the field of artificial neural nets by reviewing six important neural net models that can be used for pattern classification and exploring how some existing classification and clustering algorithms can be performed using simple neuron-like components.
Modularity and scaling in large phonemic neural networks
The authors train several small time-delay neural networks aimed at all phonemic subcategories and report excellent fine phonemic discrimination performance in all cases. They propose several techniques that make it possible to grow larger nets in an incremental and modular fashion without loss in recognition performance and without the need for excessive training time or additional data.
Neural network models of sensory integration for improved vowel recognition
It is demonstrated that multiple sources of speech information can be integrated at a subsymbolic level to improve vowel recognition and compare favorably with human performance and with other pattern-matching and estimation techniques.
Nonlinear dynamics of feedback multilayer perceptrons.
  • Bauer, Geisel
  • Physics
  • Physical Review A: Atomic, Molecular, and Optical Physics
  • 1990
The nonlinear dynamics of multilayer perceptrons with feedback are studied and it is shown that their dynamics provides a built-in time-warping invariance, as required for presentation speed fluctuations in speech recognition.
Robust Classifiers without Robust Features
A two-stage, modular neural network classifier is developed and applied to an automatic target recognition problem, discussing the problem of robust classification in terms of a family of decision surfaces, the members of which are functions of a set of global variables.
Received 6 November 1988