Phoneme recognition using time-delay neural networks

  • Alexander H. Waibel, Toshiyuki Hanazawa, Geoffrey E. Hinton, Kiyohiro Shikano, Kevin J. Lang
  • IEEE Transactions on Acoustics, Speech, and Signal Processing
  • 1989
The authors present a time-delay neural network (TDNN) approach to phoneme recognition which is characterized by two important properties: (1) using a three-layer arrangement of simple computing units, a hierarchy can be constructed that allows for the formation of arbitrary nonlinear decision surfaces, which the TDNN learns automatically using error backpropagation; and (2) the time-delay arrangement enables the network to discover acoustic-phonetic features and the temporal relationships between them independently of position in time, and therefore not blurred by temporal shifts in the input.
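The shift-invariance property above comes from weight sharing: each time-delay layer applies the same small set of weights at every time shift over a context window of frames. Below is a minimal numpy sketch of this forward pass (not the authors' implementation; layer window sizes and the three output units loosely follow the paper's B/D/G network, the tanh nonlinearity is an assumption, and backpropagation training is omitted):

```python
import numpy as np

def tdnn_layer(x, w, b):
    """One time-delay layer: the same weights w are applied at every
    time shift (weight sharing), so a learned feature detector fires
    regardless of where the feature occurs in time.

    x: (T, F)    input frames (time x features)
    w: (D, F, H) weights over a context window of D frames
    b: (H,)      bias per hidden unit
    returns (T - D + 1, H) activations
    """
    D, F, H = w.shape
    T = x.shape[0]
    out = np.empty((T - D + 1, H))
    for t in range(T - D + 1):
        window = x[t:t + D]  # D consecutive frames starting at shift t
        out[t] = np.tanh(np.einsum('df,dfh->h', window, w) + b)
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(15, 16))   # 15 frames of 16 spectral coefficients
# first layer: 3-frame window; second layer: 5-frame window; 3 outputs
h1 = tdnn_layer(x, rng.normal(size=(3, 16, 8)) * 0.1, np.zeros(8))
h2 = tdnn_layer(h1, rng.normal(size=(5, 8, 3)) * 0.1, np.zeros(3))
scores = h2.mean(axis=0)        # integrate evidence over all time shifts
print(h1.shape, h2.shape, scores.shape)
```

Averaging the final-layer activations over time (last line) is what makes the overall decision insensitive to where in the input the phoneme occurs.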

Phoneme recognition: neural networks vs. hidden Markov models

A time-delay neural network for phoneme recognition that was able to invent, without human interference, meaningful linguistic abstractions in time and frequency, such as formant tracking and segmentation, and that does not rely on precise alignment or segmentation of the input.

Hindi Phoneme Recognition Using Time Delay Neural Network

The authors have implemented a TDNN (based on the Bochum system) for the recognition of unvoiced unaspirated and voiced unaspirated stop consonants of Hindi speech, and the results are presented in this paper.

A phoneme recognition system using modular construction of time-delay neural networks

Research on alternative approaches to representing phoneme data to be input into an artificial neural network, and alterations that can be made in the network to reduce training time without

Phoneme Recognition: Neural Networks vs. Hidden Markov Models

It is shown that the TDNN "invented" well-known acoustic-phonetic features, and that the temporal relationships between them are independent of position in time and hence not blurred by temporal shifts in the input.

Incorporating acoustic-phonetic knowledge in hybrid TDNN/HMM frameworks

  • C. Dugast, L. Devillers
  • Computer Science
    [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing
  • 1992
A modular TDNN architecture based on acoustic-phonetic knowledge is defined, in which each sub-network is trained on a different subset of phonemes; this offers a framework for enlarging the number of outputs by defining context-dependent sub-networks.

A recurrent time-delay neural network for improved phoneme recognition

  • F. Greco, A. Paoloni, G. Ravaioli
  • Computer Science
    [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing
  • 1991
The authors propose a modification to the structure of the time-delay neural network (TDNN), obtained through feedback at the first hidden layer, called the RTDNN (recurrent TDNN), and evaluate it on the classification of unvoiced plosive phonemes.

Speaker-independent phoneme recognition on TIMIT database using integrated time-delay neural networks (TDNNs)

  • N. Hataoka, A. Waibel
  • Computer Science
    1990 IJCNN International Joint Conference on Neural Networks
  • 1990
A neural network structure based on the integration of several TDNNs, separated according to phoneme duration, is described for speaker-independent, context-independent phoneme recognition.

Continuous Speech Phoneme Recognition Using Dynamic Artificial Neural Networks

The main objective of this paper is the investigation of dynamic ANNs, namely Time-Delay Neural Networks (TDNNs) and Recurrent Neural Networks (RNNs), which are the most suitable for recognition of time sequences.

Tunable time delay neural networks for isolated word recognition

  • Duanpei Wu, J. Gowdy
  • Computer Science
    Proceedings IEEE Southeastcon '95. Visualize the Future
  • 1995
The proposed system is a modification of the original time delay neural network structure of Waibel et al. (1989) and consists of a group of sub-nets, and each isolated word or phoneme to be recognized corresponds to one sub-net.

Context-dependent phonetic Markov models for large vocabulary speech recognition

  • A. Derouault
  • Computer Science
    ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing
  • 1987
This paper shows that both an analysis of the errors made by the recognizer and linguistic facts about phonetic-context influence suggest a method for choosing context-dependent models that limits the growth in the number of phoneme models while still accounting for the most important coarticulation effects.

Speaker‐independent phoneme recognition using hidden Markov models

The currently popular hidden-Markov-modeling approach to speaker-independent phoneme recognition is extended using multiple codebooks of various LPC-derived parameters and discrete HMMs, and a co-occurrence smoothing algorithm is introduced that enables accurate recognition with only a few training examples of each phone.

BYBLOS: The BBN continuous speech recognition system

  • Y. Chow, M. O. Dunham, R. Schwartz
  • Computer Science
    ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing
  • 1987
In this paper, we describe BYBLOS, the BBN continuous speech recognition system. The system, designed for large vocabulary applications, integrates acoustic, phonetic, lexical, and linguistic knowledge sources.

Learning Phonetic Features Using Connectionist Networks

A method for learning phonetic features from speech data using connectionist networks is described and a supervised learning algorithm is presented which performs gradient descent in weight space using a coarse approximation of the desired output as an evaluation function.

Context-dependent modeling for acoustic-phonetic recognition of continuous speech

The combination of general spectral information and specific acoustic-phonetic features is shown to result in more accurate phonetic recognition than either representation by itself.

Learning spectral-temporal dependencies using connectionist networks

  • D. Lubensky
  • Computer Science
    ICASSP-88., International Conference on Acoustics, Speech, and Signal Processing
  • 1988
Describes the application of a layered connectionist network for continuous digit recognition using syllable-based segmentation, and compares the performance of the network to that of a nearest-neighbor classifier trained and tested on the same database.

Comparative study of nonlinear time warping techniques in isolated word speech recognition systems

In this paper, the effects of two major design choices on the performance of an isolated word speech recognition system are examined in detail. They are: 1) the choice of a warping algorithm among